20595 Data Mining

Credits: 4 advanced credits in Computer Science

Prerequisites: Students must fulfill all English requirements and take bibliographic instruction in the Library.

Required: one of Probability for Computer Science Students or Introduction to Statistics and Probability for Science Students; and one of Data Structures and Introduction to Algorithms or Data Structures

Recommended: Statistical Inference, Algorithms

The course is based on videotaped lectures by Prof. Mark Last (in Hebrew, on CD-ROM), accompanied by slides, and on Data Mining: Concepts and Techniques (3rd ed.) by J. Han, M. Kamber, and J. Pei (Morgan Kaufmann, 2012).

Data Mining (DM) refers to extracting or “mining” knowledge from large amounts of data. The discovered knowledge can be applied to decision-making, process control, information management, and query processing.

The course presents DM from a database perspective, emphasizing basic concepts and techniques for uncovering interesting patterns hidden in large data sets. It covers the methods that underlie classification, prediction, association, and clustering, and presents and analyzes some popular DM algorithms and their applications.

Topics: DM main steps, information theory, data preprocessing, classification and prediction, decision trees, info-fuzzy networks, Bayesian classification, instance-based learning, association rules, cluster analysis, feature selection, advanced topics in DM.