Ing Ind - Inf (Mag.)(ord. 270) - CO (482) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA
094835 - DATA ANALYSIS AND RETRIEVAL
094751 - INFORMATION RETRIEVAL AND DATA MINING
The course covers the tools and systems used to handle big data, i.e., large collections of textual data. The first part focuses on analysing the information embedded in large collections, using tools that include decision trees, classification rules, association rules, and graph-based link analysis. The second part covers the efficient retrieval of information: the algorithms and data structures that make it possible to answer keyword-based queries, and the indexing methods that enable fast search.
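As a rough illustration of the kind of data structure the second part refers to, the sketch below builds a tiny inverted index and answers a keyword query with Boolean AND semantics. It is not taken from the course material: the function names `build_inverted_index` and `boolean_and`, the whitespace tokenisation, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Naive tokenisation (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect posting lists; an unknown term yields an empty result.
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical toy collection, used only for illustration.
docs = [
    "decision trees for classification",
    "association rules and frequent itemsets",
    "classification rules and decision rules",
]
index = build_inverted_index(docs)
print(boolean_and(index, "classification rules"))  # [2]
```

The index is built once, so each query only touches the posting lists of its own terms instead of scanning every document.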
Expected learning outcomes
Knowledge and understanding
Students will learn how to:
Model data mining and information retrieval problems
Train their models
Control the model complexity
Evaluate the performance of their models
Applying knowledge and understanding
Given specific project cases, students will be able to:
Model the data mining or information retrieval problem
Select the most appropriate data mining or information retrieval technique
Tune the hyperparameters of the studied techniques
Assess the performance of the learned model.
At the end of this course, students will be able to:
Correctly assess the performance of the proposed solutions
Be aware of the fairness issues involved in training data mining algorithms and using information retrieval techniques, especially regarding the handling of personal data and privacy.
Lifelong learning skills
Students will learn to model data mining and information retrieval problems
Students will learn to develop algorithms to solve relevant data mining and information retrieval problems.
The Data Mining process
Decision Trees and Decision Rules
Rule Induction Methods
Frequent Itemset Analysis
Web information retrieval
Web modelling and crawling
Graph-based retrieval models (PageRank, HITS)
Text-based information retrieval
IR models (Boolean models, vector space models, probabilistic models)
Evaluation of IR systems
Advanced IR models (Latent Semantic Indexing)
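To give a flavour of the graph-based retrieval models listed above, here is a minimal sketch of PageRank computed by power iteration. It is an illustration under stated assumptions, not the course's reference implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, and the three-page graph are all choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a graph given as {page: [linked pages]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (assumption): spread its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web graph, used only for illustration.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page `c` is linked from both other pages and ends up with the highest score, while the scores always sum to one.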
Lecture slides in electronic format covering the whole course will be distributed through the BeeP platform.
Students are required to know the basics of statistics, linear algebra, calculus, and databases.
Assessment method
The assessment will be based on a written exam at the end of the course, where both theoretical competence and modeling skills will be tested.
Type of assessment
Solution of numerical problems: exercises on data mining and information retrieval
Solution of modeling problems: exercises where the student needs to properly model the data mining or information retrieval problem and choose the most appropriate solution technique