Summary Sheet
Academic Year: 2018/2019
School: School of Industrial and Information Engineering
Course: 094835 - DATA ANALYSIS AND RETRIEVAL
  • 094834 - INFORMATION RETRIEVAL AND DATA MINING
Instructor: Restelli Marcello
Credits (CFU): 5.00
Course type: Module of a structured course

Degree programme: Ing Ind - Inf (Mag.)(ord. 270) - CO (482) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA
Code of pre-approved study plan: *
From (inclusive): A
To (exclusive): ZZZZ
Course: 094751 - INFORMATION RETRIEVAL AND DATA MINING
Module: 094835 - DATA ANALYSIS AND RETRIEVAL

Course objectives

The course covers tools and systems used to handle big data, i.e., large collections of textual data. The first part focuses on analyzing the information embedded in large collections, using tools that range from decision trees, classification rules, and association rules to graph-based link analysis. The second part covers efficient information retrieval: the algorithms and data structures used to answer keyword-based queries, and the indexing methods that make search fast.

 


Expected learning outcomes

Dublin Descriptors

Expected learning outcomes

Knowledge and understanding

 

Students will learn how to:

    • Model data mining and information retrieval problems
    • Train their models
    • Control the model complexity
    • Evaluate the performance of their models

Applying knowledge and understanding

Given specific project cases, students will be able to:

    • Model the data mining or information retrieval problem
    • Select the most appropriate data mining or information retrieval technique
    • Tune the hyperparameters of the studied techniques
    • Assess the performance of the learned model.

Making judgements

At the end of this course, students will be able to:

    • Correctly assess the performance of the proposed solutions
    • Recognize the fairness issues that arise when training data mining algorithms and using information retrieval techniques, especially regarding the handling of personal data and privacy.

Lifelong learning skills 

    • Students will learn to model data mining and information retrieval problems
    • Students will learn to develop algorithms to solve relevant data mining and information retrieval problems.

Topics covered

Data mining

  • The Data Mining process
  • Decision Trees and Decision Rules
  • Rule Induction Methods
  • Association Rules
  • Frequent Itemset Analysis 
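
As a small illustration of the association rule and frequent itemset topics listed above, the following sketch computes the support of an itemset and the confidence of a rule over a toy transaction list. The data and the function names (`support`, `confidence`) are hypothetical and are not taken from the course material.

```python
from typing import FrozenSet, List

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter"}),
]


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent | consequent) / support(antecedent).
    Assumes the antecedent occurs in at least one transaction.
    """
    joint = support(antecedent | consequent, transactions)
    return joint / support(antecedent, transactions)


# Rule {bread} -> {milk} on the toy data.
rule_conf = confidence(frozenset({"bread"}), frozenset({"milk"}), TRANSACTIONS)
```

An itemset is usually called frequent when its support reaches a user-chosen minimum threshold; the threshold itself is a modeling choice and is not fixed by this sketch.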

Web information retrieval

  • Web modelling and crawling
  • Graph-based retrieval models (PageRank, HITS)
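
To make the PageRank topic above concrete, here is a minimal power-iteration sketch on a tiny directed graph. The damping factor of 0.85, the fixed iteration count, the uniform treatment of pages without out-links, and the three-page example are all assumptions chosen for illustration, not details taken from the course.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank on a small directed graph.

    `links` maps each page to the pages it links to; every linked page
    must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes its rank in equal shares to the pages it links to.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B both link to C, C links back to A.
example = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example)
```

In this example page C receives links from both other pages and ends up with the highest score, while B, which nobody links to, keeps only the baseline share.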

Text-based information retrieval

  • IR models (Boolean models, vector space models, probabilistic models)
  • Evaluation of IR systems
  • Text processing
  • Advanced IR models (Latent Semantic Indexing)
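
The vector space model listed above can be sketched as follows: documents become sparse TF-IDF vectors and are compared by cosine similarity. The weighting used here (raw term count times log(N / document frequency)) is only one common variant, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy collection of already tokenized documents (hypothetical data).
docs = [["data", "mining"], ["data", "retrieval"], ["web", "retrieval"]]
vectors = tf_idf_vectors(docs)
```

Documents that share a term (the first two share "data") get a positive similarity, while documents with no terms in common get zero.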

Indexing

  • Inverted indexing
  • Multidimensional indexing
  • Rank aggregation
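
For the inverted indexing topic in the list above, a minimal sketch is shown below: each term maps to the set of documents that contain it, and a keyword query is answered by intersecting those sets. Tokenization by lower-casing and whitespace splitting, the function names, and the sample documents are simplifying assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased whitespace token to the ids of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def boolean_and(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return the sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # A term missing from the index has an empty posting set.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical mini-collection.
documents = {
    1: "data mining and retrieval",
    2: "web information retrieval",
    3: "data retrieval systems",
}
index = build_inverted_index(documents)
matches = boolean_and(index, "data retrieval")
```

The index is built once, so each query only touches the posting sets of its own terms instead of scanning every document.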

 

Teaching material

Lecture slides covering the whole course will be distributed in electronic format through the BeeP platform.


Prerequisites

Students are required to know the basics of statistics, linear algebra, calculus, and databases.


Assessment

Assessment is based on a written exam at the end of the course, which tests both theoretical knowledge and modeling skills.

Type of assessment: Written test

Description (with the Dublin descriptors each part addresses):

  • Solution of numerical problems: exercises on data mining and information retrieval (Dublin descriptors: 1, 2)
  • Solution of modeling problems: exercises in which the student must properly model the data mining or information retrieval problem and choose the most appropriate solution technique (Dublin descriptors: 1, 2, 3, 5)
  • Theoretical questions with open answers (Dublin descriptors: 1, 2, 5)


Bibliography

  • (Mandatory) Christopher M. Bishop, Pattern Recognition and Machine Learning, Publisher: Springer, Year: 2006 http://incompleteideas.net/sutton/book/the-book.html
  • (Optional) Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Publisher: Springer, Year: 2013 http://statweb.stanford.edu/~tibs/ElemStatLearn/
  • (Optional) Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, Foundations of Machine Learning, Publisher: The MIT Press, Year: 2012
  • (Mandatory) Rich Sutton, Andrew Barto, Reinforcement Learning: an Introduction, Year: 1998 http://incompleteideas.net/sutton/book/the-book.html

Teaching formats

Type of teaching activity    In-class hours (hh:mm)    Self-study hours (hh:mm)
Lecture                      30:00                     45:00
Exercise session             20:00                     30:00
Computer laboratory          0:00                      0:00
Experimental laboratory      0:00                      0:00
Project laboratory           0:00                      0:00
Total                        50:00                     75:00

Information in English to support internationalization

  • Course taught in English
  • Teaching material/slides available in English
  • Textbooks/bibliography available in English
  • Exam can be taken in English
  • Teaching support available in English
19/11/2019