Course Summary
Academic Year: 2018/2019
School: School of Industrial and Information Engineering
Course: 089167 - DATA MINING AND TEXT MINING (UIC 583)
Lecturer: Lanzi Pierluca
Credits (CFU): 5.00   Course type: Single-discipline

Degree Programme | Pre-approved Study Plan Code | From (inclusive) | To (exclusive) | Course
Ing - Civ (Mag.)(ord. 270) - MI (495) GEOINFORMATICS ENGINEERING - INGEGNERIA GEOINFORMATICA | * | A | ZZZZ | 089167 - DATA MINING AND TEXT MINING (UIC 583)
Ing Ind - Inf (Mag.)(ord. 270) - MI (474) TELECOMMUNICATION ENGINEERING - INGEGNERIA DELLE TELECOMUNICAZIONI | * | A | ZZZZ | 089167 - DATA MINING AND TEXT MINING (UIC 583)
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA | * | A | ZZZZ | 089167 - DATA MINING AND TEXT MINING (UIC 583)
Ing Ind - Inf (Mag.)(ord. 270) - MI (487) MATHEMATICAL ENGINEERING - INGEGNERIA MATEMATICA | * | A | ZZZZ | 089167 - DATA MINING AND TEXT MINING (UIC 583)

Course objectives

This course provides an introduction to Data Mining and an overview of the most important algorithms used in the field. The course consists of two sets of lectures. The first set, covering 24 hours, introduces the field of Data Mining and surveys the main algorithms available in most commercial tools. The second set, covering 16 hours, focuses on specific application areas such as Text Mining, Bioinformatics, and social networks. An optional project will be available near the end of the course (between the last week of May and the first week of June).

Expected learning outcomes

Dublin Descriptors

Expected learning outcomes

Knowledge and understanding

Students will learn to:

·      Structure a data mining pipeline

·      Understand the fundamental characteristics of the most important algorithms used in each major step of the pipeline

·      Identify architectural styles and patterns

Applying knowledge and understanding

Given a specific data mining process, students will be able to:

·      Analyze and comment on specific architectural choices

·      Highlight potential critical issues

·      Identify existing biases

·      Apply the theory to assess the reliability of the results produced

Making judgements

Given a data mining task, students will be able to:

·      Analyze and understand the goals, assumptions and requirements associated with that task

·      Select the best environment to implement each step of the data mining process

·      Select the best infrastructure

Communication

Students will learn to:

·      Analyze the design choices that a data analytics solution entails

·      Present and critically discuss the results of a data mining process

Lifelong learning skills

Students will learn how to:

·      Develop a project on real-world data and critically analyze a proposed solution and the results it produces


Topics covered
  • Introduction to Data Mining
  • Understanding data and data representation
  • Regression
  • Classification (decision trees, rules, Bayesian networks, etc.)
  • Evaluation of classification algorithms
  • Clustering
  • Association rule mining
  • Ensemble Methods (Bagging, Boosting, Random Forest, Gradient Boosting)
  • Text Mining
  • Data Exploration and Preprocessing
  • Graph Mining and Social Networks

Prerequisites

Students should have a basic knowledge of statistics and programming.


Assessment methods

The exam consists of a written test at the end of the course. During the course, an optional project involving the analysis of real-world data will be announced; students may take it if they wish.

Type of assessment: Written test

Solution of numerical problems (Dublin descriptors: 1, 2)

·       Computation of score functions used in data mining algorithms

·       Execution of fundamental algorithms using small datasets

·       Interpretation of code fragments to understand what they compute

Exercises focusing on design aspects (Dublin descriptors: 1, 2, 3, 4, 5)

·       Evaluation of trade-offs between different proposed solutions

·       Definition of data mining pipelines for a given scenario

·       Critical comparison of existing methods

Type of assessment: Assessment of laboratory artefacts (Dublin descriptors: 2, 3, 4, 5)

  • Assessment of the design of the data analytics pipeline and the experimental work developed by students either individually or in groups


Bibliography
Required: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of Massive Datasets http://www.dataminingbook.info/
Note:

PDF available for free at the book website

Required: Mohammed J. Zaki and Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, Edition year: 2014, ISBN: 9780521766333 http://www.dataminingbook.info/
Note:

PDF available for free at the book website.

Optional: Ian H. Witten, Eibe Frank, and Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Publisher: Morgan Kaufmann, Edition year: 2011, ISBN: 978-0123748560 http://www.pearsonhighered.com/educator/academic/product/0,1144,0321321367,00.html

Software used
No software required

Teaching formats
Type of teaching format | Classroom hours (hh:mm) | Self-study hours (hh:mm)
Lecture | 32:30 | 48:45
Exercise session | 17:30 | 26:15
Computer laboratory | 0:00 | 0:00
Experimental laboratory | 0:00 | 0:00
Project laboratory | 0:00 | 0:00
Total | 50:00 | 75:00

Information in English to support internationalization
Course taught in English
Teaching materials/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English
05/12/2023