Summary Sheet
Academic Year: 2018/2019
School: School of Industrial and Information Engineering
Course: 052712 - DATA SCIENCE FOR MOBILITY
Lecturer: Lanzi Pierluca
Credits (CFU): 5.00
Course type: Single-discipline

Degree programme | Pre-approved study plan code | From (inclusive) | To (exclusive) | Course
Ing Ind - Inf (Mag.)(ord. 270) - BV (483) MECHANICAL ENGINEERING - INGEGNERIA MECCANICA | * | A | ZZZZ | 052712 - DATA SCIENCE FOR MOBILITY
Ing Ind - Inf (Mag.)(ord. 270) - MI (475) ELECTRICAL ENGINEERING - INGEGNERIA ELETTRICA | * | A | ZZZZ | 052712 - DATA SCIENCE FOR MOBILITY

Course objectives

Data science aims at developing processes to analyze and ultimately understand phenomena through data. It stands at the intersection of several broad areas (statistics, information science, and computer science) and employs methods from machine learning, classification, clustering, data mining, databases, visualization, and cloud computing. This course presents the structure of the typical data science pipeline and, for each step of the process, surveys the most relevant methods and algorithms used to analyze mobility data.

The course follows a problem-driven approach: techniques are presented according to the type of data they can tackle, whether structured (tables), unstructured (plain text, XML files), graphs, or time series. Each method is discussed with a focus on its underlying theory and its distinctive features, and is then demonstrated using Python notebooks (R might also be employed in some cases).

Topics discussed during the course include, but are not limited to, data and data representation, data preparation, regression, classification, clustering, evaluation of classification and clustering models, methods to analyze text, graphs, and time series. 


Expected learning outcomes

Knowledge and understanding (Dublin Descriptor 1)

Students will learn to
- Understand the structure of a data science pipeline
- Understand the fundamental characteristics of the most important algorithms used in all the major steps of the pipeline
- Identify architectural styles and patterns

Applying knowledge and understanding (Dublin Descriptor 2)

Given a specific data mining process, students will be able to:
- Analyze and comment on specific architectural choices
- Highlight potential critical issues
- Identify existing biases
- Apply the theory to assess the reliability of the results produced

Making judgements (Dublin Descriptor 3)

Given a data mining task, students will be able to:
- Analyze and understand the goals, assumptions and requirements associated with that task
- Select the best environment to implement each step of the data mining process
- Select the best infrastructure

Communication (Dublin Descriptor 4)

Students will learn to:
- Analyze the design choices that a data analytics solution entails
- Present and critically discuss the results of a data science process

Lifelong learning skills (Dublin Descriptor 5)

Students will learn how to:
- Develop simple projects on real-world data and critically analyze a proposed solution and the results it produces


Topics covered
  • Introduction to Data Science
  • The data science pipeline
  • Understanding data and data representation
  • Regression 
  • Classification (decision trees, rules, Bayesian networks, ensemble methods, deep neural networks)
  • Clustering
  • Text Mining
  • Graph Mining
  • Time Series
  • Data Exploration and Preprocessing

Prerequisites

The course requires some basic knowledge of programming (any language), math, and statistics.


Assessment methods

Written Test (Dublin Descriptors 1 & 2)

The evaluation will be based on a written exam at the end of the course. The written exam consists of numerical problems involving the computation of score functions used in the algorithms presented during the course, the execution of fundamental algorithms on small datasets, the interpretation of code fragments, and the discussion of scenarios. Problems might also focus on evaluating trade-offs between different proposed solutions, defining data mining pipelines for a given scenario, and critically comparing existing methods.
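As an illustration of the kind of score-function computation such a problem might involve, the following minimal sketch computes entropy and information gain, a score commonly used when building decision trees. The syllabus does not state which score functions appear in the exam, so the choice of information gain, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into `groups`."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Hypothetical toy data: whether a trip was made by bike ("yes"/"no"),
# split on a made-up binary attribute that separates the classes perfectly.
labels = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]
gain = information_gain(labels, split)
print(f"information gain: {gain:.3f}")
```

A split that separates the classes perfectly removes all uncertainty, so the gain equals the entropy of the unsplit labels; a split that leaves each group as mixed as the original yields a gain of zero.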

Assessment of laboratory artefacts (Dublin Descriptors 2, 3, 4, and 5)

During the semester, there will be optional projects involving the analysis of real-world data. Students may choose to take them, and they might replace the written exam.


Bibliography
Required reading: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of Massive Datasets http://www.mmds.org
Note: PDF available for free at the book website.

Required reading: Mohammed J. Zaki and Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, ISBN: 9780521766333 http://www.dataminingbook.info/
Note: PDF available for free at the book website.

Optional reading: Ian H. Witten, Eibe Frank, and Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, ISBN: 978-0123748560 http://www.pearsonhighered.com/educator/academic/product/0,1144,0321321367,00.html

Teaching formats
Type of teaching activity | Classroom hours (hh:mm) | Independent study hours (hh:mm)
Lecture | 30:00 | 45:00
Exercise session | 20:00 | 30:00
Computer laboratory | 0:00 | 0:00
Experimental laboratory | 0:00 | 0:00
Project laboratory | 0:00 | 0:00
Total | 50:00 | 75:00

Information in English to support internationalization
Course taught in English
Teaching materials/slides available in English
Textbooks/bibliography available in English
Exam may be taken in English
Teaching support available in English
11/08/2020