Summary Sheet
Academic Year 2019/2020
School: School of Industrial and Information Engineering
Course: 054268 - DATA SCIENCE AND SECURITY FOR MOBILITY
Lecturer: Carman Mark James
Credits (CFU): 10.00
Course type: Single-discipline

Degree programme | Pre-approved study plan code | From (inclusive) | To (exclusive) | Course
Ing Ind - Inf (Mag.)(ord. 270) - BV (483) MECHANICAL ENGINEERING - INGEGNERIA MECCANICA | * | A | ZZZZ | 052712 - DATA SCIENCE FOR MOBILITY
Ing Ind - Inf (Mag.)(ord. 270) - BV (499) MOBILITY ENGINEERING | * | A | ZZZZ | 054268 - DATA SCIENCE AND SECURITY FOR MOBILITY
Ing Ind - Inf (Mag.)(ord. 270) - MI (475) ELECTRICAL ENGINEERING - INGEGNERIA ELETTRICA | * | A | ZZZZ | 052712 - DATA SCIENCE FOR MOBILITY

Course Objectives

Data science aims at developing processes to analyze and ultimately understand phenomena through data. It stands at the intersection of several broad areas (statistics, information science, and computer science) and employs methods from machine learning, classification, clustering, data mining, databases, visualization, and cloud computing.

This course presents the structure of the typical data science pipeline and, for each step of the process, reviews the most relevant methods and algorithms used to analyze mobility data. The course also addresses cybersecurity risks in mobility applications.

The course follows a problem-driven approach: techniques are presented according to the type of data they handle, whether structured (tables), unstructured (plain text, XML files), graphs, or time series. Each method is discussed in terms of its underlying theory and distinctive characteristics, and is then demonstrated using Python notebooks, KNIME workflows, or R.

Topics discussed during the course include, but are not limited to, data and data representation, data preparation, regression, classification, clustering, evaluation of classification and clustering models, methods to analyze text, graphs, and time series, and cybersecurity risks in the mobility landscape.

The course comprises lectures (60 hours) and hands-on data science laboratory sessions (40 hours). During the laboratory hours, students will learn how to apply the techniques discussed in the lectures and will work on the mandatory course projects, which are presented at the start of the course and must be completed by the last lecture.

The final grade will be based on an oral/written exam and the mandatory data science project.


Expected Learning Outcomes

Knowledge and understanding (Dublin Descriptor 1)
Students will learn to
- Understand the structure of a data science pipeline
- Describe the fundamental characteristics of the most important algorithms used in each major step of the pipeline
- Identify architectural styles and patterns
Applying knowledge and understanding (Dublin Descriptor 2)
Given a specific data mining process, students will be able to:
- Analyze and comment on specific architectural choices
- Highlight possible criticalities including security vulnerabilities
- Identify existing biases
- Apply the theory to assess the reliability of the results produced
Making judgements (Dublin Descriptor 3)
Given a data mining task, students will be able to:
- Analyze and understand the goals, assumptions and requirements associated with that task
- Select the best environment to implement each step of the data mining process
- Select the best infrastructure
Communication (Dublin Descriptor 4)
Students will learn to:
- Analyze the design choices that a data analytics solution entails
- Present and critically discuss the results of a data science process
Lifelong learning skills (Dublin Descriptor 5)
Students will learn how to:
- Develop simple projects on real-world data and critically analyze a proposed solution and the results it produces

 


Topics Covered
  • Introduction to Data Science
  • The Data Science Pipeline
  • Understanding Data and its Representation
  • Regression 
  • Classification
  • Clustering
  • Text Mining
  • Graph Mining
  • Time Series
  • Data Exploration and Preprocessing
  • Cyber-security Risks and Applications to Mobility

 


Prerequisites
 

Assessment

Written Test (Dublin Descriptors 1 & 2)

The evaluation will be based on an oral/written exam at the end of the course. The written exam consists of numerical problems involving the computation of score functions used in the algorithms presented during the course, execution of fundamental algorithms on small datasets, interpretation of code fragments, and discussion of scenarios. Problems may also focus on evaluating trade-offs between different proposed solutions, defining data mining pipelines for a given scenario, and critically comparing existing methods.

 

Assessment of laboratory artefacts (Dublin Descriptors 2, 3, 4, and 5)

At the beginning of the course, one or more real-world projects will be presented by invited companies. Students are required to work on one of the projects and present a final report by the end of the course describing the data science pipeline they implemented and the results they obtained.

 


Bibliography
Mandatory: Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman, Mining of Massive Datasets, http://www.mmds.org
Note: PDF available for free at the book website.

Mandatory: Mohammed J. Zaki and Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, http://www.dataminingbook.info/
Note: PDF available for free at the book website.

Optional: Ian H. Witten, Eibe Frank, and Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, ISBN: 978-0123748560, http://www.pearsonhighered.com/educator/academic/product/0,1144,0321321367,00.html
Optional: Clarence Chio and David Freeman, Machine Learning and Security: Protecting Systems with Data and Algorithms, Publisher: O'Reilly Media, ISBN: 978-1491979907, http://shop.oreilly.com/product/0636920065555.do

Software Used
No software required

Teaching Formats
Type | Classroom hours (hh:mm) | Independent study hours (hh:mm)
Lectures | 65:00 | 97:30
Exercise sessions | 35:00 | 52:30
Computer laboratory | 0:00 | 0:00
Experimental laboratory | 0:00 | 0:00
Project laboratory | 0:00 | 0:00
Total | 100:00 | 150:00

English-language information supporting internationalization
Course taught in English
Teaching materials/slides available in English
Textbooks/bibliography available in English
Exam may be taken in English
Teaching support available in English
21/01/2022