Ing Ind - Inf (Mag.)(ord. 270) - BV (483) MECHANICAL ENGINEERING - INGEGNERIA MECCANICA
052712 - DATA SCIENCE FOR MOBILITY
Ing Ind - Inf (Mag.)(ord. 270) - MI (475) ELECTRICAL ENGINEERING - INGEGNERIA ELETTRICA
052712 - DATA SCIENCE FOR MOBILITY
Data science aims at developing processes to analyze, and ultimately understand, phenomena through data. It stands at the intersection of several broad areas (statistics, information science, and computer science) and employs methods from machine learning, classification, clustering, data mining, databases, visualization, and cloud computing. This course presents the structure of the typical data science pipeline and, for each of its steps, gives an overview of the most relevant methods and algorithms used to analyze mobility data.
The course follows a problem-driven approach: techniques are presented according to the type of data they handle, whether structured (tables), unstructured (plain text, XML files), graphs, or time series. Each method is discussed with a focus on its underlying theory and its peculiarities, and is then demonstrated using Python notebooks (R may also be used in some cases).
Topics discussed during the course include, but are not limited to, data and data representation, data preparation, regression, classification, clustering, evaluation of classification and clustering models, methods to analyze text, graphs, and time series.
Expected learning outcomes
Knowledge and understanding (Dublin Descriptor 1)
Students will learn to:
- Understand the structure of a data science pipeline
- Understand the fundamental characteristics of the most important algorithms used in each major step of the pipeline
- Identify architectural styles and patterns
Applying knowledge and understanding (Dublin Descriptor 2)
Given a specific data mining process, students will be able to:
- Analyze and comment on specific architectural choices
- Highlight possible criticalities
- Identify existing biases
- Apply the theory to assess the reliability of the results produced
Making judgements (Dublin Descriptor 3)
Given a data mining task, students will be able to:
- Analyze and understand the goals, assumptions, and requirements associated with that task
- Select the best environment to implement each step of the data mining process
- Select the best infrastructure
Communication (Dublin Descriptor 4)
Students will learn to:
- Analyze the design choices that a data analytics solution entails
- Present and critically discuss the results of a data science process
Lifelong learning skills (Dublin Descriptor 5)
Students will learn how to:
- Develop simple projects on real-world data
- Critically analyze a proposed solution and the results it produces
The course requires some basic knowledge of programming (any language), math, and statistics.
Assessment methods
Written Test (Dublin Descriptors 1 & 2)
The evaluation is based on a written exam at the end of the course. The written exam consists of numerical problems involving the computation of the score functions used in the algorithms presented during the course, the execution of fundamental algorithms on small datasets, the interpretation of code fragments, and the discussion of scenarios. Problems may also focus on evaluating trade-offs between different proposed solutions, defining data mining pipelines for a given scenario, and critically comparing existing methods.
Assessment of laboratory artefacts (Dublin Descriptors 2, 3, 4, and 5)
During the semester, students may take on optional projects involving the analysis of real-world data; these projects may replace the written exam.
Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman, Mining of Massive Datasets, http://www.mmds.org (PDF available for free at the book website)
Mohammed J. Zaki and Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, ISBN: 9780521766333, http://www.dataminingbook.info/