Course Summary
Academic year: 2018/2019
School: School of Industrial and Information Engineering
Course: 052487 - MACHINE LEARNING
Lecturer: Vercellis Carlo
Credits (CFU): 5.00
Course type: Single-discipline

Degree programme: Ing Ind - Inf (Mag.)(ord. 270) - BV (479) MANAGEMENT ENGINEERING - INGEGNERIA GESTIONALE
Pre-approved study plan code: *
From (inclusive): A
To (exclusive): ZZZZ
Course: 052487 - MACHINE LEARNING

Course objectives

This course gives an overview of techniques and algorithms in machine learning and pattern recognition. It provides students with the basic ideas and intuition behind modern machine learning methods, as well as more detailed coverage of most of these techniques.

The course fits into the overall program curriculum and pursues some of its general learning goals. In particular, it contributes to the development of the following capabilities:

  • Understand the context, functions, and processes of a business and industrial environment and the impact of those factors on business performance
  • Identify trends, technologies and key methodologies in a specific domain (specialization streams)
  • Design solutions applying a scientific and engineering approach (Analysis, Learning, Reasoning, and Modeling capability deriving from a solid and rigorous multidisciplinary background) to face problems and opportunities in a business and industrial environment

Expected learning outcomes

By the end of the module, students should:

  • Have a good understanding of the fundamental issues and challenges of machine learning: data, model selection, model complexity.
  • Develop an appreciation for what is involved in learning from data.
  • Understand a wide variety of machine learning algorithms.
  • Understand how to apply a variety of learning algorithms to data.
  • Understand how to perform evaluation of learning algorithms and model selection.
  • Have an understanding of the strengths and weaknesses of many popular machine learning approaches.
  • Appreciate the underlying mathematical relationships within and across machine learning algorithms and the paradigms of supervised and unsupervised learning.
  • Apply the algorithms to a real-world problem, optimize the models learned and report on the expected accuracy that can be achieved by applying the models.
  • Be able to design and implement various machine learning algorithms in a range of real-world applications.
  • Be able to write Python code that applies machine learning algorithms.

Topics covered

Introduction to Machine Learning

Motivations of machine learning. Machine learning, artificial intelligence and big data. Applications of machine learning. Representation of input data. Machine learning process.

Exploratory data analysis

Data validation and cleansing, outlier and missing value detection. Data transformation. Data reduction. Sampling. Feature selection. Feature extraction by filtering. Principal component analysis. Data discretization. Univariate analysis: graphical analysis, measures of central tendency, dispersion, relative location, heterogeneity, analysis of the empirical density. Bivariate analysis: graphical analysis, measures of correlation, contingency tables. Multivariate analysis: graphical analysis, measures of correlation.
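
As an illustration of the univariate and bivariate measures listed above, here is a minimal sketch that uses only the Python standard library. The data and the names (`spend`, `sales`, `pearson`) are hypothetical and are not taken from the course material.

```python
import math
import statistics


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: advertising spend and sales over six periods.
spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# Univariate analysis: central tendency and dispersion of one attribute.
summary = {
    "mean": statistics.fmean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation
}

# Bivariate analysis: linear correlation between the two attributes.
r = pearson(spend, sales)
print(summary, r)
```

A correlation close to 1 suggests a strong positive linear relationship between the two attributes. It says nothing about causation.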

Supervised learning: classification and regression

Taxonomy of supervised methods. Evaluation of classification models: holdout, cross-validation, confusion matrix and derived metrics, ROC curve, cumulative gain and lift. Treatment of categorical attributes. Nearest neighbor. Classification and regression trees: splitting, stopping and pruning. Bayesian methods: naive methods, Bayesian networks. Logistic regression. Neural networks: Rosenblatt perceptron, multi-layer feed-forward networks. Support vector machines: structural risk minimization, maximal margin hyperplane for linear separation, nonlinear separation. Simple and multiple linear regression. Assumptions on residuals. Least squares regression: normality and independence of residuals, significance of coefficients, analysis of variance, coefficients of determination and linear correlation, multicollinearity, confidence and prediction limits. Selection of predictive variables. Ridge regression. Generalized linear regression.
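
To make the "confusion matrix and derived metrics" item concrete, the sketch below counts the four cells of a binary confusion matrix and derives accuracy, precision and recall from them. The labels and the function names (`confusion_counts`, `derived_metrics`) are hypothetical; in practice a library such as Scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def derived_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall computed from the confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical true classes and predictions for eight held-out examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = derived_metrics(*counts)
print(counts, metrics)
```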

Association rules

Motivation and evaluation of association rules. Single-dimension association rules. Apriori algorithm. Generation of frequent itemsets, generation of strong rules. General association rules.
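
The two phases named above, generation of frequent itemsets and generation of strong rules, can be sketched in a few lines of plain Python. This is a simplified illustration of the Apriori idea under assumed thresholds (`min_support`, `min_confidence`) and a hypothetical market-basket dataset. It is not an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (body, head, support, confidence) for each strong rule."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for body in combinations(itemset, size):
                body = frozenset(body)
                confidence = supp / frequent[body]
                if confidence >= min_confidence:
                    rules.append((set(body), set(itemset - body), supp, confidence))
    return rules


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(transactions, min_support=0.6)
rules = strong_rules(frequent, min_confidence=0.8)
print(rules)
```

The prune step relies on the fact that every subset of a frequent itemset is itself frequent, which is also why the lookup `frequent[body]` in `strong_rules` always succeeds.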

Clustering

Taxonomy of clustering methods. Affinity measures. Partition methods: K-means, K-medoids. Hierarchical methods: agglomerative methods, divisive methods. Evaluation of clustering models.
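
As a sketch of the K-means partition method mentioned above, the following plain-Python function alternates an assignment step and a centroid update step, using squared Euclidean distance as the affinity measure. The points, the random seed and the function name `kmeans` are assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Basic K-means on tuples of coordinates; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D observations forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

K-means can converge to different partitions from different starting centroids, so in practice the algorithm is usually run several times and the best result kept.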

Applications and use cases

Introduction to the Python programming language and its main libraries for machine learning (Scikit-learn, Keras). Applications in relational marketing using Python: lifetime value analysis, acquisition, retention, cross-selling, market basket analysis. Web mining. Social market analysis. Speech recognition. Text mining. Fraud and anomaly detection. Bioinformatics.


Prerequisites

Machine learning is a discipline at the interface between mathematics and computer science. A good background in probability, linear algebra and calculus is therefore required, as well as programming experience.


Assessment methods

The final mark has three components. Individual assignments, to be delivered by their due dates during the course, count for 30%. The remaining 70% comes from a final written exam, held at each session, in which 30% of the final mark is based on open answers and 40% on closed-form answers. The more theoretical questions assess knowledge of tasks, methods and algorithms. The more applied questions assess the ability to apply methods and algorithms, to interpret their outputs correctly and to derive the implications for the application domain.

Note that only students officially registered for a given session will be allowed to take the examination in that session. Late registrations will be rejected.
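
The weighting described above can be written as a short calculation. This is only an illustrative sketch: the 30-point scale, the representation of each component as a fraction between 0 and 1, and the names `WEIGHTS` and `final_mark` are assumptions, not part of the official assessment rules.

```python
# Component weights as stated in the assessment description.
WEIGHTS = {
    "assignments": 0.30,     # individual assignments during the course
    "open_answers": 0.30,    # open-answer part of the written exam
    "closed_answers": 0.40,  # closed-form part of the written exam
}


def final_mark(scores, scale=30):
    """Combine component scores (fractions in [0, 1]) into a mark on `scale`.

    The default scale of 30 is an assumption made for this example.
    """
    return scale * sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 90% on assignments, 80% open, 75% closed.
example = final_mark(
    {"assignments": 0.90, "open_answers": 0.80, "closed_answers": 0.75}
)
print(round(example, 1))
```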


Bibliography
Required: C. Vercellis, Business Intelligence: Data Mining and Optimization for Decision Making, Wiley, 2009
Optional: T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2011
Optional: E. Alpaydin, Introduction to Machine Learning, MIT Press, 2014
Optional: A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly, 2017

Software used
No software required

Teaching formats
Teaching format            In-class hours (hh:mm)   Self-study hours (hh:mm)
Lecture                    30:00                    45:00
Exercise session           0:00                     0:00
Computer laboratory        20:00                    30:00
Experimental laboratory    0:00                     0:00
Project laboratory         0:00                     0:00
Total                      50:00                    75:00

Information in English in support of internationalization
Course taught in English
Teaching material/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English