Ing Ind - Inf (Mag.)(ord. 270) - CO (482) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA (*, A-ZZZZ)

· 097683 - MACHINE LEARNING

· 094835 - DATA ANALYSIS AND RETRIEVAL

· 099355 - MACHINE LEARNING

Ing Ind - Inf (Mag.)(ord. 270) - MI (486) ENGINEERING PHYSICS - INGEGNERIA FISICA (*, A-ZZZZ)

· 088959 - PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Course objectives

The objective of the Machine Learning course is to give an in-depth presentation of the most widely used techniques for pattern recognition, knowledge discovery, and data analysis/modeling. These techniques are presented from both a theoretical perspective (i.e., statistics and information theory) and a practical one (i.e., coding examples), through descriptions of the algorithms and their implementations in R.

Expected learning outcomes

Expected learning outcomes by Dublin Descriptor

Knowledge and understanding

Students will learn:

· The main paradigms in machine learning and the problems to which each of them can or should be applied

· The fundamental techniques for regression, their strengths and weaknesses, their computational complexity, and the main reasons to favor one technique over another

· The fundamental techniques for classification, their strengths and weaknesses, their computational complexity, and the main reasons to favor one technique over another

· The fundamental clustering algorithms, their strengths and weaknesses, their computational complexity, and the main reasons to favor one algorithm over another

Applying knowledge and understanding

Given a specific data analysis problem, students will be able to:

· Identify which paradigm to apply to model the problem

· Identify which technique to start the analysis with, apply it to model the data, and evaluate its outcome

· Implement fundamental algorithms for regression, classification, and clustering autonomously

Making judgements

Given a complex data analysis problem, students will be able to:

· Identify when the model under analysis is overfitting

· Identify the most relevant features for the problem under analysis to improve the model via feature selection

· Iteratively refine the selected model in order to balance performance, computational complexity and overfitting

· Compare different models for the problem under analysis and select among them
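The overfitting check listed above can be illustrated by comparing training and validation error as model flexibility changes. The following is only a minimal sketch under stated assumptions: it uses Python rather than the R used in the course, a hypothetical synthetic dataset (a sine curve plus Gaussian noise), and k-nearest-neighbors regression as the model. The function names `knn_predict`, `mse`, and `make_data` are illustrative and do not come from the course material.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN regression evaluated on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def make_data(n, rng):
    """Hypothetical data (an assumption): a sine curve plus Gaussian noise."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_data(60, rng)
val_x, val_y = make_data(60, rng)

# Smaller k means a more flexible model. results[k] = (train MSE, validation MSE).
results = {}
for k in (1, 5, 15):
    results[k] = (mse(train_x, train_y, train_x, train_y, k),
                  mse(train_x, train_y, val_x, val_y, k))
    print(f"k={k:2d}  train MSE={results[k][0]:.3f}  "
          f"validation MSE={results[k][1]:.3f}")
```

With k = 1 every training point is its own nearest neighbor, so the training error is zero while the validation error is not. A large gap of this kind between the two errors is the usual symptom of overfitting; a larger k typically narrows it.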

Communication

Students will learn to:

· Discuss in written form the pros and cons of different machine learning techniques for a specific problem

Lifelong learning skills

Students will learn to:

· Face a real-life data analysis problem with a sound and complete methodological approach

· Understand complex machine learning techniques beyond the fundamental ones presented during lectures

· Develop new machine learning pipelines adapting to the specific problem under analysis

Topics covered

The course consists of a set of lectures on specific machine learning techniques (e.g., linear regression, linear discriminant analysis, support vector machines, clustering, etc.), preceded by an introduction to the Statistical Learning framework, which acts as a common reference framework for the entire course. Supervised and unsupervised learning paradigms are described and discussed in the framework of classification and clustering problems.

The course outline is:

Machine Learning and Pattern Classification: the general concepts of Machine Learning and Pattern Recognition are introduced within the framework of statistical decision theory.

Linear Regression Techniques: linear methods for regression will be presented and discussed, introducing different techniques (e.g., Linear Regression, Ridge Regression, K-Nearest Neighbors Regression, Non Linear Regression, etc.) and the most common methodologies for model validation and selection (e.g., AIC, BIC, cross-validation, stepwise feature selection, Lasso, etc.).

Linear Classification Techniques: generative and discriminative techniques for classification will be described and discussed (e.g., Logistic Regression, Linear and Quadratic Discriminant Analysis, K-Nearest Neighbors, the Perceptron rule, and Optimal Separating Hyperplanes, a.k.a. Support Vector Machines, etc.).

Unsupervised Learning Techniques: the most common approaches to unsupervised learning are described, focusing mostly on clustering techniques such as hierarchical clustering, k-means, k-medoids, Mixture of Gaussians, DBSCAN, Jarvis-Patrick, etc.
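As an illustration of the clustering techniques named in the outline, the following is a minimal sketch of k-means in its standard form (Lloyd's algorithm). It is written in Python rather than the R used in the course, the six sample points are hypothetical, and the names `squared_distance` and `kmeans` are illustrative; it is not taken from the course material.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Alternate assignment and centroid update until labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On data this well separated, the first three points end up in one cluster and the last three in the other. In general the result of k-means depends on the initialization, which is why it is commonly run from several random starts.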

A detailed schedule of the course will be provided on the course website (http://chrome.ws.dei.polimi.it/index.php/Machine_Learning) with reference to the chapters of the books that will be used as reference for the course and the additional material provided by the teachers.

Prerequisites

Students are expected to know the principles and methods of statistics and probability as well as the basics of programming.

Assessment

The course evaluation consists of a theoretical part and a practical part:

· During the semester, an optional homework assignment will be proposed, based on the analysis of some simple datasets in R. It is to be done at home and delivered before the start of the first exam session.

· During the exam session, students will have to pass a mandatory written examination covering the whole program, with both theoretical questions and practical exercises.

The homework is not mandatory. It counts for 30% of the grade if and only if it increases the grade obtained in the written exam.

Type of assessment: Written test

Solution of numerical problems (Dublin descriptors: 1, 2)

· Computation of linear models for regression and classification on small datasets

· Application of clustering algorithms to simple datasets

Answer to theoretical questions (Dublin descriptors: 1, 2, 3, 4, 5)

· Derivation of model properties and theoretical bounds on expected prediction accuracy

· Comparison of models in terms of complexity, overfitting issues, and applicability in different cases of data analysis

Type of assessment: Assessment of practical homework

Execution of practical homework exercises (Dublin descriptors: 2, 3, 4, 5)

· Application of regression, classification, and clustering algorithms to small datasets

· Implementation of simple algorithms for machine learning in R

· Application of model selection and validation techniques on simulated and real datasets

Bibliography

Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Publisher: Springer. http://www-bcf.usc.edu/~gareth/ISL/

Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Publisher: Springer-Verlag, Year: 2008. http://stat.stanford.edu/~tibs/ElemStatLearn/index.html

Christopher J. C. Burges, A tutorial on support vector machines for pattern recognition, Year: 1998. http://www.svms.org/tutorials/Burges1998.pdf

Teaching formats

Teaching format           In-class hours (hh:mm)   Independent study hours (hh:mm)
Lecture                   30:00                    45:00
Exercise session          20:00                    30:00
Computer laboratory       0:00                     0:00
Experimental laboratory   0:00                     0:00
Project laboratory        0:00                     0:00
Total                     50:00                    75:00

Information in English to support internationalization

Course taught in: English

Availability of teaching material/slides in English

Availability of textbooks/bibliography in English

Possibility of taking the exam in English

Availability of teaching support in English