Ing Ind - Inf (Mag.)(ord. 270) - MI (471) BIOMEDICAL ENGINEERING - INGEGNERIA BIOMEDICA
054063 - MODEL IDENTIFICATION AND DATA ANALYSIS 2
054102 - MACHINE LEARNING
054062 - MODEL IDENTIFICATION AND MACHINE LEARNING [I.C.]
This course gives an overview of techniques and algorithms in machine learning and pattern recognition. It provides students with the basic ideas and intuition behind modern machine learning methods, as well as more detailed coverage of most techniques.
The course fits into the overall program curriculum and pursues some of its general learning goals. In particular, the course contributes to the development of the following capabilities:
Understand context, functions, processes in a business and industrial environment and the impact of those factors on business performance
Identify trends, technologies and key methodologies in a specific domain (specialization streams)
Design solutions applying a scientific and engineering approach (Analysis, Learning, Reasoning, and Modeling capability deriving from a solid and rigorous multidisciplinary background) to face problems and opportunities in a business and industrial environment
Expected learning outcomes
By the end of the module, students should:
Have a good understanding of the fundamental issues and challenges of machine learning: data, model selection, model complexity (DD1).
Develop an appreciation for what is involved in learning from data.
Understand a wide variety of machine learning algorithms (DD1).
Understand how to apply a variety of learning algorithms to data (DD2).
Understand how to perform evaluation of learning algorithms and model selection (DD3).
Have an understanding of the strengths and weaknesses of many popular machine learning approaches (DD1).
Appreciate the underlying mathematical relationships within and across machine learning algorithms and the paradigms of supervised and unsupervised learning (DD1).
Apply the algorithms to a real-world problem, optimize the models learned and report on the expected accuracy that can be achieved by applying the models (DD2,DD4).
Be able to design and implement various machine learning algorithms in a range of real-world applications (DD2).
Be able to write code in the Python programming language to use machine learning algorithms (DD2).
Introduction to Machine Learning
Motivations of machine learning. Machine learning, artificial intelligence and big data. Applications of machine learning. Representation of input data. Machine learning process.
Exploratory data analysis
Data validation and cleansing, outlier and missing values detection. Data transformation. Data reduction. Sampling. Feature selection. Features extraction by filtering. Principal component analysis. Data discretization. Univariate analysis: graphical analysis, measures of central tendency, dispersion, relative location, heterogeneity, analysis of the empirical density. Bivariate analysis: graphical analysis, measures of correlation, contingency tables. Multivariate analysis: graphical analysis, measures of correlation.
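As a rough illustration of the univariate and bivariate measures listed above, the following minimal sketch computes a few summary statistics and the Pearson correlation coefficient using only the Python standard library. The function names `summarize` and `pearson_correlation` and the toy data are hypothetical, chosen for this example and not taken from the course material.

```python
import statistics


def summarize(values):
    """Return basic measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "range": max(values) - min(values),
    }


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: two perfectly linearly related attributes.
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 65, 70, 75]

summary = summarize(heights)
r = pearson_correlation(heights, weights)
print(summary)
print(r)
```

In practice, libraries such as pandas or NumPy would typically be used for these computations; the standard library version is shown only to keep the arithmetic visible.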
Supervised learning: classification and regression
Taxonomy of supervised methods. Evaluation of classification models: holdout, cross-validation, confusion matrix and derived metrics, ROC curve, cumulative gain and lift. Treatment of categorical attributes. Nearest neighbor. Classification and regression trees: splitting, stopping and pruning. Bayesian methods: naive methods, Bayesian networks. Logistic regression. Neural networks: Rosenblatt perceptron, multi-level feed-forward networks. Support vector machines: structural risk minimization, maximal margin hyperplane for linear separation, nonlinear separation. Simple and multiple linear regression. Assumptions on residuals. Least square regression: normality and independence of residuals, significance of coefficients, analysis of variance, coefficients of determination and linear correlation, multicollinearity, confidence and prediction limits. Selection of predictive variables. Ridge regression. Generalized linear regression.
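To make the "confusion matrix and derived metrics" item more concrete, here is a minimal sketch for binary classification that counts the four confusion-matrix cells and derives accuracy, precision, recall, and F1 from them. The helper names `confusion_counts` and `derived_metrics` and the label vectors are hypothetical examples, not part of the course material.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def derived_metrics(tp, fp, tn, fn):
    """Derive common metrics; undefined ratios are reported as 0.0 by convention here."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = derived_metrics(*counts)
print(counts)
print(metrics)
```

These counts are computed on a single set of predictions; under holdout or cross-validation the same computation would be applied to the held-out predictions of each split.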
Motivation and evaluation of association rules. Single-dimension association rules. Apriori algorithm. Generation of frequent itemsets, generation of strong rules. General association rules.
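The frequent-itemset generation step of the Apriori algorithm can be sketched as follows. This is a simplified, unoptimized version under stated assumptions: support is measured as a fraction of transactions, candidates are built by joining frequent itemsets from the previous level, and a candidate is pruned if any of its subsets one item smaller is not frequent. The function name `apriori_frequent_itemsets` and the basket data are hypothetical, and the generation of strong rules is not covered.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

frequent = apriori_frequent_itemsets(baskets, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```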
Taxonomy of clustering methods. Affinity measures. Partition methods: K-means, K-medoids. Hierarchical methods: agglomerative methods, divisive methods. Evaluation of clustering models.
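As an illustration of the partition methods mentioned above, the following is a minimal sketch of K-means with squared Euclidean distance, written with the standard library only. It assumes initial centroids are drawn at random from the data points and that an empty cluster simply keeps its previous centroid; both are choices made for this example. The names `k_means` and `squared_distance` and the sample points are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Alternate assignment and centroid update until assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Hypothetical data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

K-means is sensitive to initialization in general, so real applications usually run it several times with different seeds and keep the solution with the lowest within-cluster sum of squares.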
Applications and use cases
Introduction to the Python programming language and its main libraries for machine learning (Scikit-learn, Keras). Applications in relational marketing using Python: lifetime value analysis, acquisition, retention, cross-selling, market basket analysis. Web mining. Social market analysis. Speech recognition. Text mining. Fraud and anomaly detection. Bioinformatics.
Machine Learning is a discipline at the interface between mathematics and computer science. Hence, a good background in probability, linear algebra and calculus is required, as well as programming experience.
Assessment methods
The final mark is determined by two components: 30% comes from individual assignments to be delivered by set due dates during the course; 70% comes from the final written test held at each exam session, based on open-ended and closed-form questions. Theoretical questions aim at assessing knowledge acquisition with respect to tasks, methods and algorithms. Applied questions aim at assessing the ability to apply methods and algorithms, to properly understand the outputs and derive the implications for the application domain, and to show programming skills in Python.
Note that only students officially registered for a given session will be allowed to take the examination in that session. Late registrations will be rejected.
C. Vercellis, Business Intelligence: Data Mining and Optimization for Decision Making, Publisher: Wiley, Edition year: 2009
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Publisher: Springer, Edition year: 2011
E. Alpaydin, Introduction to Machine Learning, Publisher: MIT Press, Edition year: 2014
A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Publisher: O'Reilly, Edition year: 2017