Summary Sheet
Academic Year: 2017/2018
School: School of Industrial and Information Engineering
Course: 096049 - IDENTIFICAZIONE DEI MODELLI E DATA MINING [C.I.] (Model Identification and Data Mining, integrated course)
Lecturers: Garatti Simone, Vercellis Carlo
Credits (CFU): 12.00    Course type: Integrated Course

Degree Programme | Pre-approved Study Plan Code | From (included) | To (excluded) | Course
Ing Ind - Inf (Mag.)(ord. 270) - MI (471) BIOMEDICAL ENGINEERING - INGEGNERIA BIOMEDICA | * | A | ZZZZ | 051152 - DATA MINING; 085811 - IDENTIFICAZIONE DEI MODELLI E DATA MINING [C.I.]; 088779 - IDENTIFICAZIONE DEI MODELLI E ANALISI DEI DATI 2; 096049 - IDENTIFICAZIONE DEI MODELLI E DATA MINING [C.I.]

Detailed programme and expected learning outcomes

FIRST PART: MODEL IDENTIFICATION

 

OBJECTIVES AND CONTENTS

The goal of the course is to provide the background for advanced modelling and data analysis, together with Kalman filter techniques for estimating parameters and building virtual sensors. The course has both a theoretical and a practical flavour and focuses on the following topics: stationary stochastic processes generated as the output of dynamic systems; ARMA and ARMAX models; prediction; non-parametric models based on the spectral characteristics of a process; estimation methods based on minimum prediction error; model complexity analysis and parameter identification; virtual sensors (Kalman filter; extended Kalman filter for gray-box parameter identification).

 

DESCRIPTION OF THE CONTENTS:

Introduction to the concept of model identification: Data-based modelling; Black-Box and Gray-Box models

 

The mathematical framework: stochastic processes: Basic features (mean value; covariance function; spectrum); Practical estimation from measured data

 

Classes of Black-box linear models: AR/MA/ARMA; ARMAX; Analysis of stochastic processes with ARMA/ARMAX structure

 

The concept of optimal prediction: Canonical form; Optimality; Optimal prediction for ARMA/ARMAX processes

 

Identification from data of ARX/ARMAX models: LS identification; ML identification; Optimality; Design of Experiment; Optimal choice of model classes
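As an illustration of the LS identification item above, the following is a minimal sketch in Python using only the standard library. It is not taken from the course material. It assumes a first-order ARX model y(t) = a*y(t-1) + b*u(t-1) + e(t) driven by a white-noise input; the function names `simulate_arx` and `ls_arx11` and all numerical values are hypothetical choices made for this example.

```python
import random


def simulate_arx(a, b, n, noise_std, seed=0):
    """Simulate y(t) = a*y(t-1) + b*u(t-1) + e(t) with a white-noise input u."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0] * n
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * u[t - 1] + rng.gauss(0.0, noise_std)
    return u, y


def ls_arx11(u, y):
    """Least-squares estimate of (a, b) obtained by solving the 2x2 normal equations."""
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        phi_y, phi_u = y[t - 1], u[t - 1]  # regressor vector at time t
        s_yy += phi_y * phi_y
        s_yu += phi_y * phi_u
        s_uu += phi_u * phi_u
        r_y += phi_y * y[t]
        r_u += phi_u * y[t]
    det = s_yy * s_uu - s_yu * s_yu
    a_hat = (s_uu * r_y - s_yu * r_u) / det
    b_hat = (s_yy * r_u - s_yu * r_y) / det
    return a_hat, b_hat


# Illustrative run: true parameters a = 0.7, b = 0.5 (assumed values).
u, y = simulate_arx(a=0.7, b=0.5, n=2000, noise_std=0.1)
a_hat, b_hat = ls_arx11(u, y)
print(a_hat, b_hat)
```

With a few thousand samples and an exciting input, the estimates should land close to the true parameters; with a poorly exciting input the normal equations become ill-conditioned, which is one reason the design of the experiment matters.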

 

Kalman filtering: The concept of SW-sensing; Linear Kalman filtering; Extended Kalman filter; Kalman filter for Gray-box identification
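As an illustration of the linear Kalman filtering item above, the following is a minimal scalar sketch in Python using only the standard library. It is not taken from the course material. It assumes the model x(t+1) = a*x(t) + w(t), y(t) = c*x(t) + v(t), with noise variances q and r; the function name `kalman_scalar`, the default values and the constant-state test signal are hypothetical choices made for this example.

```python
import random


def kalman_scalar(measurements, a=1.0, c=1.0, q=0.0, r=1.0, x0=0.0, p0=1e3):
    """Scalar Kalman filter for x(t+1) = a*x(t) + w(t), y(t) = c*x(t) + v(t).

    q and r are the variances of w and v; x0 and p0 are the prior mean and
    variance of the state. Returns the filtered estimates and their variances.
    """
    x, p = x0, p0
    estimates, variances = [], []
    for y in measurements:
        # Measurement update: correct the prediction with the new sample.
        k = p * c / (c * c * p + r)
        x = x + k * (y - c * x)
        p = (1.0 - k * c) * p
        estimates.append(x)
        variances.append(p)
        # Time update: propagate the estimate one step ahead.
        x = a * x
        p = a * a * p + q
    return estimates, variances


# Illustrative run: a constant state equal to 5 observed through unit-variance noise.
rng = random.Random(1)
ys = [5.0 + rng.gauss(0.0, 1.0) for _ in range(200)]
est, var = kalman_scalar(ys)
print(est[-1], var[-1])
```

In this special case (a = 1, q = 0) the filter reduces to a recursive average of the measurements, so the error variance shrinks at every step.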

 

Data preprocessing: Trend removal; Removal of periodic components; Missing data

 

Practical examples - System Identification using Matlab

 

SECOND PART: DATA MINING

 

OBJECTIVES AND CONTENTS

The goal of the course is to provide the techniques for analysing massive amounts of data, discovering hidden relationships that can be useful to achieve deeper insights into an investigated domain and to predict future developments. The course has both a theoretical and a practical flavour, and is focused on the following topics: the data mining process, data preparation and exploratory analysis, regression, classification, association rules, clustering.

 

DESCRIPTION OF THE CONTENTS

Data mining

Definition of data mining. Models and methods for data mining. Data mining, classical statistics and OLAP.

Applications of data mining. Representation of input data. Data mining process. Analysis methodologies.

Data preparation and exploratory analysis

Data validation. Incomplete data. Data transformation. Standardization. Feature extraction. Data reduction. Sampling. Feature selection. Principal component analysis. Data discretization. Graphical analysis of categorical and numerical attributes. Measures of central tendency, dispersion, relative location for numerical attributes. Identification of outliers. Measures of heterogeneity for categorical attributes. Analysis of the empirical density. Summary statistics. Bivariate analysis. Measures of correlation. Contingency tables. Multivariate analysis.

Regression

Structure of regression models. Simple linear regression. Multiple linear regression. Assumptions on the residuals. Treatment of categorical predictive attributes. Ridge regression. Generalized linear regression. Normality and independence of the residuals. Significance of the coefficients. Analysis of variance. Coefficient of determination. Coefficient of linear correlation. Multicollinearity of the independent variables. Confidence and prediction limits. Selection of predictive variables.

Classification

Classification problems. Evaluation of classification models: Holdout method, repeated random sampling, cross-validation, confusion matrices, ROC curve charts, cumulative gain and lift charts. Classification trees. Bayesian methods. Logistic regression. Neural networks. Support vector machines: structural risk minimization, maximal margin hyperplane for linear separation, nonlinear separation.

Association rules

Motivation and structure of association rules. Single-dimension association rules. Apriori algorithm. Generation of frequent itemsets. Generation of strong rules. General association rules.
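As an illustration of frequent itemset generation with the Apriori algorithm, the following is a minimal sketch in Python using only the standard library. It is not taken from the course material. The function name `apriori`, the use of an absolute support count as threshold and the toy transactions are hypothetical choices made for this example; the generation of strong rules from the frequent itemsets is not shown.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return a dict mapping each frequent itemset (frozenset) to its support count."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy data set (assumed for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset, count in apriori(baskets, min_count=3).items():
    print(sorted(itemset), count)
```

The prune step relies on the property that every subset of a frequent itemset is itself frequent, which is what keeps the number of candidates manageable.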

Clustering

Taxonomy of clustering methods. Affinity measures. Partition methods. K-means algorithm. K-medoids algorithm. Hierarchical methods. Agglomerative hierarchical methods. Divisive hierarchical methods. Evaluation of clustering models.
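As an illustration of the K-means algorithm, the following is a minimal sketch in Python using only the standard library. It is not taken from the course material. It assumes squared Euclidean distance as the affinity measure and initial centroids supplied by the caller; the function name `kmeans` and the toy points are hypothetical choices made for this example.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain K-means: alternate assignment and centroid update until convergence."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            d2 = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
            clusters[d2.index(min(d2))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Toy data set with two well-separated groups (assumed for illustration only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (0, 1)])
print(centroids)
print(clusters)
```

The result of K-means depends on the initial centroids, so in practice the algorithm is usually run from several starting points and the best partition is kept.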


Notes on the assessment method

-        written exam (theory questions and numerical problems)

-        duration: 3 hours (covering both Model Identification and Data Mining)

 


Bibliography
Optional reference: C. Vercellis, Business intelligence: data mining and optimization for decision making, Publisher: Wiley
Optional reference: S. Bittanti, Teoria della predizione e del filtraggio, Publisher: Pitagora
Optional reference: S. Bittanti, Identificazione dei modelli e sistemi adattativi, Publisher: Pitagora
Optional reference: T. Söderström, P. Stoica, System Identification, Publisher: Prentice Hall

Software used
No software required

Mix of teaching forms
Type of teaching form | Teaching hours
lecture | 72.0
exercise session | 48.0
computer laboratory | 0.0
experimental laboratory | 0.0
project | 0.0
project laboratory | 0.0

Information in English to support internationalization
Course taught in Italian
Availability of teaching material/slides in English
Availability of textbooks/bibliography in English
09/12/2023