FIRST PART: MODEL IDENTIFICATION
OBJECTIVES AND CONTENTS
The goal of the course is to provide the background for advanced modelling and data analysis, together with Kalman Filter techniques for estimating parameters and building virtual sensors. The course has both a theoretical and a practical flavour and focuses on the following topics: stationary stochastic processes generated as the output of dynamic systems; ARMA and ARMAX models; prediction; non-parametric models based on the spectral characteristics of a process; estimation methods based on minimum prediction error; model complexity analysis and parameter identification; virtual sensors (Kalman Filter; Extended Kalman Filter for gray-box parameter identification).
DESCRIPTION OF THE CONTENTS
Introduction to the concept of model identification: Data-based modelling; Black-box and gray-box models
The mathematical framework of stochastic processes: Basic features (mean value; covariance function; spectrum); Practical estimation from measured data
Classes of black-box linear models: AR/MA/ARMA; ARMAX; Analysis of stochastic processes with ARMA/ARMAX structure
The concept of optimal prediction: Canonical form; Optimality; Optimal prediction for ARMA/ARMAX processes
Identification of ARX/ARMAX models from data: Least-squares (LS) identification; Maximum-likelihood (ML) identification; Optimality; Design of experiments; Optimal choice of model classes
Kalman filtering: The concept of software sensing (SW-sensing); Linear Kalman filter; Extended Kalman filter; Kalman filter for gray-box identification
Data preprocessing: Trend removal; Removal of periodic components; Missing data
Practical examples: System identification using MATLAB
SECOND PART: DATA MINING
OBJECTIVES AND CONTENTS
The goal of the course is to provide the techniques for analysing massive amounts of data, discovering hidden relationships that can be useful to achieve deeper insights into an investigated domain and to predict future developments. The course has both a theoretical and a practical flavour, and is focused on the following topics: the data mining process, data preparation and exploratory analysis, regression, classification, association rules, clustering.
DESCRIPTION OF THE CONTENTS
Definition of data mining. Models and methods for data mining. Data mining, classical statistics and OLAP.
Applications of data mining. Representation of input data. Data mining process. Analysis methodologies.
Data preparation and exploratory analysis
Data validation. Incomplete data. Data transformation. Standardization. Feature extraction. Data reduction. Sampling. Feature selection. Principal component analysis. Data discretization. Graphical analysis of categorical and numerical attributes. Measures of central tendency, dispersion, relative location for numerical attributes. Identification of outliers. Measures of heterogeneity for categorical attributes. Analysis of the empirical density. Summary statistics. Bivariate analysis. Measures of correlation. Contingency tables. Multivariate analysis.
Regression
Structure of regression models. Simple linear regression. Multiple linear regression. Assumptions on the residuals. Treatment of categorical predictive attributes. Ridge regression. Generalized linear regression. Normality and independence of the residuals. Significance of the coefficients. Analysis of variance. Coefficient of determination. Coefficient of linear correlation. Multicollinearity of the independent variables. Confidence and prediction limits. Selection of predictive variables.
Classification
Classification problems. Evaluation of classification models: holdout method, repeated random sampling, cross-validation, confusion matrices, ROC curve charts, cumulative gain and lift charts. Classification trees. Bayesian methods. Logistic regression. Neural networks. Support vector machines: structural risk minimization, maximal margin hyperplane for linear separation, nonlinear separation.
Association rules
Motivation and structure of association rules. Single-dimension association rules. Apriori algorithm. Generation of frequent itemsets. Generation of strong rules. General association rules.
Clustering
Taxonomy of clustering methods. Affinity measures. Partition methods. K-means algorithm. K-medoids algorithm. Hierarchical methods. Agglomerative hierarchical methods. Divisive hierarchical methods. Evaluation of clustering models.