Objectives
Is it possible to extract useful knowledge for decision making from the huge amount of data available in the data warehouses of companies and public administrations?
Business Intelligence and big data analytics is a broad category of methods and technologies for gathering, providing access to, and analyzing data in order to help enterprise users make better business decisions. The term implies a comprehensive knowledge of all the factors that affect a business, such as customers, competitors, business partners, the economic environment, and internal operations, thereby enabling optimal decisions to be made.
This course provides students with detailed coverage of, and practical guidance on, the mathematical models and analysis methodologies of business intelligence and big data analytics. It covers topics such as the big data revolution, data warehousing, data mining and its applications, and machine learning. It provides a systematic and rigorous treatment of each concept, combined with extensive use of examples and numerous real-life case studies.
Syllabus
Business Intelligence and Big Data Analytics
Effective and timely decisions. Data, information and knowledge. Development of business intelligence and big data architectures. Decision support systems. Decision-making process. Data warehousing. Data quality. OLAP and multidimensional analysis. Data mining, classical statistics and OLAP. Applications of data mining. Representation of input data. Data mining process.
Data preparation and exploratory analysis
Data validation. Data transformation. Feature extraction. Data reduction. Sampling. Feature selection. Principal component analysis. Data discretization. Univariate analysis: graphical analysis, measures of central tendency, dispersion, relative location, identification of outliers, measures of heterogeneity, analysis of the empirical density. Bivariate analysis: graphical analysis, measures of correlation, contingency tables. Multivariate analysis: graphical analysis, measures of correlation.
Regression
Structure of regression models. Simple linear regression. Multiple linear regression. Assumptions on the residuals. Treatment of categorical attributes. Ridge regression. Generalized linear regression. Validation of regression models: normality and independence of the residuals, significance of the coefficients, analysis of variance, coefficient of determination, coefficient of linear correlation, multicollinearity, confidence and prediction limits. Selection of predictive variables.
Classification
Taxonomy of classification models. Evaluation of classification models: holdout method, repeated random sampling, cross-validation, confusion matrices, ROC curve charts, cumulative gain and lift charts. Classification trees: splitting rules, stopping criteria and pruning rules. Bayesian methods: naive Bayesian classifiers, Bayesian networks. Logistic regression. Neural networks: Rosenblatt perceptron, multi-level feed-forward networks. Support vector machines: structural risk minimization, maximal margin hyperplane for linear separation, nonlinear separation.
Association rules
Motivation and evaluation of association rules. Single-dimension association rules. Apriori algorithm. Generation of frequent itemsets, generation of strong rules. General association rules.
Clustering
Taxonomy of clustering methods. Affinity measures. Partition methods: K-means, K-medoids. Hierarchical methods: agglomerative methods, divisive methods. Evaluation of clustering models.
Applications and business case studies
Applications in relational marketing: lifetime value analysis, acquisition, retention, cross-selling and up-selling, market basket analysis. Web mining. Social market analysis. Text mining. Fraud and anomaly detection.