Summary Sheet
Academic Year 2016/2017
Appointment type Doctorate
Lecturer Cappiello Cinzia
CFU (credits) 5.00 Course type Single-discipline

Doctoral programme From (inclusive) To (exclusive) Course

Detailed programme and expected learning outcomes

The combination of data and technology has a strong impact on the way we live. The world is getting smarter thanks to the quantity of data that is collected and analysed. However, this quantity of data creates real value only if it is combined with data quality: good decisions and actions result from correct, reliable and complete data. Data errors, inconsistencies or delays often negatively affect the output of a process or of a software application.

The course introduces the basic concepts, models and techniques of data quality. It aims to provide the tools to assess and improve the quality of data used in different applications and contexts in order to avoid errors and inefficiencies. This issue is perceived as important in different fields and for different data sources (e.g., structured databases, logs, social media content, sensor values). For example, in the big data scenario, where the main challenge is to transform the relevant data into good decisions, assessing data quality (DQ) can be a good starting point for identifying non-significant information and for trying to obtain the maximum value from the available information. In fact, one of the main goals of Data Quality research is to assess and, where possible, increase the reliability and value of the data in use. In recent years, several comprehensive methodologies for Data Quality management have been proposed. They include techniques and procedures to analyze data quality problems, define Data Quality dimensions, and measure and improve data quality levels.

This course aims to:

-        introduce the basic elements of Data Quality management;

-        provide an overview of the current techniques used to assess the most commonly used data quality dimensions (i.e., accuracy, precision, completeness, timeliness and consistency) in different data sources. The course shows how the formulas and methods used for assessment depend on the type of data (e.g., numerical vs. text values, structured vs. unstructured data) and the type of data source (e.g., traditional databases vs. sensors);

-        present the techniques to detect errors and data quality anomalies in business processes;

-        illustrate the techniques to improve data quality levels. The course presents both value-based improvement (e.g., data cleaning) and process-based improvement techniques;

-        discuss the main data quality issues in data fusion: duplicate detection and conflict resolution;

-        describe the main open data quality issues in new fields such as IoT and big data.
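
As an illustration of how assessment formulas can differ across dimensions, the following is a minimal sketch, not taken from the course material. It assumes one common definition of completeness (the fraction of non-missing cells in a set of records) and one common definition of timeliness (max(0, 1 - currency / volatility), where currency is the age of a value and volatility is how long it is expected to stay valid). The function names, the record layout and the choice of formulas are assumptions made for this example.

```python
from datetime import datetime, timedelta


def completeness(records, fields):
    """Fraction of non-missing values over all expected (record, field) cells.

    A value counts as missing when it is absent, None, or an empty string
    (an assumption of this sketch).
    """
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing was expected, so nothing is missing
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def timeliness(last_update, now, volatility):
    """Timeliness as max(0, 1 - currency / volatility).

    currency is the age of the value (now - last_update); volatility is the
    assumed period during which the value remains valid.
    """
    currency = now - last_update
    return max(0.0, 1.0 - currency / volatility)


# Hypothetical sample data, for illustration only.
customers = [
    {"name": "Ann", "city": "Milan"},
    {"name": "Bob", "city": None},
]
completeness_score = completeness(customers, ["name", "city"])

sensor_score = timeliness(
    last_update=datetime(2017, 1, 1),
    now=datetime(2017, 1, 2),
    volatility=timedelta(days=4),
)
```

The same completeness function would not suit a sensor stream, where missing readings are usually detected from gaps in timestamps instead of empty cells; this is the kind of dependence on the data source that the course objectives refer to.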

Notes on the assessment method

The exam consists of a discussion of two or three research articles related to a specific topic.



Teaching activity period
Start date
End date

Textual calendar of the teaching activity

The course is organized in five lectures:

1st lecture - Data quality definition, data quality dimensions, data quality assessment

2nd lecture - Data quality interpretation and data quality improvement

3rd lecture - Data quality in business processes

4th lecture - Data stream management

5th lecture - Data quality issues in Web applications and the new challenges in the big data scenario


Lecture period: January 23rd - February 6th. The exam will be held on March 6th. 




Software used
No software required

Mix of teaching formats
Teaching format type Teaching hours
computer laboratory
experimental laboratory
project laboratory

Information in English to support internationalization
Course taught in English

Lecturer's notes