Summary
Academic Year: 2019/2020
School: School of Industrial and Information Engineering
Instructor: Tanca Letizia
Credits (CFU): 5.00  Course type: Single-discipline
Innovative teaching: the course includes 1.0 CFU delivered through Innovative Teaching, as follows:
  • Soft Skills


Course objectives

We are in the era of large, decentralized, distributed environments where the number of devices, the amount of data, and their heterogeneity are getting out of control. Gartner reports that worldwide information volume is growing at a rate greater than 60% annually.

Organizations capture billions of bytes of data about their activities, users, operators, customers and suppliers, but their ability to collect, manage and interpret this information could be an obstacle to its use. The web is widening the range of data providers and consumers. Sensors, mobile devices and, in general, the IoT, produce further data that needs to be integrated and harmonized with the rest in order to produce value.

Decision-making is based on information, not just on data. More accurate information leads to better decisions and provides competitive advantages; hence processing, manipulating, and organizing data in a way that adds new knowledge has become a necessity.

The goal of the course is to enable students to master the engineering methods and processes needed to manage modern information systems, and especially data-intensive systems, to operate on large data collections, and to understand the utility and methods of business analysis, obtaining useful knowledge to improve the decision-making process.

As a consequence, we expose the students to some of the most advanced methodologies adopted to understand the conceptual and technological problems encountered in the design and implementation of "data products": tangible results based on analyses of complex systems, whose raw material is collections of data that must be integrated, organized and analyzed mainly through automatic tools.

In the belief that understanding is boosted by communication, this course is “Communication-Intensive”, in the sense that it aims at enhancing the students’ communication skills as applied to the technical content of the course, with a special emphasis on clarity and common ground (i.e. being able to take the interlocutors’ body of knowledge into proper account).

Expected learning outcomes (by Dublin Descriptor)

1. Knowledge and understanding

 The students will learn how to:

  • Identify the phases of Big Data management, from the choice of the data sources to the production of a “data product”
  • Analyze the data sources and design a data integration system
  • Design a data warehouse
  • Identify the data quality problems encountered when managing heterogeneous collections of data

2. Applying knowledge and understanding

 Given specific project cases, students will be able to:

  • Detail the corresponding requirements
  • Analyze and comment on specific conceptual and architectural choices
  • Apply the theory to select the most appropriate ones
  • Develop data integration and data warehouse solutions that fulfil the high-level and design specifications

3. Making judgements

 Given a relatively complex problem, students will be able to:

  • Analyze and understand the goals, assumptions and requirements associated with that problem and model them
  • Define the type of architecture of the corresponding system
  • Identify the appropriate conceptual and logical design methodology
  • Estimate the system and the resources needed for its development 

4. Communication

 The students will learn to clearly explain a technological or methodological issue at the correct level of abstraction, considering the common ground with the interlocutor.

5. Lifelong learning skills

  The students will learn how to develop a realistic data product

Topics covered

Information System Architectures and Heterogeneous Data Integration: structured and non-structured data (12 hrs lectures, 9 hrs exercises):

  • Introduction to the architectures of modern information systems
  • Basics of Data Integration: model heterogeneity, semantic heterogeneity at the schema level, heterogeneity at the data level
  • Dynamic data integration: the use of wrappers, mediators, meta-models, ontologies, etc.
  • Lightweight data integration
  • The future of data integration

Data Quality (2 hrs lectures)

Data Warehousing and Analysis (10 hrs lectures, 9 hrs exercises):

  • Data Warehouse Architecture and querying
  • Data Warehouse Conceptual Design
  • Data Warehouse Logical Design
  • Introduction to exploratory data analysis, data mining and its applications.

Communication Skills (2 hrs lectures, 6 hrs exercises)

  • Basic elements of Communication Sciences (how to be clear)
  • Common-ground Theory
  • Writing skills

CLASS HOURS: 26 lecture hours; 24 hours of exercise sessions

OPTIONAL (2-6 hrs): A series of seminars on advances in Data Management (e.g. Pervasive Systems, Data Personalization and Ranking, Ethical Issues, or others to be announced), NOT mandatory but highly recommended.


- The bibliography includes useful references but is by no means exhaustive of the topics covered in the course. More readings will be mentioned during the lectures. The students are advised to attend the lectures, ask the professor for explanations and read the articles. For each of the books, only the part covered in the lectures is mandatory.

- Students who want to improve their mark can choose an optional project. The objective of the projects is to help students apply the approaches and principles taught in class. Students can ask to be assigned a project at any time during the academic year, and project artifacts can be delivered at any time during the academic year. The evaluation of projects will be based on the produced artifacts and on a report.


Prerequisites

The students are required to know the principles and methods of database design and technology, and the basic notions of the Entity-Relationship conceptual model and of the Relational Data Model along with its languages. The courses that provide these notions are Data Bases I and Data Bases II (the latter may be attended in the same semester).

Assessment

The assessment will be based on a written exam and a homework communication assignment. The exam is considered as passed if both parts have been passed. If the student does not pass one of these two parts in the same exam call, the mark of the passed part will be saved; in a future call, the student will need to retake only the failed part.

The written exam consists of (i) an open-book exercise/design part, in which students design an integration system or a data warehouse; this part assigns up to 22 points and is considered sufficient when the score is 13 or higher; and (ii) a closed-book theoretical part, in which students are asked to explain clearly, in their own words, two of the subjects illustrated during the lectures; this part assigns up to 10 points and is considered sufficient when the score is 5 or higher. The scores of the two written parts are summed to compute the total score. Students can take the written part at any exam session during the year.

The homework communication assignment requires the students to summarize a technical text with a non-technical reader in mind. The assignment will be evaluated by both the communication instructor and Professor Tanca. The evaluation will be “pass/non-pass”; in case of excellence, one additional point will be added to the student's overall exam score.

Students can also increase their mark by up to 2 points by producing an optional project. If the student has requested a project, it can be presented to the professor and to the teaching assistant at any time after the exam has been passed; the total mark will then be entered at the earliest possible registration date. 30 cum laude will be assigned when the total score is strictly higher than 31.
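The scoring rules above can be summarized in a small sketch. It is only an illustration of the arithmetic described in this section, not an official grading tool: the names `ExamResult` and `final_mark` are hypothetical, and capping the registered mark at 30 is an assumption (the text only states when 30 cum laude is assigned).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExamResult:
    design_score: float                    # open-book design part, 0 to 22
    theory_score: float                    # closed-book theoretical part, 0 to 10
    communication_passed: bool             # homework assignment, pass/non-pass
    communication_excellent: bool = False  # excellence adds one point
    project_bonus: float = 0.0             # optional project, 0 to 2


def final_mark(r: ExamResult) -> Optional[Tuple[float, bool]]:
    """Return None if the exam is not passed, else (mark, cum_laude)."""
    if not (0 <= r.design_score <= 22 and 0 <= r.theory_score <= 10):
        raise ValueError("written scores out of range")
    if not 0 <= r.project_bonus <= 2:
        raise ValueError("project bonus out of range")

    # Both written parts must be sufficient and the assignment must be passed.
    passed = (
        r.design_score >= 13
        and r.theory_score >= 5
        and r.communication_passed
    )
    if not passed:
        return None

    total = r.design_score + r.theory_score + r.project_bonus
    if r.communication_excellent:
        total += 1

    cum_laude = total > 31  # strictly higher than 31, as stated in the text
    # ASSUMPTION: the registered mark is capped at 30 (not stated in the text).
    return min(total, 30), cum_laude
```

For example, under these assumptions `final_mark(ExamResult(20, 8, True))` yields a mark of 28 without laude, while a design score of 12 returns `None` regardless of the other parts, because that part is below its threshold of 13.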

The following overview lists the elements that will be considered in the various assessment activities, together with the Dublin descriptors they address.

Written test (Dublin descriptors: 1, 2, 3, 4, 5)

 Exercises focusing on design aspects:

  • Design of a data integration system based on the requirements provided
  • Design of a data warehouse based on the requirements provided

 Two theoretical questions on the course topics

 Communication assignment: summary of a research paper in words understandable by a non-technical reader

Assessment of (optional) project (Dublin descriptors: 1, 2, 3, 4, 5)

 Assessment of the design and experimental work developed by students either individually or in groups

Bibliography

Mandatory: AnHai Doan, Alon Halevy, and Zachary Ives, Principles of Data Integration, Publisher: Morgan Kaufmann, 1st edition, Year: 2012

Mandatory: M. Golfarelli, S. Rizzi, Data Warehouse Design: Modern Principles and Methodologies, Publisher: McGraw Hill, Year: 2009, ISBN: 0071610391

Optional: Xin Luna Dong, Divesh Srivastava, Big Data Integration. Synthesis Lectures on Data Management, Publisher: Morgan & Claypool Publishers, Year: 2015, ISBN: 978-1-62705-223-8

Optional: M. Lenzerini, Data Integration: A Theoretical Perspective, Proceedings of ACM PODS, pp. 233-246, Publisher: ACM, Year: 2002, ISBN: 1-58113-507-6
 Note: this is a scientific paper published in a volume of conference proceedings.

Optional: Clement T. Yu, Weiyi Meng, Principles of Database Query Processing for Advanced Applications, Publisher: Morgan Kaufmann, Year: 1998, ISBN: 1558604340
 Note: in The Morgan Kaufmann Series in Data Management Systems.

Optional: Roberto De Virgilio, Fausto Giunchiglia, Letizia Tanca (Eds.), Semantic Web Information Management - A Model-Based Perspective, Publisher: Springer Verlag, Year: 2009, ISBN: 978-3-642-04328-4

Optional: Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Publisher: Addison-Wesley, Year: 2006, ISBN: 0321321367, http://www-users.cs.umn.edu/~kumar/dmbook/index.php
 Note: the web site contains a lot of interesting material.

Software used
No software required

Teaching formats
Type of teaching format: classroom hours / self-study hours
Computer Laboratory
Experimental Laboratory
Project Laboratory
Total: 50:00 classroom hours / 75:00 self-study hours

Information in English to support internationalization
Course taught in English
Teaching material/slides available in English
Textbooks/bibliography available in English
The exam can be taken in English
Teaching support available in English