The course includes 1.0 CFU delivered through Innovative Teaching (Didattica Innovativa) as follows:
Degree programme
Pre-approved study plan code
Ing - Civ (Mag.)(ord. 270) - MI (495) GEOINFORMATICS ENGINEERING - INGEGNERIA GEOINFORMATICA
052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Ing Ind - Inf (Mag.)(ord. 270) - BV (479) MANAGEMENT ENGINEERING - INGEGNERIA GESTIONALE
052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA
052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
We are in the era of large, decentralized, distributed environments where the amount of devices and data, and their heterogeneity, is getting out of control. Gartner reports that worldwide information volume is growing at a rate greater than 60% annually.
Organizations capture billions of bytes of data about their activities, users, operators, customers and suppliers, but their ability to collect, manage and interpret this information could be an obstacle to its use. The web is widening the range of data providers and consumers. Sensors, mobile devices and, in general, the IoT, produce further data that needs to be integrated and harmonized with the rest in order to produce value.
Decision-making is based on information, not just on data. More accurate information leads to better decisions and provides competitive advantages; hence processing, manipulating, and organizing data in a way that adds new knowledge has become a necessity.
The goal of the course is to enable students to master the engineering methods and processes that are necessary to manage modern information systems, and especially data-intensive systems, to operate on large data collections, and to understand the utility and methods of business analysis, obtaining useful knowledge to improve the decision-making process.
As a consequence, we expose the students to some of the most advanced methodologies adopted to understand the conceptual and technological problems encountered in the design and implementation of "data products": tangible results based on analyses for complex systems, concentrating, as raw material, on collections of data that must be integrated, organized and analyzed mainly through automatic tools.
In the belief that understanding is boosted by communication, this course is “Communication-Intensive”, in the sense that it aims at enhancing the students’ communication skills as applied to the technical content of the course, with a special emphasis on clarity and common ground (i.e. being able to take the interlocutors’ body of knowledge into proper account).
Expected learning outcomes
1. Knowledge and understanding
The students will learn how to:
Identify the phases of Big Data management, from the choice of the data sources to the production of a “data product”
Analyze the data sources and design a data integration system
Design a data warehouse
Identify the data quality problems encountered when managing heterogeneous collections of data
2. Applying knowledge and understanding
Given specific project cases, students will be able to:
Detail the corresponding requirements
Analyze and comment on specific conceptual and architectural choices
Apply the theory to decide the most appropriate ones
Develop data integration and data warehouse solutions fulfilling the high-level and design specifications.
3. Making judgements
Given a relatively complex problem, students will be able to:
Analyze and understand the goals, assumptions and requirements associated with that problem and model them
Define the type of architecture of the corresponding system
Identify the appropriate conceptual and logical design methodology
Estimate the system and the resources needed for its development
4. Communication skills
The students will learn to clearly explain a technological or methodological issue at the correct level of abstraction, considering the common ground with the interlocutor.
5. Lifelong learning skills
The students will learn how to develop a realistic data product
Information System Architectures and Heterogeneous Data Integration: structured and non-structured data (12 hrs lectures, 9 hrs exercises):
Introduction to the architectures of modern information systems
Basics of Data Integration: model heterogeneity, semantic heterogeneity at the schema level, heterogeneity at the data level
Dynamic data integration: the use of wrappers, mediators, meta-models, ontologies, etc.
Lightweight data integration
The future of data integration
Data Quality (2 hrs lectures)
Data Warehousing and Analysis (10 hrs lectures, 9 hrs exercises):
Data Warehouse Architecture and querying
Data Warehouse Conceptual Design
Data Warehouse Logical Design
Introduction to exploratory data analysis, data mining and its applications.
Communication Skills (2 hrs lectures, 6 hrs exercises)
Basic elements of Communication Sciences (how to be clear)
CLASS HOURS: 26 lecture hours; 24 hours of exercise sessions
OPTIONAL (2-6 hrs): A series of seminars on advances in Data Management (e.g. Pervasive Systems, Data Personalization and Ranking, Ethical Issues, or others to be announced), NOT mandatory but highly recommended.
- The bibliography includes useful references and is by no means exhaustive of the topics covered in the course. More readings will be mentioned during the lectures. The students are advised to attend the lectures, ask the professor for explanations and read the articles. For each of the books, only the part included in the lectures is mandatory.
- An optional project can be chosen by the students who want to improve their mark. The objective of the projects is to help students apply the approaches and principles taught in class. Students can ask to be assigned a project, and can deliver the project artifacts, at any time during the academic year. The evaluation of projects will be based on the produced artifacts and on a report.
The students are required to know the principles and methods of database design and technology, and the basic notions of the Entity-Relationship conceptual model and of the Relational Data Model along with its languages. The courses that provide these notions are Data bases I and Data bases II (the latter may be attended in the same semester).
Assessment method
The assessment will be based on a written exam and a homework communication assignment. The exam is considered passed if both parts have been passed. If the student fails one of these two parts in a given exam call, the mark of the passed part will be saved; in a future call, the student will need to retake only the failed part.
The written exam consists of (i) an exercise/design part (open-book), consisting of the design of an integration system or of a data warehouse; this part assigns up to 22 points and is considered sufficient when the score is equal to or higher than 13; (ii) a theoretical part (closed-book), in which the students will be asked to explain clearly, in their own words, two of the subjects illustrated during the lectures; this part assigns up to 10 points and is considered sufficient when the score is equal to or higher than 5. The scores of the two written parts are summed to compute the total score. Students can take the written part at any exam session during the year.
The homework communication assignment requires the students to summarize a technical text with a non-technical reader in mind. The assignment will be evaluated by both the communication instructor and Professor Tanca. The evaluation will be “pass/non-pass”; in case of excellence, one additional point will be added to the student's overall exam score.
It is also possible for the students to increase their mark by up to 2 points by producing an optional project. If the student has requested to produce a project, this can be presented to the professor and to the teaching assistant at any time after the exam has been passed, and after that the total mark will be entered at the earliest possible registration date. 30 cum laude will be assigned when the total score is strictly higher than 31.
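The scoring rules above can be sketched in a few lines of Python. This is only an illustrative reading of the text, not an official grading tool: the names `ExamResult` and `final_mark` are hypothetical, scores are assumed to be integers, the written exam is assumed to be passed only when both of its parts are sufficient, and a total of exactly 31 is assumed to be registered as 30.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the assessment rules in the text.
DESIGN_MAX, DESIGN_MIN = 22, 13   # exercise/design part (open-book)
THEORY_MAX, THEORY_MIN = 10, 5    # theoretical part (closed-book)
PROJECT_MAX = 2                   # optional project bonus


@dataclass
class ExamResult:
    """Hypothetical container for one student's scores (integers assumed)."""
    design: int
    theory: int
    assignment_passed: bool
    assignment_excellent: bool = False
    project_bonus: int = 0


def final_mark(r: ExamResult) -> Optional[str]:
    """Return the registered mark as a string, or None if the exam is not passed."""
    if not 0 <= r.design <= DESIGN_MAX:
        raise ValueError("design score out of range")
    if not 0 <= r.theory <= THEORY_MAX:
        raise ValueError("theory score out of range")
    if not 0 <= r.project_bonus <= PROJECT_MAX:
        raise ValueError("project bonus out of range")

    # Assumption: both written parts must be sufficient for the written exam to pass.
    written_ok = r.design >= DESIGN_MIN and r.theory >= THEORY_MIN
    if not (written_ok and r.assignment_passed):
        return None

    total = r.design + r.theory + r.project_bonus
    if r.assignment_passed and r.assignment_excellent:
        total += 1  # one extra point for an excellent communication assignment

    if total > 31:  # strictly higher than 31, as stated in the text
        return "30 cum laude"
    # Assumption: the registered mark is capped at 30.
    return str(min(30, total))
```

For example, `final_mark(ExamResult(design=20, theory=8, assignment_passed=True))` sums the two written parts to 28, while a student with 22 and 10 points reaches 32 and, under this reading, obtains 30 cum laude.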
The following table provides a detailed overview of the elements that will be considered in the various assessment activities.
Type of assessment and Dublin descriptors covered
Written exam and homework communication assignment (Dublin descriptors 1, 2, 3, 4, 5):
- Exercises focusing on design aspects: design of a data integration system based on the requirements provided, or design of a data warehouse based on the requirements provided
- Two theoretical questions on the course topics
- Communication assignment: summarization of a research paper in words understandable by a non-technical reader
Assessment of (optional) project (Dublin descriptors 1, 2, 3, 4, 5):
- Assessment of the design and experimental work developed by students either individually or in groups
AnHai Doan, Alon Halevy, and Zachary Ives, Principles of Data Integration, Publisher: Morgan Kaufmann, 1st edition, Year: 2012
M. Golfarelli, S. Rizzi, Data Warehouse Design: Modern Principles and Methodologies, Publisher: McGraw Hill, Year: 2009, ISBN: 0071610391
Xin Luna Dong, Divesh Srivastava, Big Data Integration, Synthesis Lectures on Data Management, Publisher: Morgan & Claypool Publishers, Year: 2015, ISBN: 978-1-62705-223-8
M. Lenzerini, Data Integration: A Theoretical Perspective, Proceedings of ACM PODS, pp. 233-246, Publisher: ACM, Year: 2002, ISBN: 1-58113-507-6. Note: this is a scientific paper published in a volume of conference proceedings.
Clement T. Yu, Weiyi Meng, Principles of Database Query Processing for Advanced Applications, Publisher: Morgan Kaufmann, Year: 1998, ISBN: 1558604340. Note: in The Morgan Kaufmann Series in Data Management Systems.
Roberto De Virgilio, Fausto Giunchiglia, Letizia Tanca (Eds.), Semantic Web Information Management - A Model-Based Perspective, Publisher: Springer Verlag, Year: 2009, ISBN: 978-3-642-04328-4
Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Publisher: Addison-Wesley, Year: 2006, ISBN: 0321321367, http://www-users.cs.umn.edu/~kumar/dmbook/index.php. Note: the web site contains a lot of interesting material.
No software required
Type of teaching activity
Hours of activity in class
Hours of independent study
Project laboratory
Information in English to support internationalization
Language of instruction
Availability of teaching material/slides in English
Availability of textbooks/bibliography in English
Possibility of taking the exam in English
Availability of teaching support in English