The course includes 1.0 CFU delivered through Innovative Teaching, as follows:
Soft Skills
Degree programme | Pre-approved study plan code | From (inclusive) | To (exclusive) | Course
Ing - Civ (Mag.)(ord. 270) - MI (495) GEOINFORMATICS ENGINEERING - INGEGNERIA GEOINFORMATICA | * | A | ZZZZ | 052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Ing Ind - Inf (Mag.)(ord. 270) - BV (479) MANAGEMENT ENGINEERING - INGEGNERIA GESTIONALE | * | A | ZZZZ | 052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA | * | A | ZZZZ | 052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Course objectives
We are in the era of large, decentralized, distributed environments where the number of devices, the amount of data and their heterogeneity are getting out of control. Gartner reports that the worldwide volume of information is growing at an annual rate of 40% to 60%.
Organizations capture billions of bytes of data about their customers, suppliers and operations, but their limited ability to collect, manage and interpret this information can become an obstacle to its use. The web is widening the range of data providers and consumers. Sensors, mobile devices and, more generally, the Internet of Things (IoT) produce further data that must be integrated and harmonized with the rest in order to produce value.
Decision-making is based on information, not just on data. More accurate information leads to better decisions and gives companies a competitive advantage. Hence, processing, manipulating and organizing data in a way that adds new knowledge for the person or organization receiving it has become a necessity.
The goal of the course is to enable students to master the engineering methods and processes needed to manage modern information systems, especially data-intensive ones, to operate on large data collections, and to understand the utility and methods of business analysis, thereby obtaining useful knowledge to improve the decision-making process.
To this end, students are exposed to some of the most advanced methodologies for understanding the conceptual and technological problems encountered in the design and implementation of "data products": tangible results of analyses of complex systems, whose raw material is collections of data that must be integrated, organized and analyzed, mainly through automatic tools.
Expected learning outcomes (Dublin Descriptors)
Knowledge and understanding
Students will learn how to:
Identify the phases of Big Data management, from the choice of the data sources to the production of a “data product”
Analyze the data sources and design a data integration system
Design a data warehouse
Identify the data quality problems encountered when managing heterogeneous collections of data
Applying knowledge and understanding
Given specific project cases, students will be able to:
Detail the corresponding requirements
Analyze and comment on specific conceptual and architectural choices
Apply the theory to choose the most appropriate ones
Develop data integration and data warehouse solutions fulfilling the high-level and design specifications.
Making judgements
Given a relatively complex problem, students will be able to:
Analyze and understand the goals, assumptions and requirements associated with that problem and model them
Define the type of architecture of the corresponding system
Identify the appropriate conceptual and logical design methodology
Estimate the size of the system and the resources needed for its development
Communication
Students will learn to:
Clearly explain a technological or methodological issue at the correct level of abstraction, considering the common ground with the interlocutor
(For students who carry out a project) Describe the work done during the project by means of a report or a public presentation.
Lifelong learning skills
Students will learn how to develop a realistic data product.
Topics covered
Information System Architectures and Heterogeneous Data Integration: structured and unstructured data (10 hrs lectures, 8 hrs exercises):
Introduction to the architectures of modern information systems
Basics of Data Integration: model heterogeneity, semantic heterogeneity at the schema level, heterogeneity at the data level
Dynamic data integration: the use of wrappers, mediators, meta-models, ontologies, etc.
Lightweight data integration: Mashup systems
The future of data integration
Data Warehousing and Analysis (10 hrs lectures, 8 hrs exercises):
Data Warehouse Architecture and querying
Data Warehouse Conceptual Design
Data Warehouse Logical Design
Introduction to exploratory data analysis, data mining and its applications.
Advanced topics (4 hrs lectures, 2-3 hrs optional): Data quality, Time in Data Management and Computer Science, Data Management in Pervasive Systems, and others to be announced
Communication Skills (6 hrs lectures, 4 hrs exercises)
Basic elements of Communication Sciences (how to be clear)
Common-ground Theory
Presentation skills
CLASS HOURS: 30 lecture hours; 20 hours of exercise sessions
NOTES:
- The bibliography lists useful references but by no means covers all the topics of the course. Further readings will be mentioned during the lectures. Students are advised to attend the lectures, ask the professor for explanations and read the suggested articles. For each book, only the parts covered in the lectures are mandatory.
- Students who want to improve their mark can choose an optional project. The objective of the projects is to help students apply the approaches and principles taught in class. Students can ask to be assigned a project at any time during the academic year, and project artifacts can be delivered at any time during the academic year. Projects are evaluated on the produced artifacts and on a report.
Prerequisites
Students are required to know the principles and methods of database design and technology, the basic notions of the Entity-Relationship conceptual model, and the Relational Data Model along with its languages. The courses that provide these notions are Data Bases I and Data Bases II (the latter may be attended in the same semester).
Assessment methods
The assessment is based on a written exam at the end of the course and on the (optional) project. If the student does not request a project, the mark is registered at the first possible registration date; if the student has requested a project, the student presents it to the professor and the teaching assistant, after which the total mark is registered at the earliest possible registration date.
The written exam consists of two parts: (i) an exercise/design part (open-book), consisting of the design of an integration system or of a data warehouse; it is worth up to 22 points and is considered sufficient with a score of 13 or higher; and (ii) a theoretical part (closed-book), in which students are asked to explain two of the subjects illustrated during the lectures, in terms suited to an audience assigned in the specific case; it is worth up to 10 points and is considered sufficient with a score of 5 or higher. Both the quality of the content and the quality of the communication are evaluated.
The scores of the two written parts are summed to obtain the total score. The project is worth 0 to 2 points, which are added to the total. Students can take the written exam at any exam session during the year. A mark of 30 cum laude is awarded when the total score is strictly higher than 31.
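The scoring rules above can be summarized as a small calculation. The following is a minimal sketch, not an official grading tool: the function `final_mark` and the class `ExamResult` are hypothetical names, and two rules are assumptions not stated in the syllabus, namely that a part below its sufficiency threshold means the exam is failed, and that a total above 30 but not above 31 is recorded as 30 (the top of the Italian scale).

```python
from dataclasses import dataclass

# Maxima and sufficiency thresholds as stated in the assessment rules.
DESIGN_MAX, DESIGN_MIN = 22, 13
THEORY_MAX, THEORY_MIN = 10, 5
PROJECT_MAX = 2


@dataclass
class ExamResult:
    design: int       # open-book exercise/design part, 0..22
    theory: int       # closed-book theoretical part, 0..10
    project: int = 0  # optional project, 0..2 (0 if no project was done)


def final_mark(r: ExamResult) -> str:
    """Return the final mark, e.g. "27", "30 cum laude" or "fail".

    ASSUMPTIONS (not in the syllabus): an insufficient part fails the
    exam, and totals of 31 are capped at 30.
    """
    if not (0 <= r.design <= DESIGN_MAX
            and 0 <= r.theory <= THEORY_MAX
            and 0 <= r.project <= PROJECT_MAX):
        raise ValueError("score out of range")
    # Each written part must reach its own threshold.
    if r.design < DESIGN_MIN or r.theory < THEORY_MIN:
        return "fail"
    total = r.design + r.theory + r.project
    if total > 31:  # "strictly higher than 31"
        return "30 cum laude"
    return str(min(total, 30))  # assumed cap at 30


print(final_mark(ExamResult(design=18, theory=7)))             # 25
print(final_mark(ExamResult(design=21, theory=9, project=2)))  # 30 cum laude
```

Note that the two sufficiency thresholds together give 13 + 5 = 18, so a student who just passes both parts reaches the minimum passing mark; the project points (at most 2) are what allow a total above 31, and hence cum laude, without a perfect written exam.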
The following table provides a detailed overview of the elements that will be considered in the various assessment activities.
Type of assessment | Description | Dublin descriptors
Written test | Exercises focusing on design aspects: design of a data integration system based on the requirements provided, or design of a data warehouse based on the requirements provided | 1, 2, 3
Written test | Written explanation of one of the course topics, with special attention to communication with a specific (assigned) kind of audience | 1, 2, 3, 4, 5
Assessment of (optional) project | Assessment of the design and experimental work developed by students, either individually or in groups | 1, 2, 3, 4, 5
Bibliography
AnHai Doan, Alon Halevy, Zachary Ives, Principles of Data Integration, 1st edition, Morgan Kaufmann, 2012
M. Golfarelli, S. Rizzi, Data Warehouse Design: Modern Principles and Methodologies, McGraw Hill, 2009, ISBN: 0071610391
Xin Luna Dong, Divesh Srivastava, Big Data Integration, Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2015, ISBN: 978-1-62705-223-8
M. Lenzerini, Data Integration: A Theoretical Perspective, Proceedings of ACM PODS, pp. 233-246, ACM, 2002, ISBN: 1-58113-507-6. Note: this is a scientific paper published in a volume of conference proceedings.
Clement T. Yu, Weiyi Meng, Principles of Database Query Processing for Advanced Applications, The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann, 1998, ISBN: 1558604340
Roberto De Virgilio, Fausto Giunchiglia, Letizia Tanca (Eds.), Semantic Web Information Management: A Model-Based Perspective, Springer Verlag, 2009, ISBN: 978-3-642-04328-4
Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2006, ISBN: 0321321367, http://www-users.cs.umn.edu/~kumar/dmbook/index.php. Note: the web site contains a lot of interesting material.
F. Colace, M. De Santo, V. Moscato, A. Picariello, F. A. Schreiber, L. Tanca (Eds.), Data Management in Pervasive Systems, Springer, Cham Heidelberg New York, 2015, ISBN: 978-3-319-20061-3
F. A. Schreiber, Is Time a Real Time? An Overview of Time Ontology in Informatics, in W. A. Halang, A. D. Stoyenko (Eds.), Real Time Computing, Springer Verlag, NATO ASI Vol. F127, pp. 283-307, 1994
R. Snodgrass, I. Ahn, Temporal Databases, IEEE Computer, vol. 19, no. 9, pp. 35-42, IEEE, 1986
Software used
No software required
Teaching formats
Type of teaching activity | Classroom hours (hh:mm) | Independent study hours (hh:mm)
Lecture | 30:00 | 45:00
Exercise session | 20:00 | 30:00
Computer lab | 0:00 | 0:00
Experimental lab | 0:00 | 0:00
Project lab | 0:00 | 0:00
Total | 50:00 | 75:00
English-language information supporting internationalization
Language of instruction: English
Teaching material/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English