Course Summary
Academic Year 2018/2019
School: School of Industrial and Information Engineering (Scuola di Ingegneria Industriale e dell'Informazione)
Course: 052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Lecturer: Letizia Tanca
CFU: 5.00    Course type: Single-discipline
Innovative teaching: the course delivers 1.0 CFU through Innovative Teaching, as follows:
  • Soft Skills

Degree programme | Pre-approved study plan code | From (inclusive) | To (exclusive) | Course
Ing - Civ (Mag.)(ord. 270) - MI (495) GEOINFORMATICS ENGINEERING - INGEGNERIA GEOINFORMATICA | * | A | ZZZZ | 052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Ing Ind - Inf (Mag.)(ord. 270) - BV (479) MANAGEMENT ENGINEERING - INGEGNERIA GESTIONALE | * | A | ZZZZ | 052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA | * | A | ZZZZ | 052537 - TECHNOLOGIES FOR INFORMATION SYSTEMS

Course objectives

We are in the era of large, decentralized, distributed environments where the number of devices, the amount of data, and their heterogeneity are getting out of control. Gartner reports that worldwide information volume is growing at a rate between 40% and 60% annually.

Organizations capture billions of bytes of data about their customers, suppliers and operations, but their limited ability to collect, manage and interpret this information can become an obstacle to its use. The web is widening the range of data providers and consumers. Sensors, mobile devices and, in general, the IoT produce further data that must be integrated and harmonized with the rest in order to produce value.

Decision-making is based on information, not just on data. More accurate information leads to better decisions and gives companies a competitive advantage. Hence, processing, manipulating and organizing data in a way that adds new knowledge for the person or organization receiving it has become a necessity.

The goal of the course is to enable students to master the engineering methods and processes needed to manage modern information systems, especially data-intensive ones, to operate on large data collections, and to understand the utility and methods of business analysis, so as to obtain knowledge that improves the decision-making process.

To this end, we expose students to some of the most advanced methodologies for understanding the conceptual and technological problems encountered in the design and implementation of "data products": tangible results of analyses of complex systems, whose raw material is collections of data that must be integrated, organized and analyzed, mainly through automatic tools.


Expected learning outcomes

Dublin Descriptors

Knowledge and understanding

Students will learn how to:

  • Identify the phases of Big Data management, from the choice of the data sources to the production of a “data product”
  • Analyze the data sources and design a data integration system
  • Design a data warehouse
  • Identify the data quality problems encountered when managing heterogeneous collections of data

Applying knowledge and understanding

Given specific project cases, students will be able to:

  • Detail the corresponding requirements
  • Analyze and comment on specific conceptual and architectural choices
  • Apply the theory to decide the most appropriate ones
  • Develop data integration and data warehouse solutions fulfilling the high-level and design specifications.

Making judgements

Given a relatively complex problem, students will be able to:

  • Analyze and understand the goals, assumptions and requirements associated with that problem and model them
  • Define the type of architecture of the corresponding system
  • Identify the appropriate conceptual and logical design methodology
  • Estimate the size of the system and the resources needed for its development 

Communication

Students will learn to:

  • Clearly explain a technological or methodological issue at the correct level of abstraction, considering the common ground with the interlocutor
  • (For some students) describe, by means of a report or a public presentation, the work done during their project.

Lifelong learning skills

  • The students will learn how to develop a realistic data product

Topics covered

Information System Architectures and Heterogeneous Data Integration: structured and unstructured data (10 hrs lectures, 8 hrs exercises):

  • Introduction to the architectures of modern information systems
  • Basics of Data Integration: model heterogeneity, semantic heterogeneity at the schema level, heterogeneity at the data level
  • Dynamic data integration: the use of wrappers, mediators, meta-models, ontologies, etc.
  • Lightweight data integration: Mashup systems
  • The future of data integration

Data Warehousing and Analysis (10 hrs lectures, 8 hrs exercises):

  • Data Warehouse Architecture and querying
  • Data Warehouse Conceptual Design
  • Data Warehouse Logical Design
  • Introduction to exploratory data analysis, data mining and its applications.

Advanced topics (4 hrs lectures, 2-3 hrs optional): Data quality, Time in Data Management and Computer Science, Data Management in Pervasive Systems, and others to be announced.

Communication Skills (6 hrs lectures, 4 hrs exercises)

  • Basic elements of Communication Sciences (how to be clear)
  • Common-ground Theory
  • Presentation skills

CLASS HOURS: 30 lecture hours; 20 hours of exercise sessions

NOTES:

- The bibliography includes useful references but is by no means exhaustive of the topics covered in the course. More readings will be mentioned during the lectures. Students are advised to attend the lectures, ask the professor for explanations and read the articles. For each book, only the parts covered in the lectures are mandatory.

- An optional project can be chosen by students who want to improve their mark. The objective of the project is to help students apply the approaches and principles taught in class. Students can ask to be assigned a project at any time during the academic year, and project artifacts can be delivered at any time during the academic year. Projects are evaluated on the produced artifacts and on a report.


Prerequisites

Students are required to know the principles and methods of database design and technology, the basic notions of the Entity-Relationship conceptual model, and the Relational Data Model along with its languages. The courses that provide these notions are Data bases I and Data bases II (the latter may be attended in the same semester).


Assessment

The assessment will be based on a written exam at the end of the course and on the (optional) project. If the student does not choose to carry out a project, the mark will be registered at the first available registration date; if the student has chosen to carry out a project, the project will be presented to the professor and the teaching assistant, after which the total mark will be registered at the earliest available registration date.

The written exam consists of (i) an exercise/design part (open-book), consisting of the design of an integration system or of a data warehouse; this part assigns up to 22 points and is considered sufficient when the score is equal to or higher than 13; and (ii) a theoretical part (closed-book), in which students are asked to explain two of the subjects illustrated during the lectures, in the terms most appropriate to the audience assigned in each specific case; this part assigns up to 10 points and is considered sufficient when the score is equal to or higher than 5. Both the quality of the content and the quality of the communication are part of the evaluation.

The scores of the two written parts are summed to compute the total score. The project assigns from 0 to 2 points, which are added to the total mark. Students can take the written part at any exam session during the year. 30 cum laude is assigned when the total score is strictly higher than 31.
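The scoring rules above can be condensed into a small function. This is a hypothetical sketch for illustration, not an official grading tool. It assumes that the project points count toward the cum laude threshold and that a total of 31 is recorded as 30, as is usual on the Italian 30-point scale. Note that when both written parts are just sufficient, the total is 13 + 5 = 18, the minimum passing mark on that scale.

```python
from typing import Optional, Union

# Thresholds taken from the exam rules above.
DESIGN_MAX, DESIGN_PASS = 22, 13   # open-book design part
THEORY_MAX, THEORY_PASS = 10, 5    # closed-book theory part
PROJECT_MAX = 2                    # optional project bonus


def final_mark(design: int, theory: int, project: int = 0) -> Optional[Union[int, str]]:
    """Return the final mark, "30L" for 30 cum laude, or None if the exam is failed.

    Hypothetical sketch: assumes project points count toward the cum laude
    threshold and that a total of 31 is recorded as 30.
    """
    if not (0 <= design <= DESIGN_MAX and 0 <= theory <= THEORY_MAX
            and 0 <= project <= PROJECT_MAX):
        raise ValueError("score out of range")
    # Each written part must be sufficient on its own.
    if design < DESIGN_PASS or theory < THEORY_PASS:
        return None
    total = design + theory + project
    if total > 31:          # strictly higher than 31
        return "30L"
    return min(total, 30)   # assumption: marks above 30 are capped at 30
```

For example, `final_mark(13, 5)` gives 18, `final_mark(20, 10, 1)` gives 30, and `final_mark(22, 10)` gives "30L" even without a project.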

The following table provides a detailed overview of the elements that will be considered in the various assessment activities.

Type of assessment | Description | Dublin descriptors
Written test | Exercises focusing on design aspects: design of a data integration system based on the requirements provided, or design of a data warehouse based on the requirements provided | 1, 2, 3
Written test | Written explanation of one of the course topics, with special attention to communication with a specific (assigned) kind of audience | 1, 2, 3, 4, 5
Assessment of (optional) project | Assessment of the design and experimental work developed by students, either individually or in groups | 1, 2, 3, 4, 5

The descriptor numbers refer, in order, to the five Dublin Descriptors listed under Expected learning outcomes: 1 Knowledge and understanding, 2 Applying knowledge and understanding, 3 Making judgements, 4 Communication, 5 Lifelong learning skills.


Bibliography
Mandatory reading: AnHai Doan, Alon Halevy, and Zachary Ives, Principles of Data Integration, Publisher: Morgan Kaufmann, 1st edition, Year: 2012
Mandatory reading: M. Golfarelli, S. Rizzi, Data Warehouse Design: Modern Principles and Methodologies, Publisher: McGraw Hill, Year: 2009, ISBN: 0071610391
Optional reading: Xin Luna Dong, Divesh Srivastava, Big Data Integration. Synthesis Lectures on Data Management, Publisher: Morgan & Claypool Publishers, Year: 2015, ISBN: 978-1-62705-223-8
Optional reading: M. Lenzerini, Data Integration: A Theoretical Perspective, Proceedings of ACM PODS, pp. 233-246, Publisher: ACM, Year: 2002, ISBN: 1-58113-507-6
Note:

This is a scientific paper published in a volume of conference proceedings.

Optional reading: Clement T. Yu, Weiyi Meng, Principles of Database Query Processing for Advanced Applications, Publisher: Morgan Kaufmann, Year: 1998, ISBN: 1558604340
Note:

In The Morgan Kaufmann Series in Data Management Systems.

Optional reading: Roberto De Virgilio, Fausto Giunchiglia, Letizia Tanca (Eds.), Semantic Web Information Management - A Model-Based Perspective, Publisher: Springer Verlag, Year: 2009, ISBN: 978-3-642-04328-4
Optional reading: Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Publisher: Addison-Wesley, Year: 2006, ISBN: 0321321367, http://www-users.cs.umn.edu/~kumar/dmbook/index.php
Note:

The web site contains a lot of interesting material.

Optional reading: F. Colace, M. De Santo, V. Moscato, A. Picariello, F. A. Schreiber, L. Tanca (Eds.), Data Management in Pervasive Systems, Publisher: Springer Cham Heidelberg New York, Year: 2015, ISBN: 978-3-319-20061-3
Optional reading: Schreiber F.A., Is Time a Real Time? An Overview of Time Ontology in Informatics, Publisher: Springer, Year: 1994
Note:

In W.A. Halang, A.D. Stoyenko (Eds.), Real Time Computing, Springer Verlag NATO-ASI Vol. F127, pp. 283-307.

Optional reading: Snodgrass R., Ahn I., Temporal Databases, IEEE Computer, Publisher: IEEE, Year: 1986, Issue: vol. 19, n. 9, pp. 35-42

Teaching formats
Type of activity | Hours of in-class activity (hh:mm) | Hours of independent study (hh:mm)
Lecture | 30:00 | 45:00
Exercise session | 20:00 | 30:00
Computer lab | 0:00 | 0:00
Experimental lab | 0:00 | 0:00
Project lab | 0:00 | 0:00
Total | 50:00 | 75:00

English-language information supporting internationalization
Course taught in English
Teaching material/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English
Area Servizi ICT
05/12/2020