Summary Sheet
Academic Year: 2019/2020
School: School of Industrial and Information Engineering
Course: 054306 - UNSTRUCTURED AND STREAMING DATA ENGINEERING
Lecturer: Della Valle Emanuele
Credits (CFU): 5.00
Course type: Single-discipline

Degree programme: Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA
Code of pre-approved study plan: *
From (inclusive): A
To (exclusive): ZZZZ
Course: 054306 - UNSTRUCTURED AND STREAMING DATA ENGINEERING

Course objectives

The course provides the foundational concepts and methods for designing, storing, analyzing and managing semi-structured and unstructured data, both in batch and in streaming. The course aims to tame the variety (data in many forms) and velocity (analyzing data streams to enable real-time decisions) dimensions of Big Data, without forgetting the volume dimension.


Expected learning outcomes

Dublin descriptors and the corresponding expected learning outcomes:

Knowledge and understanding

Students will learn how to:

  • Identify problems that can be addressed with big data storage and processing techniques tailored for variety and velocity
  • Apply basic NoSQL and stream processing technologies to real-world problems

Applying knowledge and understanding

Given specific project cases, students will be able to:

  • Define and implement a big-data-based solution for the problem
  • Apply it to real datasets

Making judgements

Given specific project cases, students will be able to:

  • Decide which data storage and processing solution to apply, and evaluate that decision

Communication

Students will learn to:

  • Write a report on a project describing and motivating the decisions taken and the results obtained
  • Present their work in front of their colleagues and teachers

Lifelong learning skills

  • Students will learn how to develop a realistic unstructured and streaming data engineering project in all its phases

 

 


Topics covered

The variety-oriented part of the course will focus on NoSQL (and not-only-SQL) models and technologies. Students will learn how to select appropriate data management solutions to deal with scalability, availability, consistency, performance and expressiveness requirements.
The course will cover high-level Big Data problems and dimensions; NoSQL data models and technologies (graph, column, document and key-value based storage; persistent and volatile solutions); design techniques for NoSQL; the transition from ACID to BASE transactional properties; the specification of CRUD primitives (create, read, update, delete) implemented at scale; and sharding and replication strategies.
The velocity-oriented part of the course will focus on time series, data streams and events from both a deductive and an inductive perspective. The deductive perspective focuses on domain-specific languages and knowledge representation techniques. Its main goal is to guide students in exploring the trade-off between usability and rich formal semantics of query languages. The inductive perspective examines machine-learning problems, focusing on massive online learning and, in particular, on the ability to learn when to forget past information.
Finally, the course will cover the basic aspects of the data analysis pipeline: acquisition, integration, exploration, mining, analytics, visualization, and interpretation. 


Prerequisites

Students are expected to know the basics of database management, SQL and distributed system architectures.


Assessment methods

The exam consists of a practical part (project work) and a theoretical part (written exam with a possible oral discussion).

The practical part consists of solving a realistic big data engineering problem, based on real or realistic datasets that are publicly available, accessible via Web API, or provided by the teachers.

The written exam is composed of a mix of theoretical questions on any of the course subjects and exercises on the technical content and how to apply it in practice.

The optional oral examination consists of a discussion about the written test and the practical part of the exam. It can also include questions on any subject of the course.

Type of assessment, description and Dublin descriptors assessed

Written test

  • Theoretical questions (Dublin descriptors: 1, 4)
  • Exercises focusing on big data storage and processing aspects (Dublin descriptors: 1, 2, 3)

Assessment of project artefacts

  • Assessment of the design and experimental work developed by students in groups (Dublin descriptors: 2, 3, 5)

Oral presentation

  • Assessment of the presentation of the work developed by students in groups (Dublin descriptors: 2, 3, 4, 5)

 


Bibliography
Required reading: Martin Kleppmann, Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Publisher: O'Reilly, Edition year: 2017, ISBN: 978-1449373320

Teaching formats
Type of teaching activity    Classroom hours (hh:mm)    Self-study hours (hh:mm)
Lecture                      30:00                      45:00
Exercise session             15:00                      22:30
Computer laboratory          0:00                       0:00
Experimental laboratory      0:00                       0:00
Project laboratory           5:00                       7:30
Total                        50:00                      75:00

Information in English to support internationalization
Course taught in English
Teaching material/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English
08/12/2019