Summary Sheet
Academic Year: 2021/2022
School: Scuola di Ingegneria Industriale e dell'Informazione
Course: 056901 - SYSTEMS AND METHODS FOR BIG AND UNSTRUCTURED DATA
Lecturer: Brambilla Marco
Credits (CFU): 5.00
Course type: Single-discipline

Degree programme | Pre-approved study plan code | From (inclusive) | To (exclusive) | Course
Ing Ind - Inf (Mag.)(ord. 270) - MI (474) TELECOMMUNICATION ENGINEERING - INGEGNERIA DELLE TELECOMUNICAZIONI | * | A | ZZZZ | 056901 - SYSTEMS AND METHODS FOR BIG AND UNSTRUCTURED DATA
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA | * | A | ZZZZ | 056901 - SYSTEMS AND METHODS FOR BIG AND UNSTRUCTURED DATA

Course objectives

The objective of the course is to address the problems, solutions, methods, and technologies for big data storage and management, with special emphasis on scalability and persistency. The course is structured in three main parts. The first part covers the main principles and approaches of big data management, spanning issues such as scalability, transactionality, and distribution of data. The second part covers the different approaches to unstructured data management, describing data models, query languages, and architectural solutions for non-relational data storage, also known as NoSQL solutions, spanning graph, columnar, document, key-value, and IR-based databases. The third part discusses the design methodologies for the specification of NoSQL data models.


Expected learning outcomes

Students will learn the basic concepts of modern database approaches. They will be able to design, program, and use the different database methods, and they will learn how to select and adopt the best option depending on the business and technical requirements.

 

Dublin Descriptors

Expected learning outcomes

Knowledge and understanding

Students will learn how to:

  • Identify problems that can be addressed with big data storage and processing techniques 
  • Apply basic NoSQL technologies to real-world problems

Applying knowledge and understanding

Given specific project cases, students will be able to:

  • Define and implement a big data solution for the problem
  • Apply it on real datasets

Making judgements

Given specific project cases, students will be able to:

  • Decide which data storage and processing solution to apply and evaluate that decision

Communication

Students will learn to:

  • Write a report on a project describing and motivating the decisions taken and the results obtained
  • Present their work in front of their colleagues and teachers

Lifelong learning skills

  • Students will learn how to develop a realistic unstructured data engineering project in all its phases

 

 


Topics covered

The course content is organized in three main chapters:

1. Approaches to Big Data management

  • Big Data problems and dimensions
  • Data engineering and data science pipeline
  • Enterprise-scale data management
  • Scalability and persistency vs. volatility issues
  • Cross-source data integration problems and architectures
  • CAP theorem and implications. Non-relational distribution architectures
  • Evolution of transactional properties: from ACID to BASE. Modern transactional architectures
  • Data sharding and replication
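To make the last topic in the list above concrete, here is a minimal sketch of hash-based sharding with replication. It is an illustration only and is not taken from the course material: the names `shard_for`, `replicas_for`, and `ShardedStore` are hypothetical, the "nodes" are plain in-memory dictionaries, and the replica placement rule (primary shard plus the next shards in ring order) is just one simple assumption among many possible strategies.

```python
import hashlib
from typing import Any, FrozenSet, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would not be stable across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key: str, num_shards: int, replication_factor: int) -> List[int]:
    """Return the primary shard followed by the next shards in ring order."""
    if not 1 <= replication_factor <= num_shards:
        raise ValueError("replication_factor must be between 1 and num_shards")
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]


class ShardedStore:
    """Hypothetical in-memory key-value store; each node is a dict."""

    def __init__(self, num_shards: int = 4, replication_factor: int = 2) -> None:
        self.num_shards = num_shards
        self.replication_factor = replication_factor
        self.nodes = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: Any) -> None:
        # Write the value to every replica of the key.
        for node_id in replicas_for(key, self.num_shards, self.replication_factor):
            self.nodes[node_id][key] = value

    def get(self, key: str, failed: FrozenSet[int] = frozenset()) -> Any:
        # Read from the first replica that is not marked as failed.
        for node_id in replicas_for(key, self.num_shards, self.replication_factor):
            if node_id not in failed and key in self.nodes[node_id]:
                return self.nodes[node_id][key]
        raise KeyError(key)


store = ShardedStore(num_shards=4, replication_factor=2)
store.put("user:1", {"name": "Ada"})
primary = replicas_for("user:1", 4, 2)[0]
# The read still succeeds when the primary node is treated as failed.
value = store.get("user:1", failed=frozenset({primary}))
```

Sharding spreads keys across nodes so that capacity grows with the number of nodes, while replication keeps each key on more than one node so that a read can survive a single node failure. This sketch ignores consistency between replicas and data movement when the number of shards changes.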

2. Systems and Models for Big and Unstructured Data

  • Graph databases
  • Semantic databases
  • Columnar databases
  • Document-oriented databases
  • Key-value databases
  • IR-based databases

Each category of systems is covered along five dimensions: (1) data model; (2) query languages (declarative vs. imperative); (3) data distribution; (4) non-functional aspects; (5) architectural solutions.

3. Methods for the Design of Applications

  • Modeling languages and methods for building unstructured data applications
  • Design methodology within the data engineering pipeline
  • Schema-less, implicit-schema, and schema-on-read approaches
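As an illustration of the schema-on-read idea named in the list above, the following minimal sketch stores records as raw JSON strings with no enforced structure and applies a schema only when the data is read. The sample records, the `READ_SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical and chosen only for this example.

```python
import json

# Raw documents are stored as-is; they do not share a fixed structure.
RAW_EVENTS = [
    '{"user": "ada", "age": "36", "city": "Milan"}',
    '{"user": "bob", "age": 41}',
    '{"user": "eve", "tags": ["admin"]}',
]

# Hypothetical schema chosen by the reader: field name -> type to cast to.
READ_SCHEMA = {"user": str, "age": int}


def read_with_schema(raw_line, schema):
    """Parse one raw document and project it onto the reader's schema.

    Fields outside the schema are ignored; missing fields become None.
    """
    doc = json.loads(raw_line)
    record = {}
    for field, cast in schema.items():
        value = doc.get(field)
        record[field] = cast(value) if value is not None else None
    return record


records = [read_with_schema(line, READ_SCHEMA) for line in RAW_EVENTS]
```

The write path accepts any document, so different readers can apply different schemas to the same stored data. The cost is that type errors and missing fields surface at read time rather than at write time.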

Prerequisites

Students are expected to have basic knowledge of programming approaches and abstractions, distributed system architectures, relational databases, database design, entity-relationship models, the SQL query language, and database programming (for relational databases). Basic knowledge of Python is advised.


Assessment

The exam consists of a practical part (project work) and a theoretical part (written exam, with a possible oral discussion if deemed necessary by the instructor).

The practical part consists of solving a realistic big data engineering problem, based on a real or realistic dataset that is either publicly accessible via Web API or provided by the instructors.

The written exam is composed of a mix of theoretical questions regarding any of the course subjects, and exercises regarding the technical content and how to apply it in practice (including programming aspects).

The optional oral examination consists of a discussion about the written test and the practical part of the exam. It can also include questions on any subject of the course.

Type of assessment | Description | Dublin descriptor
Written test | Theoretical questions | 1, 4
Written test | Exercises focusing on big data storage and processing aspects | 1, 2, 3
Assessment of project artefacts | Assessment of the design and experimental work developed by students in groups | 2, 3, 5
Oral presentation | Assessment of the presentation of the work developed by students in groups | 2, 3, 4, 5

 


Bibliography
Optional: Martin Kleppmann, Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, Publisher: O'Reilly, Year: 2017, ISBN: 978-1449373320
Required: Marco Brambilla, Emanuele Della Valle, Andrea Tocchetti, et al., Course Notes, Year: 2021

Software used
No software required

Teaching activities
Type of teaching activity | In-class hours (hh:mm) | Self-study hours (hh:mm)
Lecture | 30:00 | 45:00
Exercise session | 15:00 | 22:30
Computer laboratory | 0:00 | 0:00
Experimental laboratory | 0:00 | 0:00
Project laboratory | 5:00 | 7:30
Total | 50:00 | 75:00

Information in English to support internationalization
Course taught in English
Teaching material/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English
17/10/2021