Summary Sheet
Academic Year: 2019/2020
School: School of Industrial and Information Engineering
Course: 054318 - AUDIO AND VIDEO SIGNALS
Lecturers: Bestagini Paolo, Marcon Marco
CFU: 10.00
Course type: Integrated Course
Innovative teaching: The course includes 1.0 CFU delivered through Innovative Teaching as follows:
  • Blended Learning & Flipped Classroom

Degree programme: Ing Ind - Inf (Mag.)(ord. 270) - MI (263) MUSIC AND ACOUSTIC ENGINEERING
Pre-approved study plan code: *
From (inclusive): A
To (exclusive): ZZZZ
Course: 091042 - VIDEO SIGNALS

Course objectives

This course is split into two modules: video and audio.

Visual information plays an important role in almost all areas of our life. Today, much of this information is represented and processed digitally. Digital image processing is ubiquitous, with applications ranging from television to tomography, from photography to printing, from robotics to remote sensing. "Video Signals" is an introductory graduate-level course on the fundamentals of digital image processing. The goal is to give students the knowledge needed to handle algorithms for visual information processing and to develop novel approaches for specific applications.

Like visual information, audio signals are part of everyday life. The goal of the audio module is to provide students with advanced audio signal processing skills and knowledge. We first present the fundamental tools for analyzing, synthesizing and processing sounds (voice, music, acoustic signals). We then show how to use these tools to develop a wide range of applications, from music information retrieval to acoustic array processing. Topics are treated both theoretically and practically.

Expected learning outcomes

The outcomes below are listed by Dublin Descriptor.

Knowledge and understanding

Students will learn how to:

·       Deal with image and video signals.

·       Model the human visual system and understand how cameras replicate it.

·       Analyze visual information in different domains (frequency analysis, morphological descriptors) and with specific filters.

·       Deal with colors and hyperspectral images.

·       Extract features and descriptors from images in order to describe their content effectively for classification and recognition purposes.

·       Study the interaction between audio signals and the human perception system.

·       Process a digital audio signal while properly accounting for its resolution limits in time and frequency.

·       Make use of audio signals for spatial analysis applications.

·       Extract features and descriptors from audio signals that are useful for acoustic characterization as well as for information retrieval scenarios.

Applying knowledge and understanding

Students will be able to:

·       Implement advanced image processing algorithms (from simple filter implementation to advanced descriptor extraction).

·       Implement advanced audio processing algorithms (e.g., denoising, dereverberation, audio effects, etc.).

Making judgements

Given a relatively complex problem, students will be able to:

·       Analyze and understand the goals, assumptions and requirements associated with the problem.

·       Define the algorithmic procedure to solve the problem (e.g., choice of a suitable solution, parameter estimation and tuning, etc.).


Communication

Students will learn to:

·       Describe image processing algorithms, highlighting the strengths and weaknesses of different approaches.

·       Write a detailed description of audio analysis systems.

·       Present their work independently of the selected implementation.

Topics covered

"Audio Signals" Module

  • Fundamentals
    • physiology of the hearing system and psycho-perception of sound
    • elements of acoustics
    • basic audio analysis: time-frequency analysis with Short-Time Fourier Transform or filterbanks
    • sound processing tools: filtering, nonlinear processing, auralization filters
    • adaptive filtering: Wiener-Hopf filter, steepest descent, LMS
    • space-time (array) processing: microphone arrays, beamforming
    • feature extraction and analysis
    • overview of sound synthesis tools: modulated delay lines, tunable delay lines, digital waveguides
  • Applications
    • harmonic analysis: pitch tracking, vocoder, envelope tracking
    • sound modification: time warping (resampling), time and pitch scaling (tonal and rhythmic corrector)
    • modulated delay lines and sound effects for musical applications: flanger, chorus, distortions, etc.
    • sound reverberation: perceptual methods, physics-inspired methods, geometric methods (acoustic ray and beam tracing)
    • music information retrieval: feature-based analysis and classification of musical excerpts, playlist generation, mood extraction, etc.
    • adaptive sound processing and applications: echo cancellation, noise reduction, dereverberation, etc.
    • array processing: beamforming, acoustic source localization and extraction (demixing), acoustic room compensation 

"Video Signals" Module

  • image sampling and quantization,
  • pinhole and real camera models,
  • colors and colorimetry,
  • spatial filtering and local descriptors,
  • object detection, Hough/Radon transforms,
  • image segmentation,
  • morphological image processing,
  • image spectral analysis based on the Fourier transform,
  • noise reduction and restoration,
  • deconvolution and blind deconvolution,
  • image and video compression,
  • motion analysis and tracking,
  • multiple image stitching for panoramic images and videos,
  • high dynamic range images.

Laboratory activities enable students to improve their understanding of the concepts learnt during the lectures. The proposed application examples and exercises are based on Matlab® and the related toolboxes (Digital Signal Processing System Toolbox, Image Acquisition Toolbox, Image Processing Toolbox).

Innovative Didactics

The "Flipped Classroom" methodology will be applied to the practical parts of the course, to build a better understanding of Matlab techniques for signal processing.


Prerequisites

Students are required to know the basic principles of digital signal processing.

Assessment methods

"Audio Signals" Module

The assessment is based on a written exam at the end of the course. The written exam consists of numerical exercises and theoretical questions and is worth up to 31 points. Students who score 28 or more may develop an optional project. The project must be planned with the teacher and can add up to 4 points. 30 cum laude is awarded when the total score is higher than 31.
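The scoring rule above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `audio_module_total` and the constants are hypothetical, and the sketch returns the raw total and the cum laude flag without deciding how a total above 30 is recorded, since the text does not specify that.

```python
from typing import Tuple

# Values taken from the assessment rules; the names are hypothetical.
MAX_EXAM_POINTS = 31     # the written exam assigns up to 31 points
PROJECT_THRESHOLD = 28   # minimum exam score required to do the project
MAX_PROJECT_POINTS = 4   # the project adds up to 4 points
LAUDE_THRESHOLD = 31     # cum laude when the total is strictly higher


def audio_module_total(exam_points: float,
                       project_points: float = 0) -> Tuple[float, bool]:
    """Return (total score, cum laude flag) for the Audio Signals module."""
    if not 0 <= exam_points <= MAX_EXAM_POINTS:
        raise ValueError("exam points must be between 0 and 31")
    if not 0 <= project_points <= MAX_PROJECT_POINTS:
        raise ValueError("project points must be between 0 and 4")
    if project_points > 0 and exam_points < PROJECT_THRESHOLD:
        raise ValueError("the project requires an exam score of at least 28")
    total = exam_points + project_points
    return total, total > LAUDE_THRESHOLD
```

For instance, an exam score of 29 with a 3-point project gives a total of 32, which is above 31 and therefore qualifies for cum laude, while an exam score of 31 with no project does not.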

Type of assessment: Written test

Solution of numerical problems (Dublin descriptors: 1, 2)

·       Time and frequency audio resolution

·       Wiener filtering estimation

·       Microphone arrays for direction of arrival estimation

Exercises focusing on design aspects (Dublin descriptors: 1, 2, 3, 4)

·       Optimal design of windowing solutions for audio signal processing

·       Design of acoustic restoration systems

Theoretical questions on all course topics with open answer (Dublin descriptors: 1, 4)

·       Elements of acoustics and psychoacoustics

·       Sound analysis and synthesis tools

·       Audio effects and interpolated delay lines

·       Acoustic restoration and reverberation

·       Feature extraction and classification

·       Microphone arrays

"Video Signals" Module

The exam for the "Video Signals" module is a written test with 3 or 4 exercises to be solved in 2 hours. Two or three exercises require a numerical/procedural solution, while the last one requires the definition of a Matlab procedure and focuses on the exercises presented during the laboratories. The Matlab code has to be written directly on paper, without any computer aid.

The exam is written only; no oral exams, supplementary tests or projects will be considered to raise the grade obtained.

Type of assessment: Written test

Solution of numerical problems concerning:

·        Image correction

         o   Brightness/contrast enhancement

         o   Distortion removal

         o   Color correction

         o   Change of perspective

         o   Deblurring/denoising

·        Image filtering

·        Edge/corner extraction

·        Application of morphological operators

·        Denoising/Wiener filtering

·        Application of segmentation criteria

·        High dynamic range image fusion

Exercises focusing on design aspects

·        Image correction (operations on histogram, colors, distortions…)

·        Processing based on frequency analysis

·        Segmentation

·        Implementation of ad-hoc morphological operators

·        Feature extraction and comparison

·        Hough and Radon transforms

·        Perspective correction

·        Deconvolution and blind deconvolution

Theoretical questions on all course topics with open answer

·        image sampling and quantization,

·        pinhole and real camera models,

·        colors and colorimetry,

·        spatial filtering and local descriptors,

·        object detection, Hough/Radon transforms,

·        image segmentation,

·        morphological image processing,

·        image spectral analysis based on the Fourier transform,

·        noise reduction and restoration,

·        deconvolution and blind deconvolution,

·        image and video compression,

·        motion analysis and tracking,

·        multiple image stitching for panoramic images and videos,

·        high dynamic range images.

Dublin descriptors

1, 2, 3

1, 4

Concerning the assessment of the whole exam (Audio and Video Signals)

To pass the whole "Audio and Video Signals" exam (first and second module), a mark greater than or equal to 18 must be obtained in each module. The final mark is the average of the two marks, rounded up to the next integer.

In evaluating the average between the two modules, a 30 cum laude mark in a module will be considered as 30; in order to get 30 cum laude as a final mark, the mark of at least one module must be 30 cum laude.
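The combination rule can be illustrated with a short sketch. The function name `final_mark` and its parameters are hypothetical. Two points are assumptions rather than statements from the text: "rounded up" is read as the ceiling of the average, and a final 30 cum laude is awarded whenever the rounded average is 30 and at least one module is 30 cum laude (the text only says that this condition is necessary).

```python
from typing import Optional, Tuple

PASS_MARK = 18  # minimum mark required in each module
MAX_MARK = 30   # a module's 30 cum laude counts as 30 in the average


def final_mark(audio: int, video: int,
               audio_laude: bool = False,
               video_laude: bool = False) -> Optional[Tuple[int, bool]]:
    """Return (final mark, cum laude flag), or None if the exam is not passed."""
    for mark in (audio, video):
        if not 0 <= mark <= MAX_MARK:
            raise ValueError("module marks must be between 0 and 30")
    if audio < PASS_MARK or video < PASS_MARK:
        return None  # both modules need a mark of at least 18
    # Ceiling of the average, computed in integer arithmetic.
    mark = (audio + video + 1) // 2
    # A laude flag only counts when that module's mark is 30.
    has_laude = (audio == MAX_MARK and audio_laude) or \
                (video == MAX_MARK and video_laude)
    # Assumption: laude is granted when the rounded average reaches 30.
    return mark, mark == MAX_MARK and has_laude
```

With marks of 27 and 30, the average 28.5 is rounded up to 29. With 30 cum laude and 29, the average 29.5 is rounded up to 30 and, under the assumption above, the final mark is 30 cum laude.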

On every exam date, the two modules take place on the same day, one after the other (more details will be provided through the poliself web portal); students can choose to take a single module or both modules on that day.

Once a student gets a positive mark in a module, this mark is automatically frozen until she/he takes the same module again in a later session. "Taking the exam again on the same module" means that the student registers, attends and turns in his/her solution. If the student simply attends an exam but does not turn in a solution, the previous mark does not change and the student is considered an "absentee".

Once a student gets a positive mark in both modules, the final mark is computed and published on the poliself web portal for the publishing period; at the end of this period it is automatically recorded. A student who does not want to record that final grade has to refuse it on poliself: in that case both marks are restored to a "frozen" state, so that the student can retake one module (or both). However, if the student has a frozen mark in both modules at the end of any exam for which he/she registered, the final average mark is automatically published and, after the publishing period, automatically recorded. A student who wants to improve the final mark must therefore remember to refuse the published final grade.

Bibliography
Required reference: R.C. Gonzalez, R.E. Woods, Digital Image Processing, Publisher: Addison-Wesley Pub., Edition year: 2008
Required reference: Sergios Theodoridis, Konstantinos Koutroumbas, Pattern Recognition, Publisher: Elsevier Academic Press, Edition year: 2003
Optional reference: Alberto S. Aguado, Mark S. Nixon, Feature Extraction and Image Processing, Edition year: 2002
Optional reference: Isaac Bankman, Handbook of Medical Imaging, Publisher: Elsevier Academic Press, Edition year: 2000
Optional reference: Stéphane Mallat, A Wavelet Tour of Signal Processing, Publisher: Academic Press, Edition year: 1999
Optional reference: Lecture notes and slides https://beep.metid.polimi.it/

Teaching forms
Type of teaching form / Hours of classroom activities / Hours of independent study
Computer laboratory
Experimental laboratory
Project laboratory
Total: 100:00 hours of classroom activities, 150:00 hours of independent study

Information in English to support internationalization
Course taught in English
Teaching material/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English