Summary sheet
Academic year: 2018/2019
School: School of Industrial and Information Engineering
Course: 088946 - NATURAL LANGUAGE PROCESSING
Lecturer: Sbattella Licia
Credits (CFU): 5.00
Course type: Single-discipline

Degree programme | Pre-approved study plan code | From (inclusive) | To (exclusive) | Course
Ing Ind - Inf (Mag.)(ord. 270) - MI (471) BIOMEDICAL ENGINEERING - INGEGNERIA BIOMEDICA | * | A | ZZZZ | 088946 - NATURAL LANGUAGE PROCESSING
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA | * | A | ZZZZ | 088946 - NATURAL LANGUAGE PROCESSING

Course objectives

Course content and goals
Computational processing of written and spoken natural language (Natural Language Processing - NLP) refers to the analysis, interpretation, and production of natural language clauses. NLP is a growing, interdisciplinary research field, highly interesting from both the theoretical and the practical point of view.
Technical innovation is shifting both theoretical and applied attention towards heterogeneous language forms: verbal, vocal, iconic, and gestural.
After decades of theoretical and applied research, a vast collection of symbolic and stochastic models exists. Such models enable the development of applications in several fields: human-machine interaction; document analysis, search, and authoring (even in distributed settings); multimodal and multilingual linguistic authoring; machine translation; etc.
The problems, models, and methodologies addressed by NLP are also relevant to the study of communication, expression, and interaction among human beings, to conversation and dialogue analysis, to sentiment analysis, and to language rehabilitation, since these topics are shared by several disciplines (for example communication, education, the cognitive sciences, psychology, and medicine).
The history of NLP, full of huge efforts, intense discussions, and many failures, shows the complexity of the topic. Symbolic models, initially prevalent in the area, turned out to be unable to capture the intrinsic complexity of natural language. Today, such models are often augmented with stochastic models, especially for morphology, lexicon, syntax, semantics, pragmatics, and prosody. Moreover, stochastic models are also useful for the management, search, and retrieval of knowledge whenever it is expressed in natural language (through verbal, iconic, or gestural representations).
Current research directions in the NLP field, as in modern linguistics, tend to emphasise the relationships between the production and the interpretation of written and spoken communication.

Objectives
Introduce students to the problems, and to suitable solution methodologies (with their strengths and weaknesses), related to the analysis and production of natural language clauses, both written and spoken.
Present the current role of stochastic models and Deep Learning; highlight new opportunities for combining traditional models based on formal analysis with stochastic models, for morphology, syntax, semantics, pragmatics, voice, prosody, discourse and dialogue analysis, and sentiment analysis.
Provide hands-on, tutored practice sessions, where students can test the models and techniques presented during classes. Applications will include: analysis of language-based human-machine and human-human interaction (for written, spoken, iconic, and gestural languages); linguistic and prosodic production and rehabilitation; pattern search and recognition for sentiment analysis in critical interaction; complexity analysis for both texts and generic communication events; definition of user profiles including preferences about expression modalities (in forensic, educational, and clinical contexts).


Expected learning outcomes

DdD 1 (Dublin Descriptor 1: Knowledge and understanding)
- knowing and understanding words, n-grams, word prediction and correction
- knowing and understanding syntactic analysis (POS tagging, chunking, parsing, formal grammars and probabilistic extensions)
- knowing and understanding semantics (meaning representation, word disambiguation, lexical semantics)
- knowing and understanding pragmatics (dialogue and conversational agents)
- knowing and understanding sentiment analysis
- knowing and understanding speech analysis, recognition, and synthesis
- knowing and understanding Deep Learning and neural-network-based techniques for word representation, language modeling, tagging, parsing, etc.
DdD 2 (Applying knowledge and understanding)
- applying knowledge and models learned during the course, implementing simple NLP tools, by means of the NLTK toolkit
DdD3 (Making judgements)
- being able to choose the right models/techniques and the right set of (acoustic/textual) features for solving specific NLP tasks
DdD4 (Communication skills)
- being able to work in a group
DdD5 (Learning skills)
- being able to learn new NLP techniques/models/frameworks, and to apply the learned techniques to new NLP tasks
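
As an illustration of the first DdD 1 item (n-grams and word prediction), the following is a minimal sketch in plain Python; it is not course material. The toy corpus, the function names `train_bigrams` and `predict_next`, and the use of unsmoothed maximum-likelihood bigram counts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count bigrams: counts[w1][w2] = number of times w2 follows w1."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Naive whitespace tokenization with sentence-boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            counts[w1][w2] += 1
    return counts


def predict_next(counts, word):
    """Return (most frequent follower, its MLE probability), or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    best, freq = followers.most_common(1)[0]
    return best, freq / sum(followers.values())


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(predict_next(model, "sat"))
```

A realistic predictor would add smoothing for unseen bigrams and a proper tokenizer; both are omitted here to keep the sketch short.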


Topics covered

LECTURES

Introduction.

Mind models and linguistic / expressive / interactive competencies:

  • Development of expressive competencies, by means of verbal (both written and spoken), iconic, and gestural languages.
  • Linguistic competencies and the act of thinking.
  • Language, pragmatics, and interaction.

Natural language representation, its levels and their complexity: computational linguistics as a representation of human linguistic competencies, as a model, and as a solution to specific and well-defined problems.

Roles of symbolic and stochastic models in: morphologic, syntactic, semantic, and pragmatic analysis; sentiment analysis; spoken language, phonologic, and prosodic analysis; linguistic prediction; complexity evaluation; pattern recognition.

Trends in research and development: model composition and integration; definition of different criteria for model selection and composition/integration, given a language representation and a problem to cope with.

Models and techniques for written natural language processing.

Morphologic analysis and ambiguity resolution: lexicons, corpora and dictionaries.

Syntactic and structural analysis:

  • Symbolic approaches
  • Stochastic approaches
  • Deep Learning approaches
  • Hybrid approaches

Semantic and discourse analysis: using integrated approaches; analysis of different representation levels.

Models and techniques for spoken natural language processing.

Components and characteristics of vocal expression and interaction: feature extraction, classification of vocal characteristics, voice profile definition, vocal expression and interaction model.

Models for the description of: tone and prosody, time scheduling, forms, interactions, and complex dialogues, expressivity.

High quality text-to-speech (TTS) and speech recognition (ASR). Analysis strategies and models for emotional and affective components in both TTS and ASR.

Models and tools supporting an integrated analysis of verbal expressions, and supporting the enhancement of linguistic competencies in communicative, forensic, educational, and clinical relationships, and in artistic performance.

Human-machine and human-human interaction.

Analysis and elaboration of linguistic-expressive resources on the net.

Supporting the analysis of communication and dialogue.

Supporting text authoring with prediction and summarization.

Supporting text complexity analysis.

Supporting speech and prosodic analysis.

Supporting sentiment analysis in critical interaction.

NLP for language rehabilitation.

Defining linguistic user profiles for verbal (both written and spoken) languages.

PRACTICES

Hands-on sessions about applications and tools.


Prerequisites

Knowledge of machine learning and formal languages is useful but not required. Basic knowledge of the Python language is likewise useful but not required.


Assessment methods

The assessment consists of:
1) Written test (DdD: 1, 4, 5): theoretical questions on all course topics, with open answers. In particular, the test covers three topics, each composed of three open questions. The main goal is to assess the student's comprehension of the models and techniques presented during the course.

2) Oral presentation (DdD: 2, 3, 4, 5): discussion of an essay about the project carried out during the hands-on sessions.


Bibliography
Required reading: Daniel Jurafsky & James H. Martin, Speech and Language Processing, 2nd edition, Prentice Hall, 2008


Teaching formats
Type of teaching activity | In-class hours (hh:mm) | Self-study hours (hh:mm)
Lecture | 30:00 | 45:00
Exercise session | 20:00 | 30:00
Computer laboratory | 0:00 | 0:00
Experimental laboratory | 0:00 | 0:00
Project laboratory | 0:00 | 0:00
Total | 50:00 | 75:00

Information in English to support internationalisation
Course taught in English
Teaching materials/slides available in English
Textbooks/bibliography available in English
Exam can be taken in English
Teaching support available in English
21/01/2020