Ing Ind - Inf (Mag.)(ord. 270) - MI (471) BIOMEDICAL ENGINEERING - INGEGNERIA BIOMEDICA
*
A
ZZZZ
088946 - NATURAL LANGUAGE PROCESSING
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA
*
A
ZZZZ
088946 - NATURAL LANGUAGE PROCESSING
Course objectives
Course content and goals
Natural Language Processing (NLP), the computational processing of written and spoken natural language, covers the analysis, interpretation, and production of natural language clauses. NLP is a growing, interdisciplinary research field of both theoretical and practical interest. Technical innovation is shifting both theoretical and applied attention towards heterogeneous forms of language: verbal, vocal, iconic, and gestural. After decades of theoretical and applied research, a vast collection of symbolic and stochastic models exists. These models enable applications in several fields: human-machine interaction; document analysis, search, and authoring (even in distributed settings); multimodal and multilingual linguistic authoring; machine translation; etc.
The problems, models, and methodologies addressed by NLP are also relevant to the study of communication, expression, and interaction among human beings, to conversation and dialogue analysis, to sentiment analysis, and to language rehabilitation, since these topics are shared by several disciplines (for example communication, education, cognitive sciences, psychology, and medicine).
The history of NLP, full of huge efforts, intense discussions, and many failures, shows the complexity of the topic. Symbolic models, initially prevalent in the area, turned out to be unable to capture the intrinsic complexity of natural language. Today, such models are often augmented with stochastic models, especially for morphology, lexicon, syntax, semantics, pragmatics, and prosody. Stochastic models are also useful for the management, search, and retrieval of knowledge whenever it is expressed in natural language (as verbal, iconic, or gestural representations).
Current research directions in the NLP field, as in modern linguistics, tend to emphasise relationships between production and interpretation of written and spoken communication.
Objectives
Introduce students to the problems, and to suitable solution methodologies with their strengths and weaknesses, in the analysis and production of natural language clauses, both written and spoken. Present the current role of stochastic models and Deep Learning, and highlight new opportunities for combining traditional models based on formal analysis with stochastic models for morphology, syntax, semantics, pragmatics, voice, prosody, discourse and dialogue analysis, and sentiment analysis. Provide hands-on, tutored practice sessions in which students can test the models and techniques presented in class.
Applications will include: analysis of language-based human-machine and human-human interaction (for written, spoken, iconic, and gestural languages); linguistic and prosodic production and rehabilitation; pattern search and recognition for sentiment analysis in critical interactions; complexity analysis of both texts and generic communication events; definition of user profiles that include preferences about modes of expression (in forensic, educational, and clinical contexts).
Expected learning outcomes
DdD1 (Knowledge and understanding)
- knowing and understanding words, n-grams, word prediction and correction
- knowing and understanding syntactic analysis (POS tagging, chunking, parsing, formal grammars and probabilistic extensions)
- knowing and understanding semantics (meaning representation, word disambiguation, lexical semantics)
- knowing and understanding pragmatics (dialogue and conversational agents)
- knowing and understanding sentiment analysis
- knowing and understanding speech analysis, recognition, and synthesis
- knowing and understanding Deep Learning and neural-network-based techniques for word representation, language modeling, tagging, parsing, etc.
DdD2 (Applying knowledge and understanding)
- applying the knowledge and models learned during the course by implementing simple NLP tools with the NLTK toolkit
DdD3 (Making judgements)
- being able to choose the right models/techniques and the right set of (acoustic/textual) features for solving specific NLP tasks
DdD4 (Communication skills)
- being able to work in a group
DdD5 (Learning skills)
- being able to learn new NLP techniques/models/frameworks, and to apply the learned techniques to new NLP tasks
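As an illustration of the first outcome (n-grams and word prediction), the following is a minimal sketch of a bigram model with maximum-likelihood estimates. It is not taken from the course material: it uses only the Python standard library rather than NLTK, the function names are hypothetical, and the toy corpus is invented for the example.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def train_bigrams(sentences):
    """Count bigrams, padding each sentence with <s> and </s> markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def bigram_probability(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev is unseen."""
    total = sum(counts[prev].values())
    return counts[prev][curr] / total if total else 0.0


def predict_next(counts, prev):
    """Return the most frequent follower of prev, or None if prev is unseen."""
    followers = counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(bigram_probability(model, "the", "cat"))
```

A model like this assigns zero probability to any bigram it has not seen, which is one reason smoothing and larger corpora matter in practice.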
Topics covered
LECTURES
Introduction.
Mind models and linguistic / expressive / interactive competencies:
Development of expressive competencies, by means of verbal (both written and spoken), iconic, and gestural languages.
Linguistic competencies and the act of thinking.
Language, pragmatics, and interaction.
Natural language representation, its levels and their complexity: computational linguistics as a representation of human linguistic competencies, as a model, and as a solution to specific and well-defined problems.
Roles of symbolic and stochastic models in: morphologic, syntactic, semantic, and pragmatic analysis; sentiment analysis; spoken language, phonologic, and prosodic analysis; linguistic prediction; complexity evaluation; pattern recognition.
Trends in research and development: model composition and integration; definition of different criteria for model selection and composition/integration, given a language representation and a problem to solve.
Models and techniques for written natural language processing.
Morphologic analysis and ambiguity resolution: lexicons, corpora and dictionaries.
Syntactic and structural analysis:
Symbolic approaches
Stochastic approaches
Deep Learning approaches
Hybrid approaches
Semantic and discourse analysis: using integrated approaches; analysis of different representation levels.
Models and techniques for spoken natural language processing.
Components and characteristics of vocal expression and interaction: feature extraction, classification of vocal characteristics, voice profile definition, vocal expression and interaction model.
Models for the description of: tone and prosody, time scheduling, forms, interactions, and complex dialogues, expressivity.
High quality text-to-speech (TTS) and speech recognition (ASR). Analysis strategies and models for emotional and affective components in both TTS and ASR.
Models and tools supporting an integrated analysis of verbal expressions and the enhancement of linguistic competencies in communication, in forensic, educational, and clinical relationships, and in artistic performance.
Human-machine and human-human interaction.
Analysis and elaboration of linguistic-expressive resources on the net.
Supporting the analysis of communication and dialogue.
Supporting text authoring with prediction and summarization.
Supporting text complexity analysis.
Supporting speech and prosodic analysis.
Supporting sentiment analysis in critical interaction.
NLP for language rehabilitation.
Defining linguistic user profiles for verbal (both written and spoken) languages.
PRACTICES
Hands-on sessions about applications and tools.
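As a flavour of the kind of small tool such a session might build, here is a minimal sketch of a text complexity profile. The choice of measures (mean sentence length, type-token ratio, mean word length) is an assumption made for illustration, not the course's own method, and all function names are hypothetical. It uses only the Python standard library.

```python
import re


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace; a rough heuristic."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(text):
    """Extract lowercase word tokens, allowing one internal apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def complexity_profile(text):
    """Return simple surface measures often used as complexity proxies."""
    sentences = split_sentences(text)
    tokens = words(text)
    if not sentences or not tokens:
        # Empty or non-alphabetic input: report zeros instead of dividing by zero.
        return {
            "sentences": 0,
            "tokens": 0,
            "mean_sentence_length": 0.0,
            "type_token_ratio": 0.0,
            "mean_word_length": 0.0,
        }
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        # Average number of word tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Distinct words divided by total words (lexical variety).
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average number of characters per word token.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


profile = complexity_profile("The cat sat. The cat ran!")
for name, value in profile.items():
    print(name, value)
```

These surface measures are sensitive to text length and tokenization, so they are best compared across texts of similar size.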
Prerequisites
Knowledge of machine learning and formal languages is useful but not required. Basic knowledge of the Python language is likewise useful but not required.
Assessment
Students will be assessed through:
1) Written test (DdD: 1, 4, 5): theoretical questions with open answers on all course topics; in particular, three topics, each consisting of three open questions. The main goal is to assess the student's comprehension of the models and techniques presented during the course.
2) Oral presentation (DdD: 2, 3, 4, 5): discussion of an essay about the project carried out during the hands-on sessions.
Bibliography
Daniel Jurafsky & James H. Martin, Speech and Language Processing, 2nd edition, Prentice Hall, 2008.
Software used
No software required
Teaching formats
Type of teaching activity | Classroom hours (hh:mm) | Self-study hours (hh:mm)
Lecture | 30:00 | 45:00
Exercise session | 20:00 | 30:00
Computer laboratory | 0:00 | 0:00
Experimental laboratory | 0:00 | 0:00
Project laboratory | 0:00 | 0:00
Total | 50:00 | 75:00
Information in English to support internationalisation
Course taught in: English
Availability of teaching material/slides in English
Availability of textbooks/bibliography in English
Possibility of taking the exam in English
Availability of teaching support in English