Ing Ind - Inf (Mag.)(ord. 270) - MI (471) BIOMEDICAL ENGINEERING - INGEGNERIA BIOMEDICA   *   A   ZZZZ   088946 - NATURAL LANGUAGE PROCESSING
Ing Ind - Inf (Mag.)(ord. 270) - MI (481) COMPUTER SCIENCE AND ENGINEERING - INGEGNERIA INFORMATICA   *   A   ZZZZ   088946 - NATURAL LANGUAGE PROCESSING
Ing Ind - Inf (Mag.)(ord. 270) - MI (487) MATHEMATICAL ENGINEERING - INGEGNERIA MATEMATICA   *   A   ZZZZ   088946 - NATURAL LANGUAGE PROCESSING
Course objectives
Natural Language Processing (NLP) concerns the computational analysis, interpretation, and production of natural language in either written or spoken form. It is an interdisciplinary research field of both theoretical and practical interest. Decades of research have produced a large collection of symbolic, stochastic, and deep-learning-based models. These models have enabled applications in a wide range of fields, such as human-machine interaction and chatbots, search and question answering, translation and multilingual systems, multimodal and captioning systems, speech analysis, voice interaction and personal assistants, and sentiment analysis.
This course provides an introduction to the main problems, models, and applications in NLP. The history of NLP includes many successes and many failures, which reflects the complexity of the topic. Symbolic models, popular at first, proved unable to capture the intrinsic complexity of natural language. Statistical techniques such as vector-space representations and linear classifiers (e.g. Support Vector Machines) enabled important applications such as web search and spam detection. Word embedding techniques then became popular and improved performance across NLP, from morphology to semantics and dialogue. More recently, sequence-to-sequence modelling with deep learning techniques has greatly improved performance on hard NLP problems such as machine translation and dialogue generation.
Expected learning outcomes
Introduce students to problems and solutions (with their respective strengths and weaknesses) related to the analysis and production of natural language, both written and spoken:
- Present machine learning and deep learning techniques for morphology, syntax, semantics, pragmatics, voice, prosody, discourse and dialogue analysis, and sentiment analysis.
- Provide hands-on practice sessions where students can test the models and techniques presented in class, with applications such as sentiment analysis, named entity extraction, and question answering.
Topics covered
Topics covered in this course will include the following.
Common techniques used in NLP, such as:
- regular expressions,
- vector space representations of text,
- text classification with linear and non-linear classifiers,
- text clustering and topic modelling techniques,
- word-embedding-based representations of text (such as Word2Vec),
- sequence-to-sequence models (including recurrent neural networks),
- deep learning techniques (Transformer models like BERT and GPT-2).
Common tasks investigated in NLP, such as:
- sentiment analysis,
- named entity extraction,
- translation,
- summarisation,
- question answering.
In addition, hands-on programming sessions will cover the practical aspects of building NLP applications.
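As an illustration of two of the techniques listed above (regular expressions and vector space representations of text), the following is a minimal sketch and is not taken from the course material. The function names and the toy sentences are hypothetical, and it uses only the Python standard library; practical sessions may well rely on dedicated NLP libraries instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Map a text to a sparse term-frequency vector (token -> count)."""
    return Counter(tokenize(text))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy documents, invented for this illustration.
doc_a = bag_of_words("The film was great, really great.")
doc_b = bag_of_words("A great film.")
doc_c = bag_of_words("Stock prices fell sharply.")

print(cosine_similarity(doc_a, doc_b))  # shared vocabulary: similarity above 0
print(cosine_similarity(doc_a, doc_c))  # no shared tokens: similarity is 0.0
```

Documents that share vocabulary end up close together in this vector space, which is the intuition behind vector-space search and linear text classifiers.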
Prerequisites
Useful, but not required:
- machine learning,
- basic knowledge of the Python programming language.
Assessment
1) Group assignment, worth 40% of the grade. Students program in Python to analyse a text dataset, build an application, and present the results of the analysis in class. The aim is to demonstrate the use of techniques learnt during the hands-on practical sessions.
2) Written or oral exam, worth 60% of the grade. The exam consists of theoretical questions on all course topics. The goal is to assess the student's comprehension of the models and techniques presented during the course.
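The 40%/60% weighting above amounts to a weighted average of the two components. The sketch below only illustrates that arithmetic: it assumes both components are marked on the same scale (the 30-point scale in the example is an assumption), the function name is hypothetical, and any rounding or pass thresholds applied in practice are not specified here.

```python
def final_grade(assignment_mark, exam_mark, assignment_weight=0.40, exam_weight=0.60):
    """Weighted average of the two assessment components."""
    return assignment_weight * assignment_mark + exam_weight * exam_mark


# Hypothetical marks on an assumed 30-point scale.
print(round(final_grade(28, 25), 1))  # prints 26.2
```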
Bibliography
Daniel Jurafsky & James H. Martin, Speech and Language Processing, Publisher: Prentice Hall, Year of publication: 2008. Note: 2nd edition (2008); 3rd edition to appear in 2022 (?)
Software used
No software required
Teaching formats
Teaching format            In-class hours (hh:mm)   Self-study hours (hh:mm)
Lecture                    30:00                    45:00
Exercise session           20:00                    30:00
Computer laboratory        0:00                     0:00
Experimental laboratory    0:00                     0:00
Project laboratory         0:00                     0:00
Total                      50:00                    75:00
Information in English to support internationalisation
Course taught in: English
Teaching material/slides available in English
Textbooks/bibliography available in English
Exam may be taken in English
Teaching support available in English