Multimodal Learning and Reasoning

Desmond Elliott, Douwe Kiela, Angeliki Lazaridou


Abstract
Natural Language Processing has broadened in scope to tackle more and more challenging language understanding and reasoning tasks. The core NLP tasks remain predominantly unimodal, focusing on linguistic input, despite the fact that we, as humans, acquire and use language while communicating in perceptually rich environments. Moving towards human-level AI will require the integration and modeling of multiple modalities beyond language. With this tutorial, our aim is to introduce researchers to the areas of NLP that have dealt with multimodal signals. The key advantage of using multimodal signals in NLP tasks is the complementarity of the data in different modalities. For example, we are less likely to find descriptions of yellow bananas or wooden chairs in text corpora, but these visual attributes can be readily extracted directly from images. Multimodal signals, such as visual, auditory or olfactory data, have proven useful for models of word similarity and relatedness, automatic image and video description, and even predicting the associated smells of words. Finally, multimodality offers a practical opportunity to study and apply multitask learning, a general machine learning paradigm that improves generalization performance on a task by using training signals from other related tasks. All material associated with the tutorial will be available at http://multimodalnlp.github.io/
Anthology ID:
P16-5001
Volume:
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Month:
August
Year:
2016
Address:
Berlin, Germany
Venue:
ACL
Publisher:
Association for Computational Linguistics
URL:
https://www.aclweb.org/anthology/P16-5001
Presentation:
 P16-5001.Presentation.pdf