Amir Zadeh


2020

pdf bib
Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML)
Amir Zadeh | Louis-Philippe Morency | Paul Pu Liang | Soujanya Poria
Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML)

2018

pdf bib
Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos
Devamanyu Hazarika | Soujanya Poria | Amir Zadeh | Erik Cambria | Louis-Philippe Morency | Roger Zimmermann
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Emotion recognition in conversations is crucial for the development of empathetic machines. Present methods mostly ignore the role of inter-speaker dependency relations while classifying emotions in conversations. In this paper, we address recognizing utterance-level emotions in dyadic conversational videos. We propose a deep neural framework, termed Conversational Memory Network (CMN), which leverages contextual information from the conversation history. In particular, CMN uses multimodal approach comprising audio, visual and textual features with gated recurrent units to model past utterances of each speaker into memories. These memories are then merged using attention-based hops to capture inter-speaker dependencies. Experiments show a significant improvement of 3 − 4% in accuracy over the state of the art.

pdf bib
Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML)
Amir Zadeh | Paul Pu Liang | Louis-Philippe Morency | Soujanya Poria | Erik Cambria | Stefan Scherer
Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML)

2017

pdf bib
Context-Dependent Sentiment Analysis in User-Generated Videos
Soujanya Poria | Erik Cambria | Devamanyu Hazarika | Navonil Majumder | Amir Zadeh | Louis-Philippe Morency
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multimodal sentiment analysis is a developing area of research, which involves the identification of sentiments in videos. Current research considers utterances as independent entities, i.e., ignores the interdependencies and relations among the utterances of a video. In this paper, we propose a LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process. Our method shows 5-10% performance improvement over the state of the art and high robustness to generalizability.

pdf bib
Combating Human Trafficking with Multimodal Deep Models
Edmund Tong | Amir Zadeh | Cara Jones | Louis-Philippe Morency
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Human trafficking is a global epidemic affecting millions of people across the planet. Sex trafficking, the dominant form of human trafficking, has seen a significant rise mostly due to the abundance of escort websites, where human traffickers can openly advertise among at-will escort advertisements. In this paper, we take a major step in the automatic detection of advertisements suspected to pertain to human trafficking. We present a novel dataset called Trafficking-10k, with more than 10,000 advertisements annotated for this task. The dataset contains two sources of information per advertisement: text and images. For the accurate detection of trafficking advertisements, we designed and trained a deep multimodal model called the Human Trafficking Deep Network (HTDN).

pdf bib
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh | Minghai Chen | Soujanya Poria | Erik Cambria | Louis-Philippe Morency
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Multimodal sentiment analysis is an increasingly popular research area, which extends the conventional language-based definition of sentiment analysis to a multimodal setup where other relevant modalities accompany language. In this paper, we pose the problem of multimodal sentiment analysis as modeling intra-modality and inter-modality dynamics. We introduce a novel model, termed Tensor Fusion Networks, which learns both such dynamics end-to-end. The proposed approach is tailored for the volatile nature of spoken language in online videos as well as accompanying gestures and voice. In the experiments, our model outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.