Ahmed Ali


2020

pdf bib
What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context
Ramy Baly | Georgi Karadzhov | Jisun An | Haewoon Kwak | Yoan Dinkov | Ahmed Ali | James Glass | Preslav Nakov
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but an increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Thus, it has been proposed to profile entire news outlets and to look for those that are likely to publish fake or biased content. This makes it possible to detect likely “fake news” the moment they are published, by simply checking the reliability of their source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context. Here, we study the impact of both, namely (i) what was written (i.e., what was published by the target medium, and how it describes itself in Twitter) vs. (ii) who reads it (i.e., analyzing the target medium’s audience on social media). We further study (iii) what was written about the target medium (in Wikipedia). The evaluation results show that what was written matters most, and we further show that putting all information sources together yields huge improvements over the current state-of-the-art.

2019

pdf bib
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Marcos Zampieri | Preslav Nakov | Shervin Malmasi | Nikola Ljubešić | Jörg Tiedemann | Ahmed Ali
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

2018

pdf bib
Word Error Rate Estimation for Speech Recognition: e-WER
Ahmed Ali | Steve Renals
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data in order to compute the word error rate (WER), which is often time-consuming and expensive. In this paper, we propose a novel approach to estimate WER, or e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: ASR recognised text, character recognition results to complement recognition output, and internal decoder features. We report results for the two features; black-box and glass-box using unseen 24 Arabic broadcast programs. Our system achieves 16.9% WER root mean squared error (RMSE) across 1,400 sentences. The estimated overall WER e-WER was 25.3% for the three hours test set, while the actual WER was 28.5%.

pdf bib
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Marcos Zampieri | Preslav Nakov | Nikola Ljubešić | Jörg Tiedemann | Shervin Malmasi | Ahmed Ali
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

pdf bib
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign
Marcos Zampieri | Shervin Malmasi | Preslav Nakov | Ahmed Ali | Suwon Shon | James Glass | Yves Scherrer | Tanja Samardžić | Nikola Ljubešić | Jörg Tiedemann | Chris van der Lee | Stefan Grondelaers | Nelleke Oostdijk | Dirk Speelman | Antal van den Bosch | Ritesh Kumar | Bornini Lahiri | Mayank Jain
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, collocated with COLING’2018. This year, the campaign included five shared tasks, including two task re-runs – Arabic Dialect Identification (ADI) and German Dialect Identification (GDI) –, and three new tasks – Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks, and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report.

2017

pdf bib
QCRI Live Speech Translation System
Fahim Dalvi | Yifan Zhang | Sameer Khurana | Nadir Durrani | Hassan Sajjad | Ahmed Abdelali | Hamdy Mubarak | Ahmed Ali | Stephan Vogel
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

This paper presents QCRI’s Arabic-to-English live speech translation system. It features modern web technologies to capture live audio, and broadcasts Arabic transcriptions and English translations simultaneously. Our Kaldi-based ASR system uses the Time Delay Neural Network (TDNN) architecture, while our Machine Translation (MT) system uses both phrase-based and neural frameworks. Although our neural MT system is slower than the phrase-based system, it produces significantly better translations and is memory efficient. The demo is available at https://st.qcri.org/demos/livetranslation.

pdf bib
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

pdf bib
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Preslav Nakov | Marcos Zampieri | Nikola Ljubešić | Jörg Tiedemann | Shevin Malmasi | Ahmed Ali
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)

pdf bib
Findings of the VarDial Evaluation Campaign 2017
Marcos Zampieri | Shervin Malmasi | Nikola Ljubešić | Preslav Nakov | Ahmed Ali | Jörg Tiedemann | Yves Scherrer | Noëmi Aepli
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL’2017. This year, we included four shared tasks: Discriminating between Similar Languages (DSL), Arabic Dialect Identification (ADI), German Dialect Identification (GDI), and Cross-lingual Dependency Parsing (CLP). A total of 19 teams submitted runs across the four tasks, and 15 of them wrote system description papers.

2016

pdf bib
Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task
Shervin Malmasi | Marcos Zampieri | Nikola Ljubešić | Preslav Nakov | Ahmed Ali | Jörg Tiedemann
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial’2016 workshop at COLING’2016. The challenge offered two subtasks: subtask 1 focused on the identification of very similar languages and language varieties in newswire texts, whereas subtask 2 dealt with Arabic dialect identification in speech transcripts. A total of 37 teams registered to participate in the task, 24 teams submitted test results, and 20 teams also wrote system description papers. High-order character n-grams were the most successful feature, and the best classification approaches included traditional supervised learning methods such as SVM, logistic regression, and language models, while deep learning approaches did not perform very well.

2015

pdf bib
Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription
Samantha Wray | Hamdy Mubarak | Ahmed Ali
Proceedings of the Second Workshop on Arabic Natural Language Processing

pdf bib
Multi-Reference Evaluation for Dialectal Speech Recognition System: A Study for Egyptian ASR
Ahmed Ali | Walid Magdy | Steve Renals
Proceedings of the Second Workshop on Arabic Natural Language Processing

2012

pdf bib
Arabic Retrieval Revisited: Morphological Hole Filling
Kareem Darwish | Ahmed Ali
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)