Mark Lee

Also published as: M.G. Lee, Mark G. Lee


2020

Combining Character and Word Embeddings for the Detection of Offensive Language in Arabic
Abdullah I. Alharbi | Mark Lee
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

Twitter and other social media platforms offer users the chance to share their ideas via short posts. While the easy exchange of ideas has value, these microblogs can be leveraged by people who want to share hatred, and such individuals can share negative views about an individual, race, or group with millions of people at the click of a button. There is thus an urgent need to establish a method that can automatically identify hate speech and offensive language. To contribute to this development, during the OSACT4 workshop, a shared task was undertaken to detect offensive language in Arabic. A key challenge was the uniqueness of the language used on social media, which gives rise to the out-of-vocabulary (OOV) problem. In addition, the use of different dialects in Arabic exacerbates this problem. To deal with the issues associated with OOV, we generated a character-level embeddings model, which was trained on a massive, carefully collected dataset. This level of embeddings can work effectively in resolving the problem of OOV words through its ability to learn the vectors of character n-grams or parts of words. The proposed systems were ranked 7th and 8th for Subtasks A and B, respectively.

BhamNLP at SemEval-2020 Task 12: An Ensemble of Different Word Embeddings and Emotion Transfer Learning for Arabic Offensive Language Identification in Social Media
Abdullah I. Alharbi | Mark Lee
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Social media platforms such as Twitter offer people an opportunity to publish short posts in which they can share their opinions and perspectives. While these applications can be valuable, they can also be exploited to promote negative opinions, insults, and hatred against a person, race, or group. These opinions can be spread to millions of people at the click of a mouse. As such, there is a need to develop mechanisms by which offensive language can be automatically detected in social media channels and managed in a timely manner. To help achieve this goal, SemEval 2020 offered a shared task (OffensEval 2020) that involved the detection of offensive text in Arabic. We propose an ensemble approach that combines different levels of word embedding models and transfers learning from other sources of emotion-related tasks. The proposed system ranked 9th out of the 52 entries within the Arabic offensive language identification subtask.

“What is on your mind?” Automated Scoring of Mindreading in Childhood and Early Adolescence
Venelin Kovatchev | Phillip Smith | Mark Lee | Imogen Grumley Traynor | Irene Luque Aguilera | Rory Devine
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we present the first work on the automated scoring of mindreading ability in middle childhood and early adolescence. We create MIND-CA, a new corpus of 11,311 question-answer pairs in English from 1,066 children aged from 7 to 14. We perform machine learning experiments and carry out extensive quantitative and qualitative evaluation. We obtain promising results, demonstrating the applicability of state-of-the-art NLP solutions to a new domain and task.

Augmenting Neural Metaphor Detection with Concreteness
Ghadi Alnafesah | Harish Tayyar Madabushi | Mark Lee
Proceedings of the Second Workshop on Figurative Language Processing

The idea that a shift in concreteness within a sentence indicates the presence of a metaphor has been around for a while. However, recent methods of detecting metaphor that have relied on deep neural models have ignored concreteness and related psycholinguistic information. We hypothesize that this information is not available to these models and that its addition will boost the performance of these models in detecting metaphor. We test this hypothesis on the Metaphor Detection Shared Task 2020 and find that the addition of concreteness information does in fact boost deep neural models. We also run tests on data from a previous shared task and show similar results.

2019

Crisis Detection from Arabic Tweets
Alaa Alharbi | Mark Lee
Proceedings of the 3rd Workshop on Arabic Corpus Linguistics

2018

Integrating Question Classification and Deep Learning for improved Answer Selection
Harish Tayyar Madabushi | Mark Lee | John Barnden
Proceedings of the 27th International Conference on Computational Linguistics

We present a system for Answer Selection that integrates fine-grained Question Classification with a Deep Learning model designed for Answer Selection. We detail the necessary changes to the Question Classification taxonomy and system, the creation of a new Entity Identification system and methods of highlighting entities to achieve this objective. Our experiments show that Question Classes are a strong signal to Deep Learning models for Answer Selection, and enable us to outperform the current state of the art in all variations of our experiments except one. In the best configuration, our MRR and MAP scores outperform the current state of the art by between 3 and 5 points on both versions of the TREC Answer Selection test set, a standard dataset for this task.

2016

High Accuracy Rule-based Question Classification using Question Syntax and Semantics
Harish Tayyar Madabushi | Mark Lee
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present in this paper a purely rule-based system for Question Classification which we divide into two parts: the first is the extraction of relevant words from a question by use of its structure, and the second is the classification of questions based on rules that associate these words to Concepts. We achieve an accuracy of 97.2%, close to a 6-point improvement over the previous State of the Art of 91.6%. Additionally, we believe that machine learning algorithms can be applied on top of this method to further improve accuracy.

UoB-UK at SemEval-2016 Task 1: A Flexible and Extendable System for Semantic Text Similarity using Types, Surprise and Phrase Linking
Harish Tayyar Madabushi | Mark Buhagiar | Mark Lee
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Sentiment Classification via a Response Recalibration Framework
Phillip Smith | Mark Lee
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2014

A Hybrid Approach to Features Representation for Fine-grained Arabic Named Entity Recognition
Fahd Alotaibi | Mark Lee
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Automatically Developing a Fine-grained Arabic Named Entity Corpus and Gazetteer by utilizing Wikipedia
Fahd Alotaibi | Mark Lee
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Building Text-to-Speech Systems for Resource Poor Languages
Nur-Hana Samsudin | Mark Lee
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes research on building text-to-speech synthesis systems (TTS) for resource poor languages using available resources from other languages and describes our general approach to building cross-linguistic polyglot TTS. Our approach involves three main steps: language clustering, grapheme to phoneme mapping and prosody modelling. We have tested the mapping of phonemes from German to English and from Indonesian to Spanish. We have also constructed three prosody representations for different language characteristics. For evaluation we have developed an English TTS based on German data, and a Spanish TTS based on Indonesian data and compared their performance against pre-existing monolingual TTSs. Since our motivation is to develop speech synthesis for resource poor languages, we have also developed three TTS for Iban, an Austronesian language with practically no available language resources, using Malay, Indonesian and Spanish resources.

Cross-discourse Development of Supervised Sentiment Analysis in the Clinical Domain
Phillip Smith | Mark Lee
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

A CCG-based Approach to Fine-Grained Sentiment Analysis
Phillip Smith | Mark Lee
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology

Mapping Arabic Wikipedia into the Named Entities Taxonomy
Fahd Alotaibi | Mark Lee
Proceedings of COLING 2012: Posters

2008

Textual Entailment as an Evaluation Framework for Metaphor Resolution: A Proposal
Rodrigo Agerri | John Barnden | Mark Lee | Alan Wallington
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

Don’t worry about metaphor: affect detection for conversational agents
Catherine Smith | Timothy Rumbell | John Barnden | Robert Hendley | Mark Lee | Alan Wallington | Li Zhang
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

On the formalization of Invariant Mappings for Metaphor Interpretation
Rodrigo Agerri | John Barnden | Mark Lee | Alan Wallington
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

Considerations on the nature of metaphorical meaning arising from a computational treatment of metaphor interpretation
A.M. Wallington | R. Agerri | J.A. Barnden | S.R. Glasbey | M.G. Lee
Proceedings of the Fifth International Workshop on Inference in Computational Semantics (ICoS-5)

2003

Domain-transcending mappings in a system for metaphorical reasoning
John A. Barnden | Sheila R. Glasbey | Mark G. Lee | Alan M. Wallington
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

Reasoning in Metaphor Understanding: The ATT-Meta Approach and System
John Barnden | Sheila Glasbey | Mark Lee | Alan Wallington
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

1996

An ascription-based approach to Speech Acts
Mark Lee | Yorick Wilks
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics