Bart Desmet


Development of Natural Language Processing Tools to Support Determination of Federal Disability Benefits in the U.S.
Bart Desmet | Julia Porcino | Ayah Zirikly | Denis Newman-Griffis | Guy Divita | Elizabeth Rasch
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

The disability benefits programs administered by the US Social Security Administration (SSA) receive between 2 and 3 million new applications each year. Adjudicators manually review hundreds of evidence pages per case to determine eligibility based on financial, medical, and functional criteria. Natural Language Processing (NLP) technology is uniquely suited to support this adjudication work and is a critical component of an ongoing inter-agency collaboration between SSA and the National Institutes of Health. This NLP work provides resources and models for document ranking, named entity recognition, and terminology extraction in order to automatically identify documents and reports pertinent to a case, and to allow adjudicators to search for and locate desired information quickly. In this paper, we describe our vision for how NLP can impact SSA’s adjudication process, present the resources and models that have been developed, and discuss some of the benefits and challenges in working with large-scale government data, and its specific properties in the functional domain.


Classifying the reported ability in clinical mobility descriptions
Denis Newman-Griffis | Ayah Zirikly | Guy Divita | Bart Desmet
Proceedings of the 18th BioNLP Workshop and Shared Task

Assessing how individuals perform different activities is key information for modeling health states of individuals and populations. Descriptions of activity performance in clinical free text are complex, including syntactic negation and similarities to textual entailment tasks. We explore a variety of methods for the novel task of classifying four types of assertions about activity performance: Able, Unable, Unclear, and None (no information). We find that ensembling an SVM trained with lexical features and a CNN achieves 77.9% macro F1 score on our task, and yields nearly 80% recall on the rare Unclear and Unable samples. Finally, we highlight several challenges in classifying performance assertions, including capturing information about sources of assistance, incorporating syntactic structure and negation scope, and handling new modalities at test time. Our findings establish a strong baseline for this novel task, and identify intriguing areas for further research.
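
The macro F1 metric reported above averages the per-class F1 scores without weighting by class frequency, so rare classes such as Unclear and Unable count as much as the frequent ones. The following is a minimal sketch of that computation over the four labels named in the abstract. The function name and the toy data are illustrative assumptions and are not taken from the paper.

```python
# Labels come from the abstract; everything else is an illustrative assumption.
LABELS = ["Able", "Unable", "Unclear", "None"]


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # A class with no true positives contributes an F1 of 0.
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical data): Unclear never occurs, so its F1 is 0.
gold = ["Able", "Able", "Unable", "None"]
pred = ["Able", "Unable", "Unable", "None"]
score = macro_f1(gold, pred)
```

Because every class has equal weight, a system that ignores a rare class is penalized heavily, which is presumably why the metric suits a task with rare Unclear and Unable samples.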


SMHD: a Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions
Arman Cohan | Bart Desmet | Andrew Yates | Luca Soldaini | Sean MacAvaney | Nazli Goharian
Proceedings of the 27th International Conference on Computational Linguistics

Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users’ language, as measured by linguistic and psychological variables. We further explore text classification methods to identify individuals with mental conditions through their language.
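
To illustrate what a high-precision pattern for self-reported diagnoses might look like, here is a small sketch. The regular expression, the condition list, and the function name are all hypothetical; the abstract does not give the actual patterns or list the nine conditions, so this shows only the general idea of requiring a first-person diagnosis statement.

```python
import re

# Hypothetical, illustrative subset of conditions (not the dataset's list).
CONDITIONS = ["depression", "anxiety", "bipolar disorder"]

# Hypothetical pattern: first-person subject + "diagnosed with" + condition.
# Requiring "I" trades recall for precision (third-party reports are skipped).
DIAGNOSIS = re.compile(
    r"\bI (?:was|am|have been|got) (?:recently |just )?diagnosed with ("
    + "|".join(re.escape(c) for c in CONDITIONS)
    + r")\b",
    re.IGNORECASE,
)


def self_reported_condition(post):
    """Return the matched condition in lowercase, or None if no match."""
    match = DIAGNOSIS.search(post)
    return match.group(1).lower() if match else None


hit = self_reported_condition("I was diagnosed with depression last year.")
miss = self_reported_condition("My sister was diagnosed with depression.")
```

The second call returns no match because the statement is not in the first person, which is the kind of false positive a precision-oriented pattern tries to avoid.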

RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses
Sean MacAvaney | Bart Desmet | Arman Cohan | Luca Soldaini | Andrew Yates | Ayah Zirikly | Nazli Goharian
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic

Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods to examine mental health through social media because one’s mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging.


Mental Distress Detection and Triage in Forum Posts: The LT3 CLPsych 2016 Shared Task System
Bart Desmet | Gilles Jacobs | Véronique Hoste
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

The GW/LT3 VarDial 2016 Shared Task System for Dialects and Similar Languages Detection
Ayah Zirikly | Bart Desmet | Mona Diab
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This paper describes the GW/LT3 contribution to the 2016 VarDial shared task on the identification of similar languages (task 1) and Arabic dialects (task 2). For both tasks, we experimented with Logistic Regression and Neural Network classifiers in isolation. Additionally, we implemented a cascaded classifier that consists of coarse and fine-grained classifiers (task 1) and a classifier ensemble with majority voting for task 2. The submitted systems obtained state-of-the-art performance and ranked first for the evaluation on social media data (test sets B1 and B2 for task 1), with a maximum weighted F1 score of 91.94%.
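
The two combination strategies mentioned above, a coarse-to-fine cascade and majority voting, can be sketched in a few lines. This is a generic illustration, not the submitted system: the group names, labels, stand-in classifiers, and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the classifiers' predictions.

    Assumption: ties are broken in favour of the earliest prediction.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def cascade(text, coarse, fine_by_group):
    """Pick a language group first, then let that group's classifier decide."""
    group = coarse(text)
    return fine_by_group[group](text)


# Hypothetical stand-ins for trained classifiers (plain functions here).
coarse = lambda text: "group_a" if "x" in text else "group_b"
fine_by_group = {
    "group_a": lambda text: "a1",
    "group_b": lambda text: "b2",
}

voted = majority_vote(["EGY", "GLF", "EGY"])
routed = cascade("xyz", coarse, fine_by_group)
```

A cascade lets each fine-grained classifier specialize on a handful of closely related varieties, while voting smooths over the errors of any single model.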


Detection and Fine-Grained Classification of Cyberbullying Events
Cynthia Van Hee | Els Lefever | Ben Verhoeven | Julie Mennes | Bart Desmet | Guy De Pauw | Walter Daelemans | Véronique Hoste
Proceedings of the International Conference Recent Advances in Natural Language Processing

UGENT-LT3 SCATE System for Machine Translation Quality Estimation
Arda Tezcan | Véronique Hoste | Bart Desmet | Lieve Macken
Proceedings of the Tenth Workshop on Statistical Machine Translation


Recognising suicidal messages in Dutch social media
Bart Desmet | Véronique Hoste
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Early detection of suicidal thoughts is an important part of effective suicide prevention. Such thoughts may be expressed online, especially by young people. This paper presents on-going work on the automatic recognition of suicidal messages in social media. We present experiments for automatically detecting relevant messages (with suicide-related content), and those containing suicide threats. A sample of 1357 texts was annotated in a corpus of 2674 blog posts and forum messages from Netlog, indicating relevance, origin, severity of suicide threat and risks as well as protective factors. For the classification experiments, Naive Bayes, SVM and KNN algorithms are combined with shallow features, i.e. bag-of-words of word, lemma and character ngrams, and post length. The best relevance classification is achieved by using SVM with post length, lemma and character ngrams, resulting in an F-score of 85.6% (78.7% precision and 93.8% recall). For the second task (threat detection), a cascaded setup which first filters out irrelevant messages with SVM and then predicts the severity with KNN, performs best: 59.2% F-score (69.5% precision and 51.6% recall).
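
The reported F-scores are consistent with the balanced F-measure, the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes both figures from the precision and recall values in the abstract. Treating the reported score as the balanced F1 is an assumption, although the numbers agree with it.

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
relevance = f_score(0.787, 0.938)  # relevance classification (SVM)
threat = f_score(0.695, 0.516)     # threat detection (SVM, then KNN)

print(f"{relevance:.1%}")  # 85.6%
print(f"{threat:.1%}")     # 59.2%
```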

Towards Shared Datasets for Normalization Research
Orphée De Clercq | Sarah Schulz | Bart Desmet | Véronique Hoste
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present a Dutch and English dataset that can serve as a gold standard for evaluating text normalization approaches. With the combination of text messages, message board posts and tweets, these datasets represent a variety of user-generated content. All data were manually normalized to their standard form using newly developed guidelines. We perform automatic lexical normalization experiments on these datasets using statistical machine translation techniques. We focus on both the word and character level and find that we can improve the BLEU score by ca. 20% for both languages. In order for this user-generated content to be released publicly to the research community, some issues first need to be resolved. These are discussed in closer detail by focussing on the current legislation and by investigating previous similar data collection projects. With this discussion we hope to shed some light on various difficulties researchers are facing when trying to share social media data.


Normalization of Dutch User-Generated Content
Orphée De Clercq | Sarah Schulz | Bart Desmet | Els Lefever | Véronique Hoste
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013


Towards a Balanced Named Entity Corpus for Dutch
Bart Desmet | Véronique Hoste
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces a new named entity corpus for Dutch. State-of-the-art named entity recognition systems require a substantial annotated corpus to be trained on. Such corpora exist for English, but not for Dutch. The STEVIN-funded SoNaR project aims to produce a diverse 500-million-word reference corpus of written Dutch, with four semantic annotation layers: named entities, coreference relations, semantic roles and spatiotemporal expressions. A 1-million-word subset will be manually corrected. Named entity annotation guidelines for Dutch were developed, adapted from the MUC and ACE guidelines. Adaptations include the annotation of products and events, the classification into subtypes, and the markup of metonymic usage. Inter-annotator agreement experiments were conducted to corroborate the reliability of the guidelines, which yielded satisfactory results (Kappa scores above 0.90). We are building a NER system, trained on the 1-million-word subcorpus, to automatically classify the remainder of the SoNaR corpus. To this end, experiments with various classification algorithms (MBL, SVM, CRF) and features have been carried out and evaluated.
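
Kappa scores such as those reported above correct raw agreement for the agreement expected by chance. The sketch below computes Cohen's kappa for two annotators. The abstract does not say which kappa variant was used, so the choice of Cohen's kappa, the entity labels, and the toy annotations are all assumptions.

```python
from collections import Counter


def cohen_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(ann_a)
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    counts_a, counts_b = Counter(ann_a), Counter(ann_b)
    # Chance agreement: product of the annotators' marginal label rates.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1:
        return 1.0  # Both annotators used one identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of four tokens by two annotators.
ann_a = ["PER", "LOC", "ORG", "PER"]
ann_b = ["PER", "LOC", "PER", "PER"]
kappa = cohen_kappa(ann_a, ann_b)
```

In this toy case the annotators agree on three of four items (0.75), but chance agreement is 0.4375, so kappa is only about 0.56. That gap shows why kappa above 0.90 is a much stronger result than 90% raw agreement.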

Towards a Learning Approach for Abbreviation Detection and Resolution
Klaar Vanopstal | Bart Desmet | Véronique Hoste
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The explosion of biomedical literature, and with it the uncontrolled creation of abbreviations, presents some special challenges for both human readers and computer applications. We developed an annotated corpus of Dutch medical text, and experimented with two approaches to abbreviation detection and resolution. Our corpus is composed of abstracts from two medical journals from the Low Countries in which approximately 65 percent (NTvG) and 48 percent (TvG) of the abbreviations have a corresponding full form in the abstract. Our first approach, a pattern-based system, consists of two steps: abbreviation detection and definition matching. This system has an average F-score of 0.82 for the detection of both defined and undefined abbreviations, and an average F-score of 0.77 for the definitions. For our second approach, an SVM-based classifier was used on the preprocessed data sets, leading to an average F-score of 0.93 for the abbreviations; for the definitions an average F-score of 0.82 was obtained.
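
The two steps of a pattern-based approach, abbreviation detection followed by definition matching, can be illustrated with a deliberately simple heuristic. This is not the system described in the paper: the parenthesis pattern and the first-letter matching rule are assumptions chosen for brevity, and the example sentences are in English rather than Dutch.

```python
import re

# Step 1 (assumed pattern): a short capitalized token inside parentheses.
ABBREVIATION = re.compile(r"\(([A-Z][A-Za-z]{1,9})\)")


def find_definitions(text):
    """Return (abbreviation, full form or None) pairs found in the text."""
    results = []
    for match in ABBREVIATION.finditer(text):
        abbreviation = match.group(1)
        # Step 2 (assumed rule): the preceding words, one per letter,
        # must start with the letters of the abbreviation.
        words = text[: match.start()].split()
        candidate = words[-len(abbreviation):]
        initials = "".join(word[0] for word in candidate).lower()
        if len(candidate) == len(abbreviation) and initials == abbreviation.lower():
            results.append((abbreviation, " ".join(candidate)))
        else:
            results.append((abbreviation, None))  # Detected but undefined.
    return results


defined = find_definitions("We used a support vector machine (SVM) classifier.")
undefined = find_definitions("Results (CRF) were comparable.")
```

A rule this strict misses full forms with function words or multi-letter contributions, which hints at why a learned classifier can outperform hand-written patterns on this task.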