Cecile Paris

Also published as: Cécile Paris, Cecile L. Paris


2020

An Effective Transition-based Model for Discontinuous NER
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Unlike widely used Named Entity Recognition (NER) data sets in generic domains, biomedical NER data sets often contain mentions consisting of discontinuous spans. Conventional sequence tagging techniques encode Markov assumptions that are efficient but preclude recovery of these mentions. We propose a simple, effective transition-based model with generic neural encoding for discontinuous NER. Through extensive experiments on three biomedical data sets, we show that our model can effectively recognize discontinuous mentions without sacrificing the accuracy on continuous mentions.

Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies on domain-specific BERT models show that effectiveness on downstream tasks can be improved when models are pretrained on in-domain data. Often, the pretraining data used in these models are selected based on their subject matter, e.g., biology or computer science. Given the range of applications using social media text, and its unique language variety, we pretrain two models on tweets and forum text respectively, and empirically demonstrate the effectiveness of these two resources. In addition, we investigate how similarity measures can be used to nominate in-domain pretraining data. We publicly release our pretrained models at https://bit.ly/35RpTf0.

Assessing Social License to Operate from the Public Discourse on Social Media
Chang Xu | Cecile Paris | Ross Sparks | Surya Nepal | Keith VanderLinden
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

Organisations are monitoring their Social License to Operate (SLO) with increasing regularity. SLO, the level of support organisations gain from the public, is typically assessed through surveys or focus groups, which require expensive manual efforts and yield quickly outdated results. In this paper, we present SIRTA (Social Insight via Real-Time Text Analytics), a novel real-time text analytics system for assessing and monitoring organisations’ SLO levels by analysing the public discourse from social posts. To assess SLO levels, our insight is to extract and transform people’s stances towards an organisation into SLO levels. SIRTA achieves this by performing a chain of three text classification tasks, where it identifies task-relevant social posts, discovers key SLO risks discussed in the posts, and infers stances specific to the SLO risks. We leverage recent language understanding techniques (e.g., BERT) for building our classifiers. To monitor SLO levels over time, SIRTA employs quality control mechanisms to reliably identify SLO trends and variations of multiple organisations in a market. These are derived from the smoothed time series of their SLO levels based on an exponentially weighted moving average (EWMA) calculation. Our experimental results show that SIRTA is highly effective in distilling stances from social posts for SLO level assessment, and that the continuous monitoring of SLO levels afforded by SIRTA enables the early detection of critical SLO changes.
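
The EWMA smoothing mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard recursive definition of an exponentially weighted moving average and is not taken from the SIRTA system itself; the function name `ewma`, the smoothing factor `alpha`, and the sample values are assumptions chosen for illustration.

```python
from typing import List, Sequence


def ewma(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average of a series.

    Standard recursion (an assumption about the general technique,
    not the paper's exact configuration):
        s[0] = x[0]
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in values:
        if not smoothed:
            # The first smoothed value is the first observation itself.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily SLO levels (illustrative values only, not from the paper).
daily_slo = [0.50, 0.52, 0.48, 0.20, 0.51]
trend = ewma(daily_slo, alpha=0.5)
```

A larger `alpha` makes the smoothed series react faster to a sudden drop such as the fourth value above, while a smaller `alpha` suppresses short-lived noise at the cost of slower detection.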

Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020
Karin Verspoor | Kevin Bretonnel Cohen | Mark Dredze | Emilio Ferrara | Jonathan May | Robert Munro | Cecile Paris | Byron Wallace
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

2019

Does Multi-Task Learning Always Help?: An Evaluation on Health Informatics
Aditya Joshi | Sarvnaz Karimi | Ross Sparks | Cecile Paris | C Raina MacIntyre
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Multi-Task Learning (MTL) has been an attractive approach to deal with limited labeled datasets or leverage related tasks, for a variety of NLP problems. We examine the benefit of MTL for three specific pairs of health informatics tasks that deal with: (a) overlapping symptoms for the same classification problem (personal health mention classification for influenza and for a set of symptoms); (b) overlapping medical concepts for related classification problems (vaccine usage and drug usage detection); and, (c) related classification problems (vaccination intent and vaccination relevance detection). We experiment with a simple neural architecture: a shared layer followed by task-specific dense layers. The novelty of this work is that it compares alternatives for shared layers for these pairs of tasks. While our observations agree with the promise of MTL as compared to single-task learning, for health informatics, we show that the benefit also comes with caveats in terms of the choice of shared layers and the relatedness between the participating tasks.

Reevaluating Argument Component Extraction in Low Resource Settings
Anirudh Joshi | Timothy Baldwin | Richard Sinnott | Cecile Paris
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Argument component extraction is a challenging and complex high-level semantic extraction task. As such, it is both expensive to annotate (meaning training data is limited and low-resource by nature), and hard for current-generation deep learning methods to model. In this paper, we reevaluate the performance of state-of-the-art approaches in both single- and multi-task learning settings using combinations of character-level, GloVe, ELMo, and BERT encodings using standard BiLSTM-CRF encoders. We use evaluation metrics that are more consistent with evaluation practice in named entity recognition to understand how well current baselines address this challenge and compare their performance to lower-level semantic tasks such as CoNLL named entity recognition. We find that performance utilizing various pre-trained representations and training methodologies often leaves a lot to be desired as it currently stands, and suggest future pathways for improvement.

Figurative Usage Detection of Symptom Words to Improve Personal Health Mention Detection
Adith Iyer | Aditya Joshi | Sarvnaz Karimi | Ross Sparks | Cecile Paris
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Personal health mention detection deals with predicting whether or not a given sentence is a report of a health condition. Past work mentions errors in this prediction when symptom words, i.e., names of symptoms of interest, are used in a figurative sense. Therefore, we combine a state-of-the-art figurative usage detection method with CNN-based personal health mention detection. To do so, we present two methods: a pipeline-based approach and a feature augmentation-based approach. With the feature augmentation-based approach, the introduction of figurative usage detection results in an average improvement of 2.21% in the F-score of personal health mention detection. This paper demonstrates the promise of using figurative usage detection to improve personal health mention detection.

Recognising Agreement and Disagreement between Stances with Reason Comparing Networks
Chang Xu | Cecile Paris | Surya Nepal | Ross Sparks
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We identify agreement and disagreement between utterances that express stances towards a topic of discussion. Existing methods focus mainly on conversational settings, where dialogic features are used for (dis)agreement inference. We extend this scope and seek to detect stance (dis)agreement in a broader setting, where independent stance-bearing utterances, which prevail in many stance corpora and real-world scenarios, are compared. To cope with such non-dialogic utterances, we find that the reasons uttered to back up a specific stance can help predict stance (dis)agreements. We propose a reason comparing network (RCN) to leverage reason information for stance comparison. Empirical results on a well-known stance corpus show that our method can discover useful reason information, enabling it to outperform several baselines in stance (dis)agreement detection.

NNE: A Dataset for Nested Named Entity Recognition in English Newswire
Nicky Ringland | Xiang Dai | Ben Hachey | Sarvnaz Karimi | Cecile Paris | James R. Curran
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE—a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.

A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics
Aditya Joshi | Sarvnaz Karimi | Ross Sparks | Cecile Paris | C Raina MacIntyre
Proceedings of the 18th BioNLP Workshop and Shared Task

Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MeSH ontology. There is an improvement of 2-4% in accuracy when these context-based representations are used instead of word-based representations.

Using Similarity Measures to Select Pretraining Data for NER
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks. However, the measure and impact of similarity between pretraining data and target task data are left to intuition. We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. We demonstrate that these measures are good predictors of the usefulness of pretrained models for Named Entity Recognition (NER) over 30 data pairs. Results also suggest that pretrained LMs are more effective and more predictable than pretrained word vectors, but pretrained word vectors are better when pretraining data is dissimilar.

2018

UniMelb at SemEval-2018 Task 12: Generative Implication using LSTMs, Siamese Networks and Semantic Representations with Synonym Fuzzing
Anirudh Joshi | Tim Baldwin | Richard O. Sinnott | Cecile Paris
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes a warrant classification system for SemEval 2018 Task 12 that attempts to learn semantic representations of reasons, claims and warrants. The system consists of 3 stacked LSTMs: one for the reason, one for the claim, and one shared Siamese Network for the 2 candidate warrants. Our main contribution is to force the embeddings into a shared feature space using vector operations, semantic similarity classification, Siamese networks, and multi-task learning. In doing so, we learn a form of generative implication by encoding implication interrelationships between reasons, claims, and the associated correct and incorrect warrants. We further augment the limited data in the task by utilizing WordNet synonym “fuzzing”. When applied to SemEval 2018 Task 12, our system performs well on the development data, and officially ranked 8th among 21 teams.

Cross-Target Stance Classification with Self-Attention Networks
Chang Xu | Cécile Paris | Surya Nepal | Ross Sparks
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In stance classification, the target on which the stance is made defines the boundary of the task, and a classifier is usually trained for prediction on the same target. In this work, we explore the potential for generalizing classifiers between different targets, and propose a neural model that can apply what has been learned from a source target to a destination target. We show that our model can find useful information shared between relevant targets which improves generalization in certain scenarios.

Shot Or Not: Comparison of NLP Approaches for Vaccination Behaviour Detection
Aditya Joshi | Xiang Dai | Sarvnaz Karimi | Ross Sparks | Cécile Paris | C Raina MacIntyre
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Vaccination behaviour detection deals with predicting whether or not a person received/was about to receive a vaccine. We present our submission for the vaccination behaviour detection shared task at the SMM4H workshop. Our findings are based on three prevalent text classification approaches: rule-based, statistical and deep learning-based. Our final submissions are: (1) an ensemble of statistical classifiers with task-specific features derived using lexicons, language processing tools and word embeddings; and (2) an LSTM classifier with pre-trained language models.

2017

Demographic Inference on Twitter using Recursive Neural Networks
Sunghwan Mac Kim | Qiongkai Xu | Lizhen Qu | Stephen Wan | Cécile Paris
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In social media, demographic inference is a critical task in order to gain a better understanding of a cohort and to facilitate interacting with one’s audience. Most previous work has made independence assumptions over topological, textual and label information on social networks. In this work, we employ recursive neural networks to break down these independence assumptions to obtain inference about demographic characteristics on Twitter. We show that our model performs better than existing models including the state-of-the-art.

Medication and Adverse Event Extraction from Noisy Text
Xiang Dai | Sarvnaz Karimi | Cecile Paris
Proceedings of the Australasian Language Technology Association Workshop 2017

2016

MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Rei Miyata | Anthony Hartley | Kyo Kageura | Cécile Paris | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system.

The Role of Features and Context on Suicide Ideation Detection
Yufei Wang | Stephen Wan | Cécile Paris
Proceedings of the Australasian Language Technology Association Workshop 2016

Data61-CSIRO systems at the CLPsych 2016 Shared Task
Sunghwan Mac Kim | Yufei Wang | Stephen Wan | Cécile Paris
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

The Effects of Data Collection Methods in Twitter
Sunghwan Mac Kim | Stephen Wan | Cécile Paris | Brian Jin | Bella Robinson
Proceedings of the First Workshop on NLP and Computational Social Science

Detecting Social Roles in Twitter
Sunghwan Mac Kim | Stephen Wan | Cécile Paris
Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media

2015

Ranking election issues through the lens of social media
Stephen Wan | Cécile Paris
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

2013

Automatic Prediction of Evidence-based Recommendations via Sentence-level Polarity Classification
Abeed Sarker | Diego Mollá-Aliod | Cécile Paris
Proceedings of the Sixth International Joint Conference on Natural Language Processing

A Study: From Electronic Laboratory Notebooks to Generated Queries for Literature Recommendation
Oldooz Dianat | Cécile Paris | Stephen Wan
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

2012

Towards Two-step Multi-document Summarisation for Evidence Based Medicine: A Quantitative Analysis
Abeed Sarker | Diego Mollá-Aliod | Cécile Paris
Proceedings of the Australasian Language Technology Association Workshop 2012

Unifying Local and Global Agreement and Disagreement Classification in Online Debates
Jie Yin | Nalin Narang | Paul Thomas | Cecile Paris
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

2011

Outcome Polarity Identification of Medical Papers
Abeed Sarker | Diego Molla | Cécile Paris
Proceedings of the Australasian Language Technology Association Workshop 2011

2010

Detecting Emails Containing Requests for Action
Andrew Lampert | Robert Dale | Cecile Paris
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Segmenting Email Message Text into Zones
Andrew Lampert | Robert Dale | Cécile Paris
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Designing a Citation-Sensitive Research Tool: An Initial Study of Browsing-Specific Information Needs
Stephen Wan | Cécile Paris | Michael Muthukrishna | Robert Dale
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

Improving Grammaticality in Statistical Sentence Generation: Introducing a Dependency Spanning Tree Algorithm with an Argument Satisfaction Model
Stephen Wan | Mark Dras | Robert Dale | Cécile Paris
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

In-Browser Summarisation: Generating Elaborative Summaries Biased Towards the Reading Context
Stephen Wan | Cécile Paris
Proceedings of ACL-08: HLT, Short Papers

Requests and Commitments in Email are More Complex Than You Think: Eight Reasons to be Cautious
Andrew Lampert | Robert Dale | Cécile Paris
Proceedings of the Australasian Language Technology Association Workshop 2008

Fit it in but say it well!
Cécile Paris | Nathalie Colineau | Andrew Lampert | Joan Giralt Duran
Proceedings of the Australasian Language Technology Association Workshop 2008

Generation under Space Constraints
Cécile Paris | Nathalie Colineau | Andrew Lampert | Joan Giralt Duran
Coling 2008: Companion volume: Posters

Seed and Grow: Augmenting Statistically Generated Summary Sentences using Schematic Word Patterns
Stephen Wan | Robert Dale | Mark Dras | Cécile Paris
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2006

Classifying Speech Acts using Verbal Response Modes
Andrew Lampert | Robert Dale | Cécile Paris
Proceedings of the Australasian Language Technology Workshop 2006

Pseudo Relevance Feedback Using Named Entities for Question Answering
Luiz Augusto Pizzato | Diego Mollá | Cécile Paris
Proceedings of the Australasian Language Technology Workshop 2006

Using Dependency-Based Features to Take the ‘Para-farce’ out of Paraphrase
Stephen Wan | Mark Dras | Robert Dale | Cécile Paris
Proceedings of the Australasian Language Technology Workshop 2006

Proceedings of the Fourth International Natural Language Generation Conference
Nathalie Colineau | Cécile Paris | Stephen Wan | Robert Dale
Proceedings of the Fourth International Natural Language Generation Conference

Evaluations of NLG Systems: Common Corpus and Tasks or Common Dimensions and Metrics?
Cécile Paris | Nathalie Colineau | Ross Wilkinson
Proceedings of the Fourth International Natural Language Generation Conference

2005

Towards Statistical Paraphrase Generation: Preliminary Evaluations of Grammaticality
Stephen Wan | Mark Dras | Robert Dale | Cécile Paris
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

2004

Proceedings of the Australasian Language Technology Workshop 2004
Ash Asudeh | Cecile Paris | Stephen Wan
Proceedings of the Australasian Language Technology Workshop 2004

An Evaluation on Query-biased Summarisation for the Question Answering Task
Mingfang Wu | Ross Wilkinson | Cecile Paris
Proceedings of the Australasian Language Technology Workshop 2004

Information Assembly for Automatic Content Adaptation
Andrew Lampert | Cecile Paris
Proceedings of the Australasian Language Technology Workshop 2004

Intelligent Multi Media Presentation of information in a semi-immersive Command and Control environment
Cecile Paris | Nathalie Colineau | Dominique Estival
Proceedings of the Australasian Language Technology Workshop 2004

2003

Using Thematic Information in Statistical Headline Generation
Stephen Wan | Mark Dras | Cécile Paris | Robert Dale
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

Straight to the point: Discovering themes for summary generation
Stephen Wan | Mark Dras | Cecile Paris | Robert Dale
Proceedings of the Australasian Language Technology Workshop 2003

2002

An Evaluation of Procedural Instructional Text
Nathalie Colineau | Cecile Paris | Keith Vander Linden
Proceedings of the International Natural Language Generation Conference

1996

Two Sources of Control Over the Generation of Software Instructions
Anthony Hartley | Cecile Paris
34th Annual Meeting of the Association for Computational Linguistics

Building Knowledge Bases for the Generation of Software Documentation
Cecile Paris | Keith Vander Linden
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

DRAFTER
Cécile Paris | Keith Vander Linden
Eighth International Natural Language Generation Workshop (Posters and Demonstrations)

1994

Intentions, Structure and Expression in Multi-Lingual Instructions
Cecile L. Paris | Donia R. Scott
Proceedings of the Seventh International Workshop on Natural Language Generation

Expressing Procedural Relationships in Multilingual Instructions
Judy Delin | Anthony Hartley | Cecile Paris | Donia Scott | Keith Vander Linden
Proceedings of the Seventh International Workshop on Natural Language Generation

1993

Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information
Johanna D. Moore | Cecile L. Paris
Computational Linguistics, Volume 19, Number 4, December 1993

On the Necessity of Intentions and the Usefulness of Rhetorical Relations: A Position Paper
Vibhu Mittal | Cecile Paris
Intentionality and Structure in Discourse Relations

1989

Planning Text for Advisory Dialogues
Johanna D. Moore | Cecile L. Paris
27th Annual Meeting of the Association for Computational Linguistics

1988

Tailoring Object Descriptions to a User’s Level of Expertise
Cecile L. Paris
Computational Linguistics, Volume 14, Number 3, September 1988

1987

Functional Unification Grammar Revisited
Kathleen R. McKeown | Cecile L. Paris
25th Annual Meeting of the Association for Computational Linguistics

1985

Description Strategies for Naive and Expert Users
Cecile L. Paris
23rd Annual Meeting of the Association for Computational Linguistics