Jeffrey Sorensen

Also published as: Jeffrey S. Sorensen


2020

Toxicity Detection: Does Context Really Matter?
John Pavlopoulos | Jeffrey Sorensen | Lucas Dixon | Nithum Thain | Ion Androutsopoulos
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Moderation is crucial to promoting healthy online discussions. Although several ‘toxicity’ detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improve performance of toxicity detection systems? We experiment with Wikipedia conversations, limiting the notion of context to the previous post in the thread and the discussion title. We find that context can either amplify or mitigate the perceived toxicity of posts. Moreover, a small but significant subset of manually labeled posts (5% in one of our experiments) end up having the opposite toxicity labels if the annotators are not provided with context. Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers, having tried a range of classifiers and mechanisms to make them context aware. This points to the need for larger datasets of comments annotated in context. We make our code and data publicly available.

Six Attributes of Unhealthy Conversations
Ilan Price | Jordan Gifford-Moore | Jory Flemming | Saul Musker | Maayan Roichman | Guillaume Sylvain | Nithum Thain | Lucas Dixon | Jeffrey Sorensen
Proceedings of the Fourth Workshop on Online Abuse and Harms

We present a new dataset of approximately 44,000 comments labelled by crowdworkers. Each comment is labelled as either ‘healthy’ or ‘unhealthy’, in addition to binary labels for the presence of six potentially ‘unhealthy’ sub-attributes: (1) hostile; (2) antagonistic, insulting, provocative or trolling; (3) dismissive; (4) condescending or patronising; (5) sarcastic; and/or (6) an unfair generalisation. Each label also has an associated confidence score. We argue that there is a need for datasets which enable research based on a broad notion of ‘unhealthy online conversation’. We build this typology to encompass a substantial proportion of the individual comments which contribute to unhealthy online conversation. For some of these attributes, this is the first publicly available dataset of this scale. We explore the quality of the dataset, present some summary statistics and initial models to illustrate the utility of this data, and highlight limitations and directions for further research.

2012

The OpenGrm open-source finite-state grammar software libraries
Brian Roark | Richard Sproat | Cyril Allauzen | Michael Riley | Jeffrey Sorensen | Terry Tai
Proceedings of the ACL 2012 System Demonstrations

2010

Syntax Based Reordering with Automatically Derived Rules for Improved Statistical Machine Translation
Karthik Visweswariah | Jiri Navratil | Jeffrey Sorensen | Vijil Chenthamarakshan | Nandakishore Kambhatla
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2006

Maximum Entropy Based Restoration of Arabic Diacritics
Imed Zitouni | Jeffrey S. Sorensen | Ruhi Sarikaya
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution
Imed Zitouni | Jeffrey Sorensen | Xiaoqiang Luo | Radu Florian
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

An Integrated Approach for Arabic-English Named Entity Translation
Hany Hassan | Jeffrey Sorensen
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

2004

Dependency Tree Kernels for Relation Extraction
Aron Culotta | Jeffrey Sorensen
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

TIPS: A Translingual Information Processing System
Yaser Al-Onaizan | Radu Florian | Martin Franz | Hany Hassan | Young-Suk Lee | J. Scott McCarley | Kishore Papineni | Salim Roukos | Jeffrey Sorensen | Christoph Tillmann | Todd Ward | Fei Xia
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations