Alla Rozovskaya


2020

A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction
Max White | Alla Rozovskaya
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Grammatical Error Correction (GEC) is concerned with correcting grammatical errors in written text. Current GEC systems, namely those leveraging statistical and neural machine translation, require large quantities of annotated training data, which can be expensive or impractical to obtain. This research compares the techniques for generating synthetic data that were used by the two highest-scoring submissions to the restricted and low-resource tracks of the BEA-2019 Shared Task on Grammatical Error Correction.

2019

Grammar Error Correction in Morphologically Rich Languages: The Case of Russian
Alla Rozovskaya | Dan Roth
Transactions of the Association for Computational Linguistics, Volume 7

Until now, most of the research in grammar error correction focused on English, and the problem has hardly been explored for other languages. We address the task of correcting writing mistakes in morphologically rich languages, with a focus on Russian. We present a corrected and error-tagged corpus of Russian learner writing and develop models that make use of existing state-of-the-art methods that have been well studied for English. Although impressive results have recently been achieved for grammar error correction of non-native English writing, these results are limited to domains where plentiful training data are available. Because annotation is extremely costly, these approaches are not suitable for the majority of domains and languages. We thus focus on methods that use “minimal supervision”; that is, those that do not rely on large amounts of annotated training data, and show how existing minimal-supervision approaches extend to a highly inflectional language such as Russian. The results demonstrate that these methods are particularly useful for correcting mistakes in grammatical phenomena that involve rich morphology.

A Benchmark Corpus of English Misspellings and a Minimally-supervised Model for Spelling Correction
Michael Flor | Michael Fried | Alla Rozovskaya
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Spelling correction has attracted a lot of attention in the NLP community. However, models have usually been evaluated on artificially created or proprietary corpora. A publicly available corpus of authentic misspellings, annotated in context, is still lacking. To address this, we present and release an annotated data set of 6,121 spelling errors in context, based on a corpus of essays written by English language learners. We also develop a minimally supervised, context-aware approach to spelling correction. It achieves strong results on our data: 88.12% accuracy. This approach can also be trained with a minimal amount of annotated data, with performance reduced by less than 1%. Furthermore, it is easily portable to new domains. We evaluate our model on data from a medical domain and demonstrate that it rivals the performance of a model trained and tuned on in-domain data.

2018

Predicting Discharge Disposition Using Patient Complaint Notes in Electronic Medical Records
Mohamad Salimi | Alla Rozovskaya
Proceedings of the BioNLP 2018 workshop

Overcrowding in emergency rooms is a major challenge faced by hospitals across the United States. Overcrowding can result in longer wait times, which, in turn, have been shown to adversely affect patient satisfaction, clinical outcomes, and procedure reimbursements. This paper presents research that aims to automatically predict the discharge disposition of patients who received medical treatment in an emergency department. We make use of a corpus of notes, entered by health care providers, that contain patient complaints, diagnosis information, and disposition. We use this corpus to develop a model that uses the complaint and diagnosis information to predict patient disposition. We show that the proposed model substantially outperforms the baseline of predicting the most common disposition type. The long-term goal of this research is to build a model that can be implemented as a real-time service in an application to predict disposition as patients arrive.

2017

Adapting to Learner Errors with Minimal Supervision
Alla Rozovskaya | Dan Roth | Mark Sammons
Computational Linguistics, Volume 43, Issue 4 - December 2017

This article considers the problem of correcting errors made by English as a Second Language writers from a machine learning perspective, and addresses an important issue of developing an appropriate training paradigm for the task, one that accounts for error patterns of non-native writers using minimal supervision. Existing training approaches present a trade-off between the large amounts of cheap data offered by native-trained models and the additional knowledge of learner error patterns provided by the more expensive method of training on annotated learner data. We propose a novel training approach that draws on the strengths of the two standard training paradigms (training either on native data or on annotated learner data) and that outperforms both of these standard methods. Using the key observation that parameters relating to error regularities exhibited by non-native writers are relatively simple, we develop models that can incorporate knowledge about error regularities based on a small annotated sample but that are otherwise trained on native English data. The key contribution of this article is the introduction and analysis of two methods for adapting the learned models to error patterns of non-native writers: one method that applies to generative classifiers and a second that applies to discriminative classifiers. Both methods demonstrated state-of-the-art performance in several text correction competitions. In particular, the Illinois system that implements these methods ranked at the top in two recent CoNLL shared tasks on error correction. We conduct further evaluation of the proposed approaches, studying the effect of using error data from speakers of the same native language, languages that are closely related linguistically, and unrelated languages.

2016

The Virginia Tech System at CoNLL-2016 Shared Task on Shallow Discourse Parsing
Prashant Chandrasekar | Xuan Zhang | Saurabh Chakravarty | Arijit Ray | John Krulick | Alla Rozovskaya
Proceedings of the CoNLL-16 shared task

Grammatical Error Correction: Machine Translation and Classifiers
Alla Rozovskaya | Dan Roth
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Correction Annotation for Non-Native Arabic Texts: Guidelines and Corpus
Wajdi Zaghouani | Nizar Habash | Houda Bouamor | Alla Rozovskaya | Behrang Mohit | Abeer Heider | Kemal Oflazer
Proceedings of The 9th Linguistic Annotation Workshop

The Second QALB Shared Task on Automatic Text Correction for Arabic
Alla Rozovskaya | Houda Bouamor | Nizar Habash | Wajdi Zaghouani | Ossama Obeid | Behrang Mohit
Proceedings of the Second Workshop on Arabic Natural Language Processing

2014

Generalized Character-Level Spelling Error Correction
Noura Farra | Nadi Tomeh | Alla Rozovskaya | Nizar Habash
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Correcting Grammatical Verb Errors
Alla Rozovskaya | Dan Roth | Vivek Srikumar
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

The Illinois-Columbia System in the CoNLL-2014 Shared Task
Alla Rozovskaya | Kai-Wei Chang | Mark Sammons | Dan Roth | Nizar Habash
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

The First QALB Shared Task on Automatic Text Correction for Arabic
Behrang Mohit | Alla Rozovskaya | Nizar Habash | Wajdi Zaghouani | Ossama Obeid
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

The Columbia System in the QALB-2014 Shared Task on Arabic Error Correction
Alla Rozovskaya | Nizar Habash | Ramy Eskander | Noura Farra | Wael Salloum
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)

Building a State-of-the-Art Grammatical Error Correction System
Alla Rozovskaya | Dan Roth
Transactions of the Association for Computational Linguistics, Volume 2

This paper identifies and examines the key principles underlying the construction of a state-of-the-art grammatical error correction system. We do this by analyzing the Illinois system that placed first among seventeen teams in the recent CoNLL-2013 shared task on grammatical error correction. The system focuses on five different types of errors common among non-native English writers. We describe four design principles that are relevant for correcting all of these errors, analyze the system along these dimensions, and show how each of these dimensions contributes to the performance.

Large Scale Arabic Error Annotation: Guidelines and Framework
Wajdi Zaghouani | Behrang Mohit | Nizar Habash | Ossama Obeid | Nadi Tomeh | Alla Rozovskaya | Noura Farra | Sarah Alkuhlani | Kemal Oflazer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as problems that are specific to the Arabic language that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

2013

Joint Learning and Inference for Grammatical Error Correction
Alla Rozovskaya | Dan Roth
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

The University of Illinois System in the CoNLL-2013 Shared Task
Alla Rozovskaya | Kai-Wei Chang | Mark Sammons | Dan Roth
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

2012

The UI System in the HOO 2012 Shared Task on Error Correction
Alla Rozovskaya | Mark Sammons | Dan Roth
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

Illinois-Coref: The UI System in the CoNLL-2012 Shared Task
Kai-Wei Chang | Rajhans Samdani | Alla Rozovskaya | Mark Sammons | Dan Roth
Joint Conference on EMNLP and CoNLL - Shared Task

2011

Inference Protocols for Coreference Resolution
Kai-Wei Chang | Rajhans Samdani | Alla Rozovskaya | Nick Rizzolo | Mark Sammons | Dan Roth
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

University of Illinois System in HOO Text Correction Shared Task
Alla Rozovskaya | Mark Sammons | Joshua Gioja | Dan Roth
Proceedings of the 13th European Workshop on Natural Language Generation

Algorithm Selection and Model Adaptation for ESL Correction Tasks
Alla Rozovskaya | Dan Roth
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems
Nitin Madnani | Martin Chodorow | Joel Tetreault | Alla Rozovskaya
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Training Paradigms for Correcting Errors in Grammar and Usage
Alla Rozovskaya | Dan Roth
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Annotating ESL Errors: Challenges and Rewards
Alla Rozovskaya | Dan Roth
Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications

Generating Confusion Sets for Context-Sensitive Error Correction
Alla Rozovskaya | Dan Roth
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Identifying Semantic Relations in Context: Near-misses and Overlaps
Alla Rozovskaya | Roxana Girju
Proceedings of the International Conference RANLP-2009

Using DEDICOM for Completely Unsupervised Part-of-Speech Tagging
Peter Chew | Brett Bader | Alla Rozovskaya
Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics

2007

Multilingual Word Sense Discrimination: A Comparative Cross-Linguistic Study
Alla Rozovskaya | Richard Sproat
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing

UIUC: A Knowledge-rich Approach to Identifying Semantic Relations between Nominals
Brandon Beamer | Suma Bhat | Brant Chee | Andrew Fister | Alla Rozovskaya | Roxana Girju
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)