Huaiyu Zhu


2020

pdf bib
A Novel Workflow for Accurately and Efficiently Crowdsourcing Predicate Senses and Argument Labels
Youxuan Jiang | Huaiyu Zhu | Jonathan K. Kummerfeld | Yunyao Li | Walter Lasecki
Findings of the Association for Computational Linguistics: EMNLP 2020

Resources for Semantic Role Labeling (SRL) are typically annotated by experts at great expense. Prior attempts to develop crowdsourcing methods have either had low accuracy or required substantial expert annotation. We propose a new multi-stage crowd workflow that substantially reduces expert involvement without sacrificing accuracy. In particular, we introduce a unique filter stage based on the key observation that crowd workers are able to almost perfectly filter out incorrect options for labels. Our three-stage workflow produces annotations with 95% accuracy for predicate labels and 93% for argument labels, which is comparable to expert agreement. Compared to prior work on crowdsourcing for SRL, we decrease expert effort by 4x, from 56% to 14% of cases. Our approach enables more scalable annotation of SRL, and could enable annotation of NLP tasks that have previously been considered too complex to effectively crowdsource.

pdf bib
CLAR: A Cross-Lingual Argument Regularizer for Semantic Role Labeling
Ishan Jindal | Yunyao Li | Siddhartha Brahma | Huaiyu Zhu
Findings of the Association for Computational Linguistics: EMNLP 2020

Semantic role labeling (SRL) identifies predicate-argument structure(s) in a given sentence. Although different languages have different argument annotations, polyglot training, the idea of training one model on multiple languages, has previously been shown to outperform monolingual baselines, especially for low resource languages. In fact, even a simple combination of data has been shown to be effective with polyglot training by representing the distant vocabularies in a shared representation space. Meanwhile, despite the dissimilarity in argument annotations between languages, certain argument labels do share common semantic meaning across languages (e.g. adjuncts have more or less similar semantic meaning across languages). To leverage such similarity in annotation space across languages, we propose a method called Cross-Lingual Argument Regularizer (CLAR). CLAR identifies such linguistic annotation similarity across languages and exploits this information to map the target language arguments using a transformation of the space on which source language arguments lie. By doing so, our experimental results show that CLAR consistently improves SRL performance on multiple languages over monolingual and polyglot baselines for low resource languages.

pdf bib
Small but Mighty: New Benchmarks for Split and Rephrase
Li Zhang | Huaiyu Zhu | Siddhartha Brahma | Yunyao Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy such limitations, we collect and release two crowdsourced benchmark datasets. We not only make sure that they contain significantly more diverse syntax, but also carefully control for their quality according to a well-defined set of criteria. While no satisfactory automatic metric exists, we apply fine-grained manual evaluation based on these criteria using crowdsourcing, showing that our datasets better represent the task and are significantly more challenging for the models.

2019

pdf bib
Towards Universal Semantic Representation
Huaiyu Zhu | Yunyao Li | Laura Chiticariu
Proceedings of the First International Workshop on Designing Meaning Representations

Natural language understanding at the semantic level and independent of language variations is of great practical value. Existing approaches such as semantic role labeling (SRL) and abstract meaning representation (AMR) still have features related to the peculiarities of the particular language. In this work we describe various challenges and possible solutions in designing a semantic representation that is universal across a variety of languages.

2018

pdf bib
SystemT: Declarative Text Understanding for Enterprise
Laura Chiticariu | Marina Danilevsky | Yunyao Li | Frederick Reiss | Huaiyu Zhu
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

The rise of enterprise applications over unstructured and semi-structured documents poses new challenges to text understanding systems across multiple dimensions. We present SystemT, a declarative text understanding system that addresses these challenges and has been deployed in a wide range of enterprise applications. We highlight the design considerations and decisions behind SystemT in addressing the needs of the enterprise setting. We also summarize the impact of SystemT on business and education.

2016

pdf bib
Multilingual Information Extraction with PolyglotIE
Alan Akbik | Laura Chiticariu | Marina Danilevsky | Yonas Kbrom | Yunyao Li | Huaiyu Zhu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

We present PolyglotIE, a web-based tool for developing extractors that perform Information Extraction (IE) over multilingual data. Our tool has two core features: First, it allows users to develop extractors against a unified abstraction that is shared across a large set of natural languages. This means that an extractor needs only be created once for one language, but will then run on multilingual data without any additional effort or language-specific knowledge on part of the user. Second, it embeds this abstraction as a set of views within a declarative IE system, allowing users to quickly create extractors using a mature IE query language. We present PolyglotIE as a hands-on demo in which users can experiment with creating extractors, execute them on multilingual text and inspect extraction results. Using the UI, we discuss the challenges and potential of using unified, crosslingual semantic abstractions as basis for downstream applications. We demonstrate multilingual IE for 9 languages from 4 different language groups: English, German, French, Spanish, Japanese, Chinese, Arabic, Russian and Hindi.

2015

pdf bib
Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling
Alan Akbik | Laura Chiticariu | Marina Danilevsky | Yunyao Li | Shivakumar Vaithyanathan | Huaiyu Zhu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)