Nianwen Xue


2020

Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing
Stephan Oepen | Omri Abend | Lasha Abzianidze | Johan Bos | Jan Hajič | Daniel Hershcovich | Bin Li | Tim O'Gorman | Nianwen Xue | Daniel Zeman
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing

MRP 2020: The Second Shared Task on Cross-Framework and Cross-Lingual Meaning Representation Parsing
Stephan Oepen | Omri Abend | Lasha Abzianidze | Johan Bos | Jan Hajic | Daniel Hershcovich | Bin Li | Tim O’Gorman | Nianwen Xue | Daniel Zeman
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing

The 2020 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks and languages. Extending a similar setup from the previous year, five distinct approaches to the representation of sentence meaning in the form of directed graphs were represented in the English training and evaluation data for the task, packaged in a uniform graph abstraction and serialization; for four of these representation frameworks, additional training and evaluation data was provided for one additional language per framework. The task received submissions from eight teams, of which two do not participate in the official ranking because they arrived after the closing deadline or made use of additional training data. All technical information regarding the task, including system submissions, official results, and links to supporting resources and software, is available from the task web site at: http://mrp.nlpl.eu

Transformer-GCRF: Recovering Chinese Dropped Pronouns with General Conditional Random Fields
Jingxuan Yang | Kerui Xu | Jun Xu | Si Li | Sheng Gao | Jun Guo | Ji-Rong Wen | Nianwen Xue
Findings of the Association for Computational Linguistics: EMNLP 2020

Pronouns are often dropped in Chinese conversations and recovering the dropped pronouns is important for NLP applications such as Machine Translation. Existing approaches usually formulate this as a sequence labeling task of predicting whether there is a dropped pronoun before each token and its type. Each utterance is considered to be a sequence and labeled independently. Although these approaches have shown promise, labeling each utterance independently ignores the dependencies between pronouns in neighboring utterances. Modeling these dependencies is critical to improving the performance of dropped pronoun recovery. In this paper, we present a novel framework that combines the strength of the Transformer network with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances. Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models. Exploratory analysis also demonstrates that the GCRF did help to capture the dependencies between pronouns in neighboring utterances, thus contributing to the performance improvements.

Proceedings of the Second International Workshop on Designing Meaning Representations
Nianwen Xue | Johan Bos | William Croft | Jan Hajič | Chu-Ren Huang | Stephan Oepen | Martha Palmer | James Pustejovsky
Proceedings of the Second International Workshop on Designing Meaning Representations

Annotating Temporal Dependency Graphs via Crowdsourcing
Jiarui Yao | Haoling Qiu | Bonan Min | Nianwen Xue
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present the construction of a corpus of 500 Wikinews articles annotated with temporal dependency graphs (TDGs) that can be used to train systems to understand temporal relations in text. We argue that temporal dependency graphs, built on previous research on narrative times and temporal anaphora, provide a representation scheme that achieves a good trade-off between completeness and practicality in temporal annotation. We also provide a crowdsourcing strategy to annotate TDGs, and demonstrate the feasibility of this approach with an evaluation of the quality of the annotation, and the utility of the resulting data set by training a machine learning model on this data set. The data set is publicly available.

2019

Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning
Stephan Oepen | Omri Abend | Jan Hajic | Daniel Hershcovich | Marco Kuhlmann | Tim O’Gorman | Nianwen Xue
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

MRP 2019: Cross-Framework Meaning Representation Parsing
Stephan Oepen | Omri Abend | Jan Hajic | Daniel Hershcovich | Marco Kuhlmann | Tim O’Gorman | Nianwen Xue | Jayeol Chun | Milan Straka | Zdenka Uresova
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

The 2019 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks. Five distinct approaches to the representation of sentence meaning in the form of directed graphs were represented in the training and evaluation data for the task, packaged in a uniform abstract graph representation and serialization. The task received submissions from eighteen teams, of which five do not participate in the official ranking because they arrived after the closing deadline, made use of additional training data, or involved one of the task co-organizers. All technical information regarding the task, including system submissions, official results, and links to supporting resources and software, is available from the task web site at: http://mrp.nlpl.eu

Proceedings of the First International Workshop on Designing Meaning Representations
Nianwen Xue | William Croft | Jan Hajic | Chu-Ren Huang | Stephan Oepen | Martha Palmer | James Pustejovsky
Proceedings of the First International Workshop on Designing Meaning Representations

Modeling Quantification and Scope in Abstract Meaning Representations
James Pustejovsky | Ken Lai | Nianwen Xue
Proceedings of the First International Workshop on Designing Meaning Representations

In this paper, we propose an extension to Abstract Meaning Representations (AMRs) to encode scope information of quantifiers and negation, in a way that overcomes the semantic gaps of the schema while maintaining its cognitive simplicity. Specifically, we address three phenomena not previously part of the AMR specification: quantification, negation (generally), and modality. The resulting representation, which we call “Uniform Meaning Representation” (UMR), adopts the predicative core of AMR and embeds it under a “scope” graph when appropriate. UMR representations differ from other treatments of quantification and modal scope phenomena in two ways: (a) they are more transparent; and (b) they specify default scope when possible.

Parsing Meaning Representations: Is Easier Always Better?
Zi Lin | Nianwen Xue
Proceedings of the First International Workshop on Designing Meaning Representations

The parsing accuracy varies a great deal for different meaning representations. In this paper, we compare the parsing performances between Abstract Meaning Representation (AMR) and Minimal Recursion Semantics (MRS), and provide an in-depth analysis of what factors contributed to the discrepancy in their parsing accuracy. By crystalizing the trade-off between representation expressiveness and ease of automatic parsing, we hope our results can help inform the design of the next-generation meaning representations.

Building a Chinese AMR Bank with Concept and Relation Alignments
Bin Li | Yuan Wen | Li Song | Weiguang Qu | Nianwen Xue
Linguistic Issues in Language Technology, Volume 18, 2019 - Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing

Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an on-going project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts/relations in the CAMR annotation to make it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-annotator agreement as measured by the Smatch score between the two annotators is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus. 46.71% of the AMRs of the sentences are non-tree graphs. Moreover, the AMRs of 88.95% of the sentences have concepts that are inferred from the context of the sentence but do not correspond to a specific word.

Recovering dropped pronouns in Chinese conversations via modeling their referents
Jingxuan Yang | Jianzhuo Tong | Si Li | Sheng Gao | Jun Guo | Nianwen Xue
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Pronouns are often dropped in Chinese sentences, and this happens more frequently in conversational genres as their referents can be easily understood from context. Recovering dropped pronouns is essential to applications such as Information Extraction where the referents of these dropped pronouns need to be resolved, or Machine Translation when Chinese is the source language. In this work, we present a novel end-to-end neural network model to recover dropped pronouns in conversational data. Our model is based on a structured attention mechanism that models the referents of dropped pronouns utilizing both sentence-level and word-level information. Results on three different conversational genres show that our approach achieves a significant improvement over the current state of the art.

Acquiring Structured Temporal Representation via Crowdsourcing: A Feasibility Study
Yuchen Zhang | Nianwen Xue
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Temporal Dependency Trees are a structured temporal representation that represents temporal relations among time expressions and events in a text as a dependency tree structure. Compared to traditional pair-wise temporal relation representations, temporal dependency trees facilitate efficient annotations, higher inter-annotator agreement, and efficient computations. However, annotations on temporal dependency trees so far have only been done by expert annotators, which is costly and time-consuming. In this paper, we introduce a method to crowdsource temporal dependency tree annotations, and show that this representation is intuitive and can be collected with high accuracy and agreement through crowdsourcing. We produce a corpus of temporal dependency trees, and present a baseline temporal dependency parser, trained and evaluated on this new corpus.

2018

Structured Interpretation of Temporal Relations
Yuchen Zhang | Nianwen Xue
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Transition-Based Chinese AMR Parsing
Chuan Wang | Bin Li | Nianwen Xue
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

This paper presents the first AMR parser built on the Chinese AMR bank. By applying a transition-based AMR parsing framework to Chinese, we first investigate how well the transitions first designed for English AMR parsing generalize to Chinese and provide a comparative analysis between the transitions for English and Chinese. We then perform a detailed error analysis to identify the major challenges in Chinese AMR parsing that we hope will inform future research in this area.

Neural Ranking Models for Temporal Dependency Structure Parsing
Yuchen Zhang | Nianwen Xue
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We design and build the first neural temporal dependency parser. It utilizes a neural ranking model with minimal feature engineering, and parses time expressions and events in a text into a temporal dependency tree structure. We evaluate our parser on two domains: news reports and narrative stories. In a parsing-only evaluation setup where gold time expressions and events are provided, our parser reaches 0.81 and 0.70 f-score on unlabeled and labeled parsing respectively, a result that is very competitive against alternative approaches. In an end-to-end evaluation setup where time expressions and events are automatically recognized, our parser beats two strong baselines on both data domains. Our experimental results and discussions shed light on the nature of temporal dependency structures in different domains and provide insights that we believe will be valuable to future research in this area.

2017

A Systematic Study of Neural Discourse Models for Implicit Discourse Relation
Attapol Rutherford | Vera Demberg | Nianwen Xue
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Many neural network models have been proposed to tackle this problem. However, the comparison for this task is not unified, so we could hardly draw clear conclusions about the effectiveness of various architectures. Here, we propose neural network models that are based on feedforward and long short-term memory (LSTM) architectures and systematically study the effects of varying structures. To our surprise, the best-configured feedforward architecture outperforms LSTM-based models in most cases despite thorough tuning. Further, we compare our best feedforward system with competitive convolutional and recurrent networks and find that feedforward can actually be more effective. For the first time for this task, we compile and publish outputs from previous neural and non-neural systems to establish the standard for further comparison.

Addressing the Data Sparsity Issue in Neural AMR Parsing
Xiaochang Peng | Chuan Wang | Daniel Gildea | Nianwen Xue
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Neural attention models have achieved great success in different NLP tasks. However, they have not fulfilled their promise on the AMR parsing task due to the data sparsity issue. In this paper, we describe a sequence-to-sequence model for AMR parsing and present different ways to tackle the data sparsity problem. We show that our methods achieve significant improvement over a baseline neural attention model and our results are also competitive against state-of-the-art systems that do not use extra linguistic resources.

Proceedings of the IJCNLP 2017, Shared Tasks
Chao-Hong Liu | Preslav Nakov | Nianwen Xue
Proceedings of the IJCNLP 2017, Shared Tasks

Translation Divergences in Chinese–English Machine Translation: An Empirical Investigation
Dun Deng | Nianwen Xue
Computational Linguistics, Volume 43, Issue 3 - September 2017

In this article, we conduct an empirical investigation of translation divergences between Chinese and English relying on a parallel treebank. To do this, we first devise a hierarchical alignment scheme where Chinese and English parse trees are aligned in a way that eliminates conflicts and redundancies between word alignments and syntactic parses to prevent the generation of spurious translation divergences. Using this Hierarchically Aligned Chinese–English Parallel Treebank (HACEPT), we are able to semi-automatically identify and categorize the translation divergences between the two languages and quantify each type of translation divergence. Our results show that the translation divergences are much broader than described in previous studies that are largely based on anecdotal evidence and linguistic knowledge. The distribution of the translation divergences also shows that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We also show that HACEPT allows the extraction of syntax-based translation rules, most of which are expressive enough to capture the translation divergences, and point out that the syntactic annotation in existing treebanks is not optimal for extracting such translation rules. We also discuss the implications of our study for attempts to bridge translation divergences by devising shared semantic representations across languages. Our quantitative results lend further support to the observation that although it is possible to bridge some translation divergences with semantic representations, other translation divergences are open-ended, thus building a semantic representation that captures all possible translation divergences may be impractical.

Proceedings of the 11th Linguistic Annotation Workshop
Nathan Schneider | Nianwen Xue
Proceedings of the 11th Linguistic Annotation Workshop

Discourse Segmentation for Building a RST Chinese Treebank
Shuyuan Cao | Nianwen Xue | Iria da Cunha | Mikel Iruskieta | Chuan Wang
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Getting the Most out of AMR Parsing
Chuan Wang | Nianwen Xue
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper proposes to tackle the AMR parsing bottleneck by improving two components of an AMR parser: concept identification and alignment. We first build a Bidirectional LSTM based concept identifier that is able to incorporate richer contextual information to learn sparse AMR concept labels. We then extend an HMM-based word-to-concept alignment model with graph distance distortion and a rescoring method during decoding to incorporate the structural information in the AMR graph. We show integrating the two components into an existing AMR parser results in consistently better performance over the state of the art on various datasets.

2016

Large Multi-lingual, Multi-level and Multi-genre Annotation Corpus
Xuansong Li | Martha Palmer | Nianwen Xue | Lance Ramshaw | Mohamed Maamouri | Ann Bies | Kathryn Conger | Stephen Grimes | Stephanie Strassel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

High accuracy for automated translation and information retrieval calls for linguistic annotations at various language levels. The plethora of informal internet content sparked the demand for porting state-of-the-art natural language processing (NLP) applications to new social media as well as diverse language adaptation. The effort launched by the BOLT (Broad Operational Language Translation) program at DARPA (Defense Advanced Research Projects Agency) successfully addressed the internet information with enhanced NLP systems. BOLT aims for automated translation and linguistic analysis for informal genres of text and speech in online and in-person communication. As a part of this program, the Linguistic Data Consortium (LDC) developed valuable linguistic resources in support of the training and evaluation of such new technologies. This paper focuses on methodologies, infrastructure, and procedure for developing linguistic annotation at various language levels, including Treebank (TB), word alignment (WA), PropBank (PB), and co-reference (CoRef). Inspired by the OntoNotes approach with adaptations to the tasks to reflect the goals and scope of the BOLT project, this effort has introduced more annotation types of informal and free-style genres in English, Chinese and Egyptian Arabic. The corpus produced is by far the largest multi-lingual, multi-level and multi-genre annotation corpus of informal text and speech.

Proceedings of the CoNLL-16 shared task
Nianwen Xue
Proceedings of the CoNLL-16 shared task

CoNLL 2016 Shared Task on Multilingual Shallow Discourse Parsing
Nianwen Xue | Hwee Tou Ng | Sameer Pradhan | Attapol Rutherford | Bonnie Webber | Chuan Wang | Hongmin Wang
Proceedings of the CoNLL-16 shared task

Robust Non-Explicit Neural Discourse Parser in English and Chinese
Attapol Rutherford | Nianwen Xue
Proceedings of the CoNLL-16 shared task

Annotating the Little Prince with Chinese AMRs
Bin Li | Yuan Wen | Weiguang Qu | Lijun Bu | Nianwen Xue
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

Converting SynTagRus Dependency Treebank into Penn Treebank Style
Alex Luu | Sophia A. Malamud | Nianwen Xue
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

Annotating the discourse and dialogue structure of SMS message conversations
Nianwen Xue | Qishen Su | Sooyoung Jeong
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

CAMR at SemEval-2016 Task 8: An Extended Transition-based AMR Parser
Chuan Wang | Sameer Pradhan | Xiaoman Pan | Heng Ji | Nianwen Xue
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

The CoNLL-2015 Shared Task on Shallow Discourse Parsing
Nianwen Xue | Hwee Tou Ng | Sameer Pradhan | Rashmi Prasad | Christopher Bryant | Attapol Rutherford
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

Harmonizing word alignments and syntactic structures for extracting phrasal translation equivalents
Dun Deng | Nianwen Xue | Shiman Guo
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

A Transition-based Algorithm for AMR Parsing
Chuan Wang | Nianwen Xue | Sameer Pradhan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Improving the Inference of Implicit Discourse Relations via Classifying Explicit Discourse Connectives
Attapol Rutherford | Nianwen Xue
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Feature Optimization for Constituent Parsing via Neural Networks
Zhiguo Wang | Haitao Mi | Nianwen Xue
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recovering dropped pronouns from Chinese text messages
Yaqin Yang | Yalin Liu | Nianwen Xue
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Boosting Transition-based AMR Parsing with Refined Actions and Auxiliary Analyzers
Chuan Wang | Nianwen Xue | Sameer Pradhan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Joint POS Tagging and Transition-based Constituent Parsing in Chinese with Non-local Features
Zhiguo Wang | Nianwen Xue
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Effective Document-Level Features for Chinese Patent Word Segmentation
Si Li | Nianwen Xue
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Discovering Implicit Discourse Relations Through Brown Cluster Pair Representation and Coreference Patterns
Attapol Rutherford | Nianwen Xue
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Aligning Chinese-English Parallel Parse Trees: Is it Feasible?
Dun Deng | Nianwen Xue
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

Building a Hierarchically Aligned Chinese-English Parallel Treebank
Dun Deng | Nianwen Xue
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Automatic Inference of the Tense of Chinese Events Using Implicit Linguistic Information
Yuchen Zhang | Nianwen Xue
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Buy one get one free: Distant annotation of Chinese tense, event type and modality
Nianwen Xue | Yuchen Zhang
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We describe a “distant annotation” method where we mark up the semantic tense, event type, and modality of Chinese events via a word-aligned parallel corpus. We first map Chinese verbs to their English counterparts via word alignment, and then annotate the resulting English text spans with coarse-grained categories for semantic tense, event type, and modality that we believe apply to both English and Chinese. Because English has richer morpho-syntactic indicators for semantic tense, event type and modality than Chinese, our intuition is that this distant annotation approach will yield more consistent annotation than if we annotate the Chinese side directly. We report experimental results that show stable annotation agreement statistics and that event type and modality have significant influence on tense prediction. We also report the size of the annotated corpus that we have obtained, and how different domains impact annotation consistency.

Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech
Nianwen Xue | Ondřej Bojar | Jan Hajič | Martha Palmer | Zdeňka Urešová | Xiuhong Zhang
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Abstract Meaning Representations (AMRs) are rooted, directional and labeled graphs that abstract away from morpho-syntactic idiosyncrasies such as word category (verbs and nouns), word order, and function words (determiners, some prepositions). Because these syntactic idiosyncrasies account for many of the cross-lingual differences, it would be interesting to see if this representation can serve, e.g., as a useful, minimally divergent transfer layer in machine translation. To answer this question, we have translated 100 English sentences that have existing AMRs into Chinese and Czech to create AMRs for them. A cross-linguistic comparison of English to Chinese and Czech AMRs reveals both cases where the AMRs for the language pairs align well structurally and cases of linguistic divergence. We found that the level of compatibility of AMR between English and Chinese is higher than between English and Czech. We believe this kind of comparison is beneficial to further refining the annotation standards for each of the three languages and will lead to more compatible annotation guidelines between the languages.

2013

Distant annotation of Chinese tense and modality
Nianwen Xue | Yuchen Zhang | Yaqin Yang
Proceedings of the IWCS 2013 Workshop on Annotation of Modal Meanings in Natural Language (WAMM)

Towards Robust Linguistic Analysis using OntoNotes
Sameer Pradhan | Alessandro Moschitti | Nianwen Xue | Hwee Tou Ng | Anders Björkelund | Olga Uryupina | Yuchen Zhang | Zhi Zhong
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
Zhiguo Wang | Chengqing Zong | Nianwen Xue
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Dependency-based empty category detection via phrase structure trees
Nianwen Xue | Yaqin Yang
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

PDTB-style Discourse Annotation of Chinese Text
Yuping Zhou | Nianwen Xue
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chinese Comma Disambiguation for Discourse Analysis
Yaqin Yang | Nianwen Xue
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures
Xuansong Li | Stephanie Strassel | Stephen Grimes | Safa Ismael | Mohamed Maamouri | Ann Bies | Nianwen Xue
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Parallel aligned treebanks (PAT) are linguistic corpora annotated with morphological and syntactic structures that are aligned at sentence as well as sub-sentence levels. They are valuable resources for improving machine translation (MT) quality. Recently, there has been an increasing demand for such data, especially for divergent language pairs. The Linguistic Data Consortium (LDC) and its academic partners have been developing Arabic-English and Chinese-English PATs for several years. This paper describes the PAT corpus creation effort for the program GALE (Global Autonomous Language Exploitation) and introduces the potential issues of scaling up this PAT effort for the program BOLT (Broad Operational Language Translation). Based on existing infrastructures and in the light of current annotation process, challenges and approaches, we are exploring new methodologies to address emerging challenges in constructing PATs, including data volume bottlenecks, dialect issues of Arabic languages, and new genre features related to rapidly changing social media. Preliminary experimental results are presented to show the feasibility of the approaches proposed.

Annotating dropped pronouns in Chinese newswire text
Elizabeth Baran | Yaqin Yang | Nianwen Xue
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We propose an annotation framework to explicitly identify dropped subject pronouns in Chinese. We acknowledge and specify 10 concrete pronouns that exist as words in Chinese and 4 abstract pronouns that do not correspond to Chinese words, but that are recognized conceptually by native Chinese speakers. These abstract pronouns are identified as "unspecified", "pleonastic", "event", and "existential" and are argued to exist cross-linguistically. We trained two annotators, fluent in Chinese, and adjudicated their annotations to form a gold standard. We achieved an inter-annotator agreement kappa of .6 and an observed agreement of .7. We found that annotators had the most difficulty with the abstract pronouns, such as "unspecified" and "event", but we posit that further specification and training have the potential to significantly improve these results. We believe that this annotated data will serve to help improve Machine Translation models that translate from Chinese to a non pro-drop language, like English, that requires all subject pronouns to be explicit.

pdf bib
Exploring Temporal Vagueness with Mechanical Turk
Yuping Zhou | Nianwen Xue
Proceedings of the Sixth Linguistic Annotation Workshop

pdf bib
Joint Conference on EMNLP and CoNLL - Shared Task
Sameer Pradhan | Alessandro Moschitti | Nianwen Xue
Joint Conference on EMNLP and CoNLL - Shared Task

pdf bib
CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
Sameer Pradhan | Alessandro Moschitti | Nianwen Xue | Olga Uryupina | Yuchen Zhang
Joint Conference on EMNLP and CoNLL - Shared Task

pdf bib
Building a Chinese Lexical Taxonomy
Xiaopeng Bai | Nianwen Xue
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Extending and Scaling up the Chinese Treebank Annotation
Xiuhong Zhang | Nianwen Xue
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2011

pdf bib
Discourse-constrained Temporal Annotation
Yuping Zhou | Nianwen Xue
Proceedings of the 5th Linguistic Annotation Workshop

pdf bib
Improving MT Word Alignment Using Aligned Multi-Stage Parses
Adam Meyers | Michiko Kosaka | Shasha Liao | Nianwen Xue
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
Sameer Pradhan | Lance Ramshaw | Mitchell Marcus | Martha Palmer | Ralph Weischedel | Nianwen Xue
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
A Machine Learning-Based Coreference Detection System for OntoNotes
Yaqin Yang | Nianwen Xue | Peter Anick
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Chinese sentence segmentation as comma classification
Nianwen Xue | Yaqin Yang
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Proceedings of the Fourth Linguistic Annotation Workshop
Nianwen Xue | Massimo Poesio
Proceedings of the Fourth Linguistic Annotation Workshop

pdf bib
PropBank Annotation of Multilingual Light Verb Constructions
Jena D. Hwang | Archna Bhatia | Claire Bonial | Aous Mansouri | Ashwini Vaidya | Nianwen Xue | Martha Palmer
Proceedings of the Fourth Linguistic Annotation Workshop

pdf bib
Applying Syntactic, Semantic and Discourse Constraints in Chinese Temporal Annotation
Nianwen Xue | Yuping Zhou
Coling 2010: Posters

pdf bib
Chasing the ghost: recovering empty categories in the Chinese Treebank
Yaqin Yang | Nianwen Xue
Coling 2010: Posters

2009

pdf bib
The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages
Jan Hajič | Massimiliano Ciaramita | Richard Johansson | Daisuke Kawahara | Maria Antònia Martí | Lluís Màrquez | Adam Meyers | Joakim Nivre | Sebastian Padó | Jan Štěpánek | Pavel Straňák | Mihai Surdeanu | Nianwen Xue | Yi Zhang
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

pdf bib
Automatic Recognition of Logical Relations for English, Chinese and Japanese in the GLARF Framework
Adam Meyers | Michiko Kosaka | Nianwen Xue | Heng Ji | Ang Sun | Shasha Liao | Wei Xu
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

pdf bib
Transducing Logical Relations from Automatic and Manual GLARF
Adam Meyers | Michiko Kosaka | Heng Ji | Nianwen Xue | Mary Harper | Ang Sun | Wei Xu | Shasha Liao
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

pdf bib
Using Parallel Propbanks to enhance Word-alignments
Jinho Choi | Martha Palmer | Nianwen Xue
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

pdf bib
OntoNotes: The 90% Solution
Sameer S. Pradhan | Nianwen Xue
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

2008

pdf bib
Annotating “tense” in a Tense-less Language
Nianwen Xue | Hua Zhong | Kai-Yun Chen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In the context of Natural Language Processing, annotation is about recovering implicit information that is useful for natural language applications. In this paper we describe a “tense” annotation task for Chinese, a language that does not have grammatical tense. The task is designed to infer the temporal location of a situation in relation to the temporal deixis, the moment of speech. If successful, this would be a highly rewarding endeavor, as it has applications in many natural language systems. Our preliminary experiments show that this is a very challenging annotation task for which high annotation consistency is very difficult, but not impossible, to achieve. We show that guidelines that provide a conceptually intuitive framework will be crucial to the success of this annotation effort.

pdf bib
Labeling Chinese Predicates with Semantic Roles
Nianwen Xue
Computational Linguistics, Volume 34, Number 2, June 2008 - Special Issue on Semantic Role Labeling

pdf bib
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation
Johan Bos | Edward Briscoe | Aoife Cahill | John Carroll | Stephen Clark | Ann Copestake | Dan Flickinger | Josef van Genabith | Julia Hockenmaier | Aravind Joshi | Ronald Kaplan | Tracy Holloway King | Sandra Kuebler | Dekang Lin | Jan Tore Lønning | Christopher Manning | Yusuke Miyao | Joakim Nivre | Stephan Oepen | Kenji Sagae | Nianwen Xue | Yi Zhang
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

pdf bib
Automatic Inference of the Temporal Location of Situations in Chinese Text
Nianwen Xue
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2006

pdf bib
Aligning Features with Sense Distinction Dimensions
Nianwen Xue | Jinying Chen | Martha Palmer
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Annotating the Predicate-Argument Structure of Chinese Nominalizations
Nianwen Xue
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the Chinese NomBank Project, the goal of which is to annotate the predicate-argument structure of nominalized predicates in Chinese. The Chinese Nombank extends the general framework of the English and Chinese Proposition Banks to the annotation of nominalized predicates and adds a layer of semantic annotation to the Chinese Treebank. We first outline the scope of the work by discussing the markability of the nominalized predicates and their arguments. We then attempt to provide a categorization of the distribution of the arguments of nominalized predicates. We also discuss the relevance of the event/result distinction to the annotation of nominalized predicates and the phenomenon of incorporation. Finally we discuss some cross-linguistic differences between English and Chinese.

pdf bib
Semantic role labeling of nominalized predicates in Chinese
Nianwen Xue
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
A Parallel Proposition Bank II for Chinese and English
Martha Palmer | Nianwen Xue | Olga Babko-Malaya | Jinying Chen | Benjamin Snyder
Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky

pdf bib
Annotating Discourse Connectives in the Chinese Treebank
Nianwen Xue
Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky

2004

pdf bib
Proposition Bank II: Delving Deeper
Olga Babko-Malaya | Martha Palmer | Nianwen Xue | Aravind Joshi | Seth Kulick
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004

pdf bib
Calibrating Features for Semantic Role Labeling
Nianwen Xue | Martha Palmer
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

pdf bib
Annotating the Propositions in the Penn Chinese Treebank
Nianwen Xue | Martha Palmer
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

pdf bib
Chinese Word Segmentation as LMR Tagging
Nianwen Xue | Libin Shen
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

pdf bib
Chinese Word Segmentation as Character Tagging
Nianwen Xue
International Journal of Computational Linguistics & Chinese Language Processing, Volume 8, Number 1, February 2003: Special Issue on Word Formation and Chinese Language Processing

2002

pdf bib
Combining Classifiers for Chinese Word Segmentation
Nianwen Xue | Susan P. Converse
COLING-02: The First SIGHAN Workshop on Chinese Language Processing

pdf bib
Building a Large-Scale Annotated Chinese Corpus
Nianwen Xue | Fu-Dong Chiou | Martha Palmer
COLING 2002: The 19th International Conference on Computational Linguistics

2000

pdf bib
Developing Guidelines and Ensuring Consistency for Chinese Text Annotation
Fei Xia | Martha Palmer | Nianwen Xue | Mary Ellen Okurowski | John Kovarik | Fu-Dong Chiou | Shizhe Huang | Tony Kroch | Mitch Marcus
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)
