Jingfeng Yang


2020

pdf bib
Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection
Jingfeng Yang | Diyi Yang | Zhaoran Ma
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing approaches to disfluency detection heavily depend on human-annotated data. Numbers of data augmentation methods have been proposed to alleviate the dependence on labeled data. However, current augmentation approaches such as random insertion or repetition fail to resemble training corpus well and usually resulted in unnatural and limited types of disfluencies. In this work, we propose a simple Planner-Generator based disfluency generation model to generate natural and diverse disfluent texts as augmented data, where the Planner decides on where to insert disfluent segments and the Generator follows the prediction to generate corresponding disfluent segments. We further utilize this augmented data for pretraining and leverage it for the task of disfluency detection. Experiments demonstrated that our two-stage disfluency generation model outperforms existing baselines; those disfluent sentences generated significantly aided the task of disfluency detection and led to state-of-the-art performance on Switchboard corpus.

2018

pdf bib
Toward Fast and Accurate Neural Discourse Segmentation
Yizhong Wang | Sujian Li | Jingfeng Yang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Discourse segmentation, which segments texts into Elementary Discourse Units, is a fundamental step in discourse analysis. Previous discourse segmenters rely on complicated hand-crafted features and are not practical in actual use. In this paper, we propose an end-to-end neural segmenter based on BiLSTM-CRF framework. To improve its accuracy, we address the problem of data insufficiency by transferring a word representation model that is trained on a large corpus. We also propose a restricted self-attention mechanism in order to capture useful information within a neighborhood. Experiments on the RST-DT corpus show that our model is significantly faster than previous methods, while achieving new state-of-the-art performance.

2017

pdf bib
Tag-Enhanced Tree-Structured Neural Networks for Implicit Discourse Relation Classification
Yizhong Wang | Sujian Li | Jingfeng Yang | Xu Sun | Houfeng Wang
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Identifying implicit discourse relations between text spans is a challenging task because it requires understanding the meaning of the text. To tackle this task, recent studies have tried several deep learning methods but few of them exploited the syntactic information. In this work, we explore the idea of incorporating syntactic parse tree into neural networks. Specifically, we employ the Tree-LSTM model and Tree-GRU model, which is based on the tree structure, to encode the arguments in a relation. And we further leverage the constituent tags to control the semantic composition process in these tree-structured neural networks. Experimental results show that our method achieves state-of-the-art performance on PDTB corpus.