Chinese and English Elementary Discourse Units Segmentation based on Bi-LSTM-CRF Model

Li Yancui, Lai Chunxiao, Feng Jike, Feng Hongyu


Abstract
Elementary Discourse Unit (EDU) recognition is the basic task of discourse analysis, and the Chinese and English discourse alignment corpus is helpful to the studies of EDU recognition. This paper first builds Chinese-English parallel discourse corpus, in which EDUs are annotated and aligned. Then, we present the framework of Bi-LSTM-CRF EDUs recognition model using word embedding, POS and syntactic features, which can combine the advantage of CRF and Bi-LSTM. The results show that F1 is about 2% higher than the traditional method. Compared with CRF and Bi-LSTM, the Bi-LSTM-CRF model can combine the advantages of them and obtains satisfactory results for Chinese and English EDUs recognition. The experiment of feature contribution shows that using all features together can get best result, the syntactic feature outperforms than other features.
Anthology ID:
2020.ccl-1.99
Volume:
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Month:
October
Year:
2020
Address:
Haikou, China
Venue:
CCL
SIG:
Publisher:
Chinese Information Processing Society of China
Note:
Pages:
1068–1078
Language:
English
URL:
https://www.aclweb.org/anthology/2020.ccl-1.99
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.ccl-1.99.pdf