Hai-Long Trieu


2019

pdf bib
Coreference Resolution in Full Text Articles with BERT and Syntax-based Mention Filtering
Hai-Long Trieu | Anh-Khoa Duong Nguyen | Nhung Nguyen | Makoto Miwa | Hiroya Takamura | Sophia Ananiadou
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

This paper describes our system developed for the coreference resolution task of the CRAFT Shared Tasks 2019. The CRAFT corpus is more challenging than other existing corpora because it contains full text articles. We have employed an existing span-based state-of-theart neural coreference resolution system as a baseline system. We enhance the system with two different techniques to capture longdistance coreferent pairs. Firstly, we filter noisy mentions based on parse trees with increasing the number of antecedent candidates. Secondly, instead of relying on the LSTMs, we integrate the highly expressive language model–BERT into our model. Experimental results show that our proposed systems significantly outperform the baseline. The best performing system obtained F-scores of 44%, 48%, 39%, 49%, 40%, and 57% on the test set with B3, BLANC, CEAFE, CEAFM, LEA, and MUC metrics, respectively. Additionally, the proposed model is able to detect coreferent pairs in long distances, even with a distance of more than 200 sentences.

2018

pdf bib
Investigating Domain-Specific Information for Neural Coreference Resolution on Biomedical Texts
Hai-Long Trieu | Nhung T. H. Nguyen | Makoto Miwa | Sophia Ananiadou
Proceedings of the BioNLP 2018 workshop

Existing biomedical coreference resolution systems depend on features and/or rules based on syntactic parsers. In this paper, we investigate the utility of the state-of-the-art general domain neural coreference resolution system on biomedical texts. The system is an end-to-end system without depending on any syntactic parsers. We also investigate the domain specific features to enhance the system for biomedical texts. Experimental results on the BioNLP Protein Coreference dataset and the CRAFT corpus show that, with no parser information, the adapted system compared favorably with the systems that depend on parser information on these datasets, achieving 51.23% on the BioNLP dataset and 36.33% on the CRAFT corpus in F1 score. In-domain embeddings and domain-specific features helped improve the performance on the BioNLP dataset, but they did not on the CRAFT corpus.

2017

pdf bib
The JAIST Machine Translation Systems for WMT 17
Hai-Long Trieu | Trung-Tin Pham | Le-Minh Nguyen
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
Dealing with Out-Of-Vocabulary Problem in Sentence Alignment Using Word Similarity
Hai-Long Trieu | Le-Minh Nguyen | Phuong-Thai Nguyen
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers