Mai Hoang Dao


2020

pdf bib
A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese
Anh Tuan Nguyen | Mai Hoang Dao | Dat Quoc Nguyen
Findings of the Association for Computational Linguistics: EMNLP 2020

Semantic parsing is an important NLP task. However, Vietnamese is a low-resource language in this research area. In this paper, we present the first public large-scale Text-to-SQL semantic parsing dataset for Vietnamese. We extend and evaluate two strong semantic parsing baselines EditSQL (Zhang et al., 2019) and IRNet (Guo et al., 2019) on our dataset. We compare the two baselines with key configurations and find that: automatic Vietnamese word segmentation improves the parsing results of both baselines; the normalized pointwise mutual information (NPMI) score (Bouma, 2009) is useful for schema linking; latent syntactic features extracted from a neural dependency parser for Vietnamese also improve the results; and the monolingual language model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) helps produce higher performances than the recent best multilingual language model XLM-R (Conneau et al., 2020).

pdf bib
WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets
Dat Quoc Nguyen | Thanh Vu | Afshin Rahimi | Mai Hoang Dao | Linh The Nguyen | Long Doan
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. In addition, we also present a brief summary of results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems obtain very high performance, up to 0.91 F1 score, (ii) the majority of the submissions achieve substantially higher results than the baseline fastText (Joulin et al., 2017), and (iii) fine-tuning pre-trained language models on relevant language data followed by supervised training performs well in this task.