Lijie Wang


2020

pdf bib
LIT Team’s System Description for Japanese-Chinese Machine Translation Task in IWSLT 2020
Yimeng Zhuang | Yuan Zhang | Lijie Wang
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes the LIT Team’s submission to the IWSLT2020 open domain translation task, focusing primarily on Japanese-to-Chinese translation direction. Our system is based on the organizers’ baseline system, but we do more works on improving the Transform baseline system by elaborate data pre-processing. We manage to obtain significant improvements, and this paper aims to share some data processing experiences in this translation task. Large-scale back-translation on monolingual corpus is also investigated. In addition, we also try shared and exclusive word embeddings, compare different granularity of tokens like sub-word level. Our Japanese-to-Chinese translation system achieves a performance of BLEU=34.0 and ranks 2nd among all participating systems.

pdf bib
DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset
Lijie Wang | Ao Zhang | Kun Wu | Ke Sun | Zhenghua Li | Hua Wu | Min Zhang | Haifeng Wang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Due to the lack of labeled data, previous research on text-to-SQL parsing mainly focuses on English. Representative English datasets include ATIS, WikiSQL, Spider, etc. This paper presents DuSQL, a larges-scale and pragmatic Chinese dataset for the cross-domain text-to-SQL task, containing 200 databases, 813 tables, and 23,797 question/SQL pairs. Our new dataset has three major characteristics. First, by manually analyzing questions from several representative applications, we try to figure out the true distribution of SQL queries in real-life needs. Second, DuSQL contains a considerable proportion of SQL queries involving row or column calculations, motivated by our analysis on the SQL query distributions. Finally, we adopt an effective data construction framework via human-computer collaboration. The basic idea is automatically generating SQL queries based on the SQL grammar and constrained by the given database. This paper describes in detail the construction process and data statistics of DuSQL. Moreover, we present and compare performance of several open-source text-to-SQL parsers with minor modification to accommodate Chinese, including a simple yet effective extension to IRNet for handling calculation SQL queries.