Aiti Aw

Also published as: Ai Ti Aw, AiTi Aw


2020

pdf bib
GCDST: A Graph-based and Copy-augmented Multi-domain Dialogue State Tracking
Peng Wu | Bowei Zou | Ridong Jiang | AiTi Aw
Findings of the Association for Computational Linguistics: EMNLP 2020

As an essential component of task-oriented dialogue systems, Dialogue State Tracking (DST) takes charge of estimating user intentions and requests in dialogue contexts and extracting substantial goals (states) from user utterances to help the downstream modules to determine the next actions of dialogue systems. For practical usages, a major challenge to constructing a robust DST model is to process a conversation with multi-domain states. However, most existing approaches trained DST on a single domain independently, ignoring the information across domains. To tackle the multi-domain DST task, we first construct a dialogue state graph to transfer structured features among related domain-slot pairs across domains. Then, we encode the graph information of dialogue states by graph convolutional networks and utilize a hard copy mechanism to directly copy historical states from the previous conversation. Experimental results show that our model improves the performances of the multi-domain DST baseline (TRADE) with the absolute joint accuracy of 2.0% and 1.0% on the MultiWOZ 2.0 and 2.1 dialogue datasets, respectively.

pdf bib
NUT-RC: Noisy User-generated Text-oriented Reading Comprehension
Rongtao Huang | Bowei Zou | Yu Hong | Wei Zhang | AiTi Aw | Guodong Zhou
Proceedings of the 28th International Conference on Computational Linguistics

Reading comprehension (RC) on social media such as Twitter is a critical and challenging task due to its noisy, informal, but informative nature. Most existing RC models are developed on formal datasets such as news articles and Wikipedia documents, which severely limit their performances when directly applied to the noisy and informal texts in social media. Moreover, these models only focus on a certain type of RC, extractive or generative, but ignore the integration of them. To well address these challenges, we come up with a noisy user-generated text-oriented RC model. In particular, we first introduce a set of text normalizers to transform the noisy and informal texts to the formal ones. Then, we integrate the extractive and the generative RC model by a multi-task learning mechanism and an answer selection module. Experimental results on TweetQA demonstrate that our NUT-RC model significantly outperforms the state-of-the-art social media-oriented RC models.

pdf bib
Uncertainty Modeling for Machine Comprehension Systems using Efficient Bayesian Neural Networks
Zhengyuan Liu | Pavitra Krishnaswamy | Ai Ti Aw | Nancy Chen
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

While neural approaches have achieved significant improvement in machine comprehension tasks, models often work as a black-box, resulting in lower interpretability, which requires special attention in domains such as healthcare or education. Quantifying uncertainty helps pave the way towards more interpretable neural networks. In classification and regression tasks, Bayesian neural networks have been effective in estimating model uncertainty. However, inference time increases linearly due to the required sampling process in Bayesian neural networks. Thus speed becomes a bottleneck in tasks with high system complexity such as question-answering or dialogue generation. In this work, we propose a hybrid neural architecture to quantify model uncertainty using Bayesian weight approximation but boosts up the inference speed by 80% relative at test time, and apply it for a clinical dialogue comprehension task. The proposed approach is also used to enable active learning so that an updated model can be trained more optimally with new incoming data by selecting samples that are not well-represented in the current training scheme.

2019

pdf bib
Revisit Automatic Error Detection for Wrong and Missing Translation – A Supervised Approach
Wenqiang Lei | Weiwen Xu | Ai Ti Aw | Yuanxin Xiang | Tat Seng Chua
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

While achieving great fluency, current machine translation (MT) techniques are bottle-necked by adequacy issues. To have a closer study of these issues and accelerate model development, we propose automatic detecting adequacy errors in MT hypothesis for MT model evaluation. To do that, we annotate missing and wrong translations, the two most prevalent issues for current neural machine translation model, in 15000 Chinese-English translation pairs. We build a supervised alignment model for translation error detection (AlignDet) based on a simple Alignment Triangle strategy to set the benchmark for automatic error detection task. We also discuss the difficulties of this task and the benefits of this task for existing evaluation metrics.

pdf bib
Negative Focus Detection via Contextual Attention Mechanism
Longxiang Shen | Bowei Zou | Yu Hong | Guodong Zhou | Qiaoming Zhu | AiTi Aw
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Negation is a universal but complicated linguistic phenomenon, which has received considerable attention from the NLP community over the last decade, since a negated statement often carries both an explicit negative focus and implicit positive meanings. For the sake of understanding a negated statement, it is critical to precisely detect the negative focus in context. However, how to capture contextual information for negative focus detection is still an open challenge. To well address this, we come up with an attention-based neural network to model contextual information. In particular, we introduce a framework which consists of a Bidirectional Long Short-Term Memory (BiLSTM) neural network and a Conditional Random Fields (CRF) layer to effectively encode the order information and the long-range context dependency in a sentence. Moreover, we design two types of attention mechanisms, word-level contextual attention and topic-level contextual attention, to take advantage of contextual information across sentences from both the word perspective and the topic perspective, respectively. Experimental results on the SEM’12 shared task corpus show that our approach achieves the best performance on negative focus detection, yielding an absolute improvement of 2.11% over the state-of-the-art. This demonstrates the great effectiveness of the two types of contextual attention mechanisms.

pdf bib
Sentiment Aware Neural Machine Translation
Chenglei Si | Kui Wu | Ai Ti Aw | Min-Yen Kan
Proceedings of the 6th Workshop on Asian Translation

Sentiment ambiguous lexicons refer to words where their polarity depends strongly on con- text. As such, when the context is absent, their translations or their embedded sentence ends up (incorrectly) being dependent on the training data. While neural machine translation (NMT) has achieved great progress in recent years, most systems aim to produce one single correct translation for a given source sentence. We investigate the translation variation in two sentiment scenarios. We perform experiments to study the preservation of sentiment during translation with three different methods that we propose. We conducted tests with both sentiment and non-sentiment bearing contexts to examine the effectiveness of our methods. We show that NMT can generate both positive- and negative-valent translations of a source sentence, based on a given input sentiment label. Empirical evaluations show that our valence-sensitive embedding (VSE) method significantly outperforms a sequence-to-sequence (seq2seq) baseline, both in terms of BLEU score and ambiguous word translation accuracy in test, given non-sentiment bearing contexts.

2018

pdf bib
Named-Entity Tagging and Domain adaptation for Better Customized Translation
Zhongwei Li | Xuancong Wang | Ai Ti Aw | Eng Siong Chng | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Customized translation need pay spe-cial attention to the target domain ter-minology especially the named-entities for the domain. Adding linguistic features to neural machine translation (NMT) has been shown to benefit translation in many studies. In this paper, we further demonstrate that adding named-entity (NE) feature with named-entity recognition (NER) into the source language produces better translation with NMT. Our experiments show that by just including the different NE classes and boundary tags, we can increase the BLEU score by around 1 to 2 points using the standard test sets from WMT2017. We also show that adding NE tags using NER and applying in-domain adaptation can be combined to further improve customized machine translation.

2016

pdf bib
A Word Labeling Approach to Thai Sentence Boundary Detection and POS Tagging
Nina Zhou | AiTi Aw | Nattadaporn Lertcheva | Xuancong Wang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Previous studies on Thai Sentence Boundary Detection (SBD) mostly assumed sentence ends at a space disambiguation problem, which classified space either as an indicator for Sentence Boundary (SB) or non-Sentence Boundary (nSB). In this paper, we propose a word labeling approach which treats space as a normal word, and detects SB between any two words. This removes the restriction for SB to be oc-curred only at space and makes our system more robust for modern Thai writing. It is because in modern Thai writing, space is not consistently used to indicate SB. As syntactic information contributes to better SBD, we further propose a joint Part-Of-Speech (POS) tagging and SBD framework based on Factorial Conditional Random Field (FCRF) model. We compare the performance of our proposed ap-proach with reported methods on ORCHID corpus. We also performed experiments of FCRF model on the TaLAPi corpus. The results show that the word labelling approach has better performance than pre-vious space-based classification approaches and FCRF joint model outperforms LCRF model in terms of SBD in all experiments.

2015

pdf bib
Toward Tweets Normalization Using Maximum Entropy
Mohammad Arshi Saloot | Norisma Idris | Liyana Shuib | Ram Gopal Raj | AiTi Aw
Proceedings of the Workshop on Noisy User-generated Text

2014

pdf bib
A Rule-Augmented Statistical Phrase-based Translation System
Cong Duy Vu Hoang | AiTi Aw | Nhung T. H. Nguyen
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
TaLAPi — A Thai Linguistically Annotated Corpus for Language Processing
AiTi Aw | Sharifah Mahani Aljunied | Nattadaporn Lertcheva | Sasiwimon Kalunsima
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper discusses a Thai corpus, TaLAPi, fully annotated with word segmentation (WS), part-of-speech (POS) and named entity (NE) information with the aim to provide a high-quality and sufficiently large corpus for real-life implementation of Thai language processing tools. The corpus contains 2,720 articles (1,043,471words) from the entertainment and lifestyle (NE&L) domain and 5,489 articles (3,181,487 words) in the news (NEWS) domain, with a total of 35 POS tags and 10 named entity categories. In particular, we present an approach to segment and tag foreign and loan words expressed in transliterated or original form in Thai text corpora. We see this as an area for study as adapted and un-adapted foreign language sequences have not been well addressed in the literature and this poses a challenge to the annotation process due to the increasing use and adoption of foreign words in the Thai language nowadays. To reduce the ambiguities in POS tagging and to provide rich information for facilitating Thai syntactic analysis, we adapted the POS tags used in ORCHID and propose a framework to tag Thai text and also addresses the tagging of loan and foreign words based on the proposed segmentation strategy. TaLAPi also includes a detailed guideline for tagging the 10 named entity categories

2012

pdf bib
Personalized Normalization for a Multilingual Chat System
Ai Ti Aw | Lian Hau Lee
Proceedings of the ACL 2012 System Demonstrations

pdf bib
An Unsupervised and Data-Driven Approach for Spell Checking in Vietnamese OCR-scanned Texts
Cong Duy Vu Hoang | Ai Ti Aw
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

2010

pdf bib
Linguistically Annotated Reordering: Evaluation and Analysis
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Computational Linguistics, Volume 36, Issue 3 - September 2010

pdf bib
EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora
Lianhau Lee | Aiti Aw | Min Zhang | Haizhou Li
Coling 2010: Posters

2009

pdf bib
Forest-based Tree Sequence to String Translation Model
Hui Zhang | Min Zhang | Haizhou Li | Aiti Aw | Chew Lim Tan
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
A Syntax-Driven Bracketing Model for Phrase-Based Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
A Comparative Study of Hypothesis Alignment and its Improvement for Machine Translation System Combination
Boxing Chen | Min Zhang | Haizhou Li | Aiti Aw
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval
Lianhau Lee | Aiti Aw | Thuy Vu | Sharifah Aljunied Mahani | Min Zhang | Haizhou Li
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

pdf bib
Feature-Based Method for Document Alignment in Comparable News Corpora
Thuy Vu | Ai Ti Aw | Min Zhang
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

pdf bib
A Tree Sequence Alignment-based Tree-to-Tree Translation Model
Min Zhang | Hongfei Jiang | Aiti Aw | Haizhou Li | Chew Lim Tan | Sheng Li
Proceedings of ACL-08: HLT

pdf bib
A Linguistically Annotated Reordering Model for BTG-based Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

pdf bib
Exploiting N-best Hypotheses for SMT Self-Enhancement
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

pdf bib
Name Origin Recognition Using Maximum Entropy Model and Diverse Features
Min Zhang | Chengjie Sun | Haizhou Li | AiTi Aw | Chew Lim Tan | Xiaolong Wang
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Refinements in BTG-based Statistical Machine Translation
Deyi Xiong | Min Zhang | AiTi Aw | Haitao Mi | Qun Liu | Shouxun Lin
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Term Extraction Through Unithood and Termhood Unification
Thuy Vu | Ai Ti Aw | Min Zhang
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Fast Computing Grammar-driven Convolution Tree Kernel for Semantic Role Labeling
Wanxiang Che | Min Zhang | Ai Ti Aw | Chew Lim Tan | Ting Liu | Sheng Li
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Regenerating Hypotheses for Statistical Machine Translation
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Linguistically Annotated BTG for Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Grammar Comparison Study for Translational Equivalence Modeling and Statistical Machine Translation
Min Zhang | Hongfei Jiang | Haizhou Li | Aiti Aw | Sheng Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
A Grammar-driven Convolution Tree Kernel for Semantic Role Classification
Min Zhang | Wanxiang Che | Aiti Aw | Chew Lim Tan | Guodong Zhou | Ting Liu | Sheng Li
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
A Phrase-Based Statistical Model for SMS Text Normalization
AiTi Aw | Min Zhang | Juan Xiao | Jian Su
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions