Minh Le Nguyen

Also published as: Le-Minh Nguyen, Le Minh Nguyen, Nguyen Le Minh, Minh-Le Nguyen, M.L Nguyen


2020

Answering Legal Questions by Learning Neural Attentive Text Representation
Phi Manh Kien | Ha-Thanh Nguyen | Ngo Xuan Bach | Vu Tran | Minh Le Nguyen | Tu Minh Phuong
Proceedings of the 28th International Conference on Computational Linguistics

Text representation plays a vital role in retrieval-based question answering, especially in the legal domain, where documents are usually long and complicated. The better the question and the legal documents are represented, the more accurately they are matched. In this paper, we focus on the task of answering legal questions at the article level. Given a legal question, the goal is to retrieve all the correct and valid legal articles that can be used as the basis for answering the question. We present a retrieval-based model for the task by learning neural attentive text representation. Our text representation method first leverages convolutional neural networks to extract important information in a question and legal articles. Attention mechanisms are then used to represent the question and articles and select appropriate information to align them in a matching process. Experimental results on an annotated corpus consisting of 5,922 Vietnamese legal questions show that our model outperforms state-of-the-art retrieval-based methods for question answering by large margins in terms of both recall and NDCG.

2019

Overcoming the Rare Word Problem for low-resource language pairs in Neural Machine Translation
Thi-Vinh Ngo | Thanh-Le Ha | Phuong-Thai Nguyen | Le-Minh Nguyen
Proceedings of the 6th Workshop on Asian Translation

Among the six challenges of neural machine translation (NMT) identified by Koehn and Knowles (2017), the rare-word problem is considered the most severe, especially in the translation of low-resource languages. In this paper, we propose three solutions to address rare words in neural machine translation systems. First, we enhance the source context used to predict target words by directly connecting the source embeddings to the output of the attention component in NMT. Second, we propose an algorithm that learns the morphology of unknown English words in a supervised way in order to minimize the adverse effect of the rare-word problem. Finally, we exploit synonym relations from WordNet to overcome the out-of-vocabulary (OOV) problem of NMT. We evaluate our approaches on two low-resource language pairs: English-Vietnamese and Japanese-Vietnamese. In our experiments, we achieved significant improvements of up to roughly +1.0 BLEU point on both language pairs.

2018

TSix: A Human-involved-creation Dataset for Tweet Summarization
Minh-Tien Nguyen | Dac Viet Lai | Huy-Tien Nguyen | Le-Minh Nguyen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Dual Latent Variable Model for Low-Resource Natural Language Generation in Dialogue Systems
Van-Khanh Tran | Le-Minh Nguyen
Proceedings of the 22nd Conference on Computational Natural Language Learning

Recent deep learning models have shown improved results in natural language generation (NLG) when sufficient annotated data is provided. However, a modest amount of training data may harm such models’ performance. Thus, how to build a generator that can utilize as much knowledge as possible from data in a low-resource setting is a crucial issue in NLG. This paper presents a variational neural generation model to tackle the NLG problem of having a limited labeled dataset, in which we integrate variational inference into an encoder-decoder generator and introduce a novel auxiliary auto-encoding with an effective training procedure. Experiments showed that the proposed methods not only outperform the previous models when sufficient training data is available but also work acceptably well when the training data is scarce.

Adversarial Domain Adaptation for Variational Neural Language Generation in Dialogue Systems
Van-Khanh Tran | Le-Minh Nguyen
Proceedings of the 27th International Conference on Computational Linguistics

Domain adaptation arises when we aim to learn from a source domain a model that can perform acceptably well on a different target domain. It is especially crucial for Natural Language Generation (NLG) in Spoken Dialogue Systems when there is sufficient annotated data in the source domain but only limited labeled data in the target domain. How to effectively utilize as many existing abilities from source domains as possible is a crucial issue in domain adaptation. In this paper, we propose an adversarial training procedure to train a variational encoder-decoder based language generator via multiple adaptation steps. In this procedure, a model is first trained on source-domain data and then fine-tuned on a small set of target-domain utterances under the guidance of two proposed critics. Experimental results show that the proposed method can effectively leverage the existing knowledge in the source domain to adapt to another related domain by using only a small amount of in-domain data.

Effectiveness of Character Language Model for Vietnamese Named Entity Recognition
Xuan-Dung Doan | Trung-Thanh Dang | Le-Minh Nguyen
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2017

Natural Language Generation for Spoken Dialogue System using RNN Encoder-Decoder Networks
Van-Khanh Tran | Le-Minh Nguyen
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Natural language generation (NLG) is a critical component in a spoken dialogue system. This paper presents a Recurrent Neural Network based Encoder-Decoder architecture, in which an LSTM-based decoder is introduced to select and aggregate semantic elements produced by an attention mechanism over the input elements, and to produce the required utterances. The proposed generator can be jointly trained on both sentence planning and surface realization to produce natural language sentences. The proposed model was extensively evaluated on four different NLG datasets. The experimental results showed that the proposed generators not only consistently outperform the previous methods across all the NLG domains but also show an ability to generalize to a new, unseen domain and learn from multi-domain datasets.

Building Lexical Vector Representations from Concept Definitions
Danilo Silva de Carvalho | Minh Le Nguyen
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The use of distributional language representations has opened new paths in solving a variety of NLP problems. However, alternative approaches can take advantage of information unavailable through pure statistical means. This paper presents a method for building vector representations from meaning unit blocks called concept definitions, which are obtained by extracting information from a curated linguistic resource (Wiktionary). The representations obtained in this way can be compared through conventional cosine similarity and are also interpretable by humans. Evaluation was conducted on semantic similarity and relatedness test sets, with results indicating a performance comparable to other methods based on single linguistic resource extraction. The results also indicate noticeable performance gains when combining distributional similarity scores with the ones obtained using this approach. Additionally, a discussion of the proposed method’s shortcomings is provided in the analysis of error cases.

Sentence Modeling with Deep Neural Architecture using Lexicon and Character Attention Mechanism for Sentiment Classification
Huy Thanh Nguyen | Minh Le Nguyen
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Tweet-level sentiment classification on Twitter poses many challenges, including exploiting the syntax, semantics, sentiment, and context of tweets. To address these problems, we propose a novel approach to sentiment analysis that uses lexicon features for building lexicon embeddings (LexW2Vs) and generates character attention vectors (CharAVs) by using a Deep Convolutional Neural Network (DeepCNN). Our approach integrates LexW2Vs and CharAVs with continuous word embeddings (ContinuousW2Vs) and dependency-based word embeddings (DependencyW2Vs) simultaneously, enriching the information for each word that is fed into a Bidirectional Contextual Gated Recurrent Neural Network (Bi-CGRNN). We evaluate our model on two Twitter sentiment classification datasets. Experimental results show that our model can improve the classification accuracy of sentence-level sentiment analysis on Twitter.

An Ensemble Method with Sentiment Features and Clustering Support
Huy Tien Nguyen | Minh Le Nguyen
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Deep learning models have recently been applied successfully in natural language processing, especially sentiment analysis. Each deep learning model has a particular advantage, but it is difficult to combine these advantages into one model, especially in the area of sentiment analysis. In our approach, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network were used to learn sentiment-specific features in a freezing scheme. This scheme provides a novel and efficient way to integrate the advantages of deep learning models. In addition, we grouped documents into clusters by their similarity and applied the prediction score of the Naive Bayes SVM (NBSVM) method to boost the classification accuracy of each group. The experiments show that our method achieves state-of-the-art performance on two well-known datasets: IMDB large movie reviews at the document level and Pang & Lee movie reviews at the sentence level.

Investigating Phrase-Based and Neural-Based Machine Translation on Low-Resource Settings
Hai Long Trieu | Duc-Vu Tran | Le Minh Nguyen
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

The JAIST Machine Translation Systems for WMT 17
Hai-Long Trieu | Trung-Tin Pham | Le-Minh Nguyen
Proceedings of the Second Conference on Machine Translation

Neural-based Natural Language Generation in Dialogue using RNN Encoder-Decoder with Semantic Aggregation
Van-Khanh Tran | Le-Minh Nguyen | Satoshi Tojo
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Natural language generation (NLG) is an important component in spoken dialogue systems. This paper presents a model called Encoder-Aggregator-Decoder, which is an extension of a Recurrent Neural Network based Encoder-Decoder architecture. The proposed Semantic Aggregator consists of two components: an Aligner and a Refiner. The Aligner is a conventional attention calculated over the encoded input information, while the Refiner is another attention or gating mechanism stacked over the attentive Aligner in order to further select and aggregate the semantic elements. The proposed model can be jointly trained on both sentence planning and surface realization to produce natural language utterances. The model was extensively assessed on four different NLG domains, in which the experimental results showed that the proposed generator consistently outperforms the previous methods on all the NLG domains.

2016

VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization
Minh-Tien Nguyen | Dac Viet Lai | Phong-Khac Do | Duc-Vu Tran | Minh-Le Nguyen
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset, which was manually created to address the lack of standard corpora for social context summarization in Vietnamese. The dataset was collected through the keywords of 141 Web documents in 12 special events, which were mentioned on Vietnamese Web pages. Social users were asked to take part in creating standard summaries and labeling each sentence or comment. The inter-rater agreement calculated by Cohen’s Kappa after validation is 0.685. To illustrate the potential use of our dataset, a learning-to-rank method was trained using a set of local and social features. Experimental results indicate that the summary model trained on our dataset outperforms state-of-the-art baselines in both ROUGE-1 and ROUGE-2 in social context summarization.

Dealing with Out-Of-Vocabulary Problem in Sentence Alignment Using Word Similarity
Hai-Long Trieu | Le-Minh Nguyen | Phuong-Thai Nguyen
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers

2015

JAIST: A two-phase machine learning approach for identifying discourse relations in newswire texts
Truong Son Nguyen | Bao Quoc Ho | Le Minh Nguyen
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

Semi-supervised Learning for Vietnamese Named Entity Recognition using Online Conditional Random Fields
Quang Hong Pham | Minh-Le Nguyen | Binh Thanh Nguyen | Nguyen Viet Cuong
Proceedings of the Fifth Named Entity Workshop

JAIST: Combining multiple features for Answer Selection in Community Question Answering
Quan Hung Tran | Vu Duc Tran | Tu Thanh Vu | Minh Le Nguyen | Son Bao Pham
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2013

Learning Based Approaches for Vietnamese Question Classification Using Keywords Extraction from the Web
Dang Hai Tran | Cuong Xuan Chu | Son Bao Pham | Minh Le Nguyen
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Using Shallow Semantic Parsing and Relation Extraction for Finding Contradiction in Text
Minh Quang Nhat Pham | Minh Le Nguyen | Akira Shimazu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Bootstrapping Phrase-based Statistical Machine Translation via WSD Integration
Hien Vu Huy | Phuong-Thai Nguyen | Tung-Lam Nguyen | M.L Nguyen
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

A Reranking Model for Discourse Segmentation using Subtree Features
Ngo Xuan Bach | Nguyen Le Minh | Akira Shimazu
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2011

Supervised and Semi-Supervised Sequence Learning for Recognition of Requisite Part and Effectuation Part in Law Sentences
Le-Minh Nguyen | Ngo Xuan Bach | Akira Shimazu
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing

Learning Logical Structures of Paragraphs in Legal Articles
Ngo Xuan Bach | Nguyen Le Minh | Tran Thi Oanh | Akira Shimazu
Proceedings of 5th International Joint Conference on Natural Language Processing

A Listwise Approach to Coreference Resolution in Multiple Languages
Oanh Thi Tran | Bach Xuan Ngo | Minh Le Nguyen | Akira Shimazu
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2009

A Semi-supervised Approach for Generating a Table-of-Contents
Viet Cuong Nguyen | Le Minh Nguyen | Akira Shimazu
Proceedings of the International Conference RANLP-2009

An Empirical Study of Vietnamese Noun Phrase Chunking with Discriminative Sequence Models
Le Minh Nguyen | Huong Thao Nguyen | Phuong Thai Nguyen | Tu Bao Ho | Akira Shimazu
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

A Tree-to-String Phrase-based Model for Statistical Machine Translation
Thai Phuong Nguyen | Akira Shimazu | Tu-Bao Ho | Minh Le Nguyen | Vinh Van Nguyen
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

2007

A Multilingual Dependency Analysis System Using Online Passive-Aggressive Learning
Le-Minh Nguyen | Akira Shimazu | Phuong-Thai Nguyen | Xuan-Hieu Phan
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Semantic Parsing with Structured SVM Ensemble Classification Models
Le-Minh Nguyen | Akira Shimazu | Xuan-Hieu Phan
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Vietnamese Word Segmentation with CRFs and SVMs: An Investigation
Cam-Tu Nguyen | Trung-Kien Nguyen | Xuan-Hieu Phan | Le-Minh Nguyen | Quang-Thuy Ha
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

2005

A Structured SVM Semantic Parser Augmented by Semantic Tagging with Conditional Random Field
Minh Le Nguyen | Akira Shimazu | Hieu Xuan Phan
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation

2004

Probabilistic Sentence Reduction Using Support Vector Machines
Minh Le Nguyen | Akira Shimazu | Susumu Horiguchi | Bao Tu Ho | Masaru Fukushi
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

A Sentence Reduction using Syntax Control
Minh Le Nguyen | Susumu Horiguchi
Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages

Translation Template Learning Based on Hidden Markov Modeling
Minh Le Nguyen | Akira Shimazu | Susumu Horiguchi
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

A New Sentence Reduction based on Decision Tree Model
Minh Le Nguyen | Susumu Horiguchi
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation