Kazuma Hashimoto


2020

bib
Building Salesforce Neural Machine Translation System
Kazuma Hashimoto | Raffaella Buschiazzo | Caiming Xiong | Teresa Marxhall
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

pdf bib
Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference
Jianguo Zhang | Kazuma Hashimoto | Wenhao Liu | Chien-Sheng Wu | Yao Wan | Philip Yu | Richard Socher | Caiming Xiong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Intent detection is one of the core components of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is also a practically important skill. Few-shot learning is attracting much attention to mitigate data scarcity, but OOS detection becomes even more challenging. In this paper, we present a simple yet effective approach, discriminative nearest neighbor classification with deep self-attention. Unlike softmax classifiers, we leverage BERT-style pairwise encoding to train a binary classifier that estimates the best matched training example for a user input. We propose to boost the discriminative ability by transferring a natural language inference (NLI) model. Our extensive experiments on a large-scale multi-domain intent detection task show that our method achieves more stable and accurate in-domain and OOS detection accuracy than RoBERTa-based classifiers and embedding-based nearest neighbor approaches. More notably, the NLI transfer enables our 10-shot model to perform competitively with 50-shot or even full-shot classifiers, while we can keep the inference time constant by leveraging a faster embedding retrieval model.

pdf bib
Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging
Semih Yavuz | Kazuma Hashimoto | Wenhao Liu | Nitish Shirish Keskar | Richard Socher | Caiming Xiong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The concept of Dialogue Act (DA) is universal across different task-oriented dialogue domains - the act of “request” carries the same speaker intention whether it is for restaurant reservation or flight booking. However, DA taggers trained on one domain do not generalize well to other domains, which leaves us with the expensive need for a large amount of annotated data in the target domain. In this work, we investigate how to better adapt DA taggers to desired target domains with only unlabeled data. We propose MaskAugment, a controllable mechanism that augments text input by leveraging the pre-trained Mask token from BERT model. Inspired by consistency regularization, we use MaskAugment to introduce an unsupervised teacher-student learning scheme to examine the domain adaptation of DA taggers. Our extensive experiments on the Simulated Dialogue (GSim) and Schema-Guided Dialogue (SGD) datasets show that MaskAugment is useful in improving the cross-domain generalization for DA tagging.

pdf bib
Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking
Jianguo Zhang | Kazuma Hashimoto | Chien-Sheng Wu | Yao Wang | Philip Yu | Richard Socher | Caiming Xiong
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Dialog state tracking (DST) is a core component in task-oriented dialog systems. Existing approaches for DST mainly fall into one of two categories, namely, ontology-based and ontology-free methods. An ontology-based method selects a value from a candidate-value list for each target slot, while an ontology-free method extracts spans from dialog contexts. Recent work introduced a BERT-based model to strike a balance between the two methods by pre-defining categorical and non-categorical slots. However, it is not clear enough which slots are better handled by either of the two slot types, and the way to use the pre-trained model has not been well investigated. In this paper, we propose a simple yet effective dual-strategy model for DST, by adapting a single BERT-style reading comprehension model to jointly handle both the categorical and non-categorical slots. Our experiments on the MultiWOZ datasets show that our method significantly outperforms the BERT-based counterpart, finding that the key is a deep interaction between the domain-slot and context information. When evaluated on noisy (MultiWOZ 2.0) and cleaner (MultiWOZ 2.1) settings, our method performs competitively and robustly across the two different settings. Our method sets the new state of the art in the noisy setting, while performing more robustly than the best model in the cleaner setting. We also conduct a comprehensive error analysis on the dataset, including the effects of the dual strategy for each slot, to facilitate future research.

2019

pdf bib
A High-Quality Multilingual Dataset for Structured Documentation Translation
Kazuma Hashimoto | Raffaella Buschiazzo | James Bradbury | Teresa Marshall | Richard Socher | Caiming Xiong
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

This paper presents a high-quality multilingual dataset for the documentation domain to advance research on localization of structured text. Unlike widely-used datasets for translation of plain text, we collect XML-structured parallel text segments from the online documentation for an enterprise software platform. These Web pages have been professionally translated from English into 16 languages and maintained by domain experts, and around 100,000 text segments are available for each language pair. We build and evaluate translation models for seven target languages from English, with several different copy mechanisms and an XML-constrained beam search. We also experiment with a non-English pair to show that our dataset has the potential to explicitly enable 17 × 16 translation settings. Our experiments show that learning to translate with the XML tags improves translation accuracy, and the beam search accurately generates XML structures. We also discuss trade-offs of using the copy mechanisms by focusing on translation of numerical words and named entities. We further provide a detailed human analysis of gaps between the model output and human translations for real-world applications, including suitability for post-editing.

pdf bib
Incorporating Source-Side Phrase Structures into Neural Machine Translation
Akiko Eriguchi | Kazuma Hashimoto | Yoshimasa Tsuruoka
Computational Linguistics, Volume 45, Issue 2 - June 2019

Neural machine translation (NMT) has shown great success as a new alternative to the traditional Statistical Machine Translation model in multiple languages. Early NMT models are based on sequence-to-sequence learning that encodes a sequence of source words into a vector space and generates another sequence of target words from the vector. In those NMT models, sentences are simply treated as sequences of words without any internal structure. In this article, we focus on the role of the syntactic structure of source sentences and propose a novel end-to-end syntactic NMT model, which we call a tree-to-sequence NMT model, extending a sequence-to-sequence model with the source-side phrase structure. Our proposed model has an attention mechanism that enables the decoder to generate a translated word while softly aligning it with phrases as well as words of the source sentence. We have empirically compared the proposed model with sequence-to-sequence models in various settings on Chinese-to-Japanese and English-to-Japanese translation tasks. Our experimental results suggest that the use of syntactic structure can be beneficial when the training data set is small, but is not as effective as using a bi-directional encoder. As the size of training data set increases, the benefits of using a syntactic tree tends to diminish.

pdf bib
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
Kazuma Hashimoto | Yoshimasa Tsuruoka
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

A major obstacle in reinforcement learning-based sentence generation is the large action space whose size is equal to the vocabulary size of the target-side language. To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. Our method first predicts a fixed-size small vocabulary for each input to generate its target sentence. The input-specific vocabularies are then used at supervised and reinforcement learning steps, and also at test time. In our experiments on six machine translation and two image captioning datasets, our method achieves faster reinforcement learning (~2.7x faster) with less GPU memory (~2.3x less) than the full-vocabulary counterpart. We also show that our method more effectively receives rewards with fewer iterations of supervised pre-training.

2017

pdf bib
Neural Machine Translation with Source-Side Latent Graph Parsing
Kazuma Hashimoto | Yoshimasa Tsuruoka
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper presents a novel neural machine translation model which jointly learns translation and source-side latent graph representations of sentences. Unlike existing pipelined approaches using syntactic parsers, our end-to-end model learns a latent graph parser as part of the encoder of an attention-based neural machine translation model, and thus the parser is optimized according to the translation objective. In experiments, we first show that our model compares favorably with state-of-the-art sequential and pipelined syntax-based NMT models. We also show that the performance of our model can be further improved by pre-training it with a small amount of treebank annotations. Our final ensemble model significantly outperforms the previous best models on the standard English-to-Japanese translation dataset.

pdf bib
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Kazuma Hashimoto | Caiming Xiong | Yoshimasa Tsuruoka | Richard Socher
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task’s loss without exhibiting catastrophic interference of the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks from tagging, parsing, relatedness, and entailment tasks.

2016

pdf bib
Adaptive Joint Learning of Compositional and Non-Compositional Phrase Embeddings
Kazuma Hashimoto | Yoshimasa Tsuruoka
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Tree-to-Sequence Attentional Neural Machine Translation
Akiko Eriguchi | Kazuma Hashimoto | Yoshimasa Tsuruoka
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Domain Adaptation for Neural Networks by Parameter Augmentation
Yusuke Watanabe | Kazuma Hashimoto | Yoshimasa Tsuruoka
Proceedings of the 1st Workshop on Representation Learning for NLP

pdf bib
Domain Adaptation and Attention-Based Unknown Word Replacement in Chinese-to-Japanese Neural Machine Translation
Kazuma Hashimoto | Akiko Eriguchi | Yoshimasa Tsuruoka
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper describes our UT-KAY system that participated in the Workshop on Asian Translation 2016. Based on an Attention-based Neural Machine Translation (ANMT) model, we build our system by incorporating a domain adaptation method for multiple domains and an attention-based unknown word replacement method. In experiments, we verify that the attention-based unknown word replacement method is effective in improving translation scores in Chinese-to-Japanese machine translation. We further show results of manual analysis on the replaced unknown words.

pdf bib
Character-based Decoding in Tree-to-Sequence Attention-based Neural Machine Translation
Akiko Eriguchi | Kazuma Hashimoto | Yoshimasa Tsuruoka
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper reports our systems (UT-AKY) submitted in the 3rd Workshop of Asian Translation 2016 (WAT’16) and their results in the English-to-Japanese translation task. Our model is based on the tree-to-sequence Attention-based NMT (ANMT) model proposed by Eriguchi et al. (2016). We submitted two ANMT systems: one with a word-based decoder and the other with a character-based decoder. Experimenting on the English-to-Japanese translation task, we have confirmed that the character-based decoder can cover almost the full vocabulary in the target language and generate translations much faster than the word-based model.

2015

pdf bib
Task-Oriented Learning of Word Embeddings for Semantic Relation Classification
Kazuma Hashimoto | Pontus Stenetorp | Makoto Miwa | Yoshimasa Tsuruoka
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

pdf bib
Learning Embeddings for Transitive Verb Disambiguation by Implicit Tensor Factorization
Kazuma Hashimoto | Yoshimasa Tsuruoka
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality

2014

pdf bib
Jointly Learning Word Representations and Composition Functions Using Predicate-Argument Structures
Kazuma Hashimoto | Pontus Stenetorp | Makoto Miwa | Yoshimasa Tsuruoka
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Simple Customization of Recursive Neural Networks for Semantic Relation Classification
Kazuma Hashimoto | Makoto Miwa | Yoshimasa Tsuruoka | Takashi Chikayama
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing