Byoung-Tak Zhang


2020

Toward General Scene Graph: Integration of Visual Semantic Knowledge with Entity Synset Alignment
Woo Suk Choi | Kyoung-Woon On | Yu-Jung Heo | Byoung-Tak Zhang
Proceedings of the First Workshop on Advances in Language and Vision Research

A scene graph is a graph representation that explicitly captures high-level semantic knowledge of an image, such as objects, attributes of objects, and relationships between objects. Various tasks have been proposed for scene graphs, but each has a limited vocabulary and biased information due to its own hypothesis. As a result, the results of each task do not generalize and are difficult to apply to other downstream tasks. In this paper, we propose Entity Synset Alignment (ESA), a method that creates a general scene graph by efficiently aligning various kinds of semantic knowledge to solve this bias problem. ESA uses WordNet, a large-scale lexical database, and Intersection over Union (IoU) to align the object labels in multiple scene graphs/semantic knowledge. In our experiments, the integrated scene graph is applied to image-caption retrieval as a downstream task. We confirm that integrating multiple scene graphs helps to obtain better representations of images.
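As a rough illustration of the IoU criterion mentioned in the abstract, the sketch below computes Intersection over Union for two axis-aligned bounding boxes and uses it to decide whether two detections might refer to the same entity. The box format `(x_min, y_min, x_max, y_max)`, the helper name `same_region`, and the 0.5 threshold are assumptions made for illustration; the abstract does not specify them, and the WordNet synset comparison that ESA also relies on is not shown.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so that disjoint boxes yield no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes are treated as non-overlapping.
    return inter / union if union > 0 else 0.0


def same_region(a: Box, b: Box, threshold: float = 0.5) -> bool:
    """Hypothetical helper: treat two boxes as candidates for the same
    entity when their IoU reaches an assumed threshold."""
    return iou(a, b) >= threshold
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` has an intersection of area 1 and a union of area 7, so it evaluates to 1/7; at the assumed threshold such a pair would not be merged.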

2019

Dual Attention Networks for Visual Reference Resolution in Visual Dialog
Gi-Cheon Kang | Jaeseo Lim | Byoung-Tak Zhang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Visual dialog (VisDial) is a task that requires a dialog agent to answer a series of questions grounded in an image. Unlike in visual question answering (VQA), the series of questions should be able to capture a temporal context from a dialog history and utilize visually grounded information. Visual reference resolution is a problem that addresses these challenges, requiring the agent to resolve ambiguous references in a given question and to find the references in a given image. In this paper, we propose Dual Attention Networks (DAN) for visual reference resolution in VisDial. DAN consists of two kinds of attention modules, REFER and FIND. Specifically, the REFER module learns latent relationships between a given question and a dialog history by employing a multi-head attention mechanism. The FIND module takes image features and reference-aware representations (i.e., the output of the REFER module) as input, and performs visual grounding via a bottom-up attention mechanism. We qualitatively and quantitatively evaluate our model on the VisDial v1.0 and v0.9 datasets, showing that DAN outperforms the previous state-of-the-art model by a significant margin.

CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
Jin-Hwa Kim | Nikita Kitaev | Xinlei Chen | Marcus Rohrbach | Byoung-Tak Zhang | Yuandong Tian | Dhruv Batra | Devi Parikh
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this work, we propose a goal-driven collaborative task that combines language, perception, and action. Specifically, we develop a Collaborative image-Drawing game between two agents, called CoDraw. Our game is grounded in a virtual world that contains movable clip art objects. The game involves two players: a Teller and a Drawer. The Teller sees an abstract scene containing multiple clip art pieces in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip art pieces. The two players communicate with each other using natural language. We collect the CoDraw dataset of ~10K dialogs consisting of ~138K messages exchanged between human players. We define protocols and metrics to evaluate learned agents in this testbed, highlighting the need for a novel “crosstalk” evaluation condition, which pairs agents trained independently on disjoint subsets of the training data. We present models for our task and benchmark them both with fully automated evaluation and by having them play the game live with humans.

2003

Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning
Seong-Bae Park | Byoung-Tak Zhang
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

2002

A Comparative Evaluation of Data-driven Models in Translation Selection of Machine Translation
Yu-Seop Kim | Jeong-Ho Chang | Byoung-Tak Zhang
COLING 2002: The 19th International Conference on Computational Linguistics

2000

Word Sense Disambiguation by Learning from Unlabeled Data
Seong-Bae Park | Byoung-Tak Zhang | Yung Taek Kim
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

Reducing Parsing Complexity by Intra-Sentence Segmentation based on Maximum Entropy Model
Sung Dong Kim | Byoung-Tak Zhang | Yung Taek Kim
2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1990

Morphological Analysis and Synthesis by Automated Discovery and Acquisition of Linguistic Rules
Byoung-Tak Zhang | Yung-Taek Kim
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics