Chai Wutiwiwatchai


2010

Syllable-Based Thai-English Machine Transliteration
Chai Wutiwiwatchai | Ausdang Thangthai
Proceedings of the 2010 Named Entities Workshop

Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System
Kwanchiva Saykham | Ananlada Chotimongkol | Chai Wutiwiwatchai
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follows. First, an initial language model is trained with the broadcast news transcription available during the development period. Then the language model is adapted over time with more recent broadcast news transcription and online news articles available during deployment, especially data from the same time period as the broadcast news speech being recognized. We found that data that are closer in time are more similar in terms of perplexity and are more suitable for language model adaptation. The LMs that are adapted over time with more recent news data are better, both in terms of perplexity and WER, than the static LM trained only on the initial set of broadcast news data. Adaptation data from broadcast news transcription yielded relative improvements of 38.3% in perplexity and 7.1% in WER. Although online news articles yielded less improvement, they are still a useful resource because they can be obtained automatically. Better data pre-processing techniques and data selection techniques based on text similarity could be applied to the news articles to obtain further improvement from this promising result.

2008

Thai Broadcast News Corpus Construction and Evaluation
Markpong Jongtaveesataporn | Chai Wutiwiwatchai | Koji Iwano | Sadaoki Furui
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Large speech and text corpora are crucial to the development of a state-of-the-art speech recognition system. This paper reports on the construction and evaluation of the first Thai broadcast news speech and text corpora. Specifications and conventions used in the transcription process are described in the paper. The speech corpus contains about 17 hours of speech data, while the text corpus was transcribed from around 35 hours of television broadcast news. The characteristics of the corpus are analyzed and presented in the paper. The speech corpus was split according to the evaluation focus conditions used in the DARPA Hub-4 evaluation. An 18K-word Thai speech recognition system was set up and tested on this speech corpus as a preliminary experiment. Acoustic model adaptation was performed to improve system performance. The best system yielded a word error rate of about 20% for clean and planned speech, and below 30% for the overall condition.

Speech-to-Speech Translation Activities in Thailand
Chai Wutiwiwatchai | Thepchai Supnithi | Krit Kosawat
Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)

2004

Hybrid Statistical and Structural Semantic Modeling for Thai Multi-Stage Spoken Language Understanding
Chai Wutiwiwatchai | Sadaoki Furui
Proceedings of the HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing

2002

Phonetically Distributed Continuous Speech Corpus for Thai Language
Chai Wutiwiwatchai | Patcharika Cotsomrong | Sinaporn Suebvisai | Supphanat Kanokphara
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC'02)

2000

Panel: The State of the Art in Thai Language Processing
Virach Sornlertlamvanich | Tanapong Potipiti | Chai Wutiwiwatchai | Pradit Mittrapiyanuruk
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics