Benjamin K. Tsou

Also published as: Benjamin K. T’sou, Benjamin K.Y. Tsou, Benjamin Tsou, Benjamin K Tsou, B. K. T’sou


2020

Using Bilingual Patents for Translation Training
John Lee | Benjamin Tsou | Tianyuan Cai
Proceedings of the 28th International Conference on Computational Linguistics

While bilingual corpora have been instrumental for machine translation, their utility for training translators has been less explored. We investigate the use of bilingual corpora as pedagogical tools for translation in the technical domain. In a user study, novice translators revised Chinese translations of English patents through bilingual concordancing. Results show that concordancing with an in-domain bilingual corpus can yield greater improvement in translation quality of technical terms than a general-domain bilingual corpus.

2019

Difficulty-aware Distractor Generation for Gap-Fill Items
Chak Yan Yeung | John Lee | Benjamin Tsou
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Towards a Proactive MWE Terminological Platform for Cross-Lingual Mediation in the Age of Big Data
Benjamin K. Tsou | Kapo Chow | Junru Nie | Yuan Yuan
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)

The emergence of China as a global economic power in the 21st century has brought about surging needs for cross-lingual and cross-cultural mediation, typically performed by translators. Advances in Artificial Intelligence and Language Engineering have been bolstered by machine learning and suitable Big Data cultivation. They have helped to meet some of the translator’s needs, though the technical specialists have not kept pace with the practical and expanding requirements in language mediation. One major technical and linguistic hurdle involves words outside the vocabulary of the translator or the lexical database he/she consults, especially Multi-Word Expressions (Compound Words) in technical subjects. A further problem is the multiplicity of renditions of a term in the target language. This paper discusses a proactive approach following the successful extraction and application of sizable bilingual Multi-Word Expressions (Compound Words) for language mediation in technical subjects, which do not fall within the expertise of typical translators, who have an inadequate appreciation of the range of new technical tools available to help them. Our approach draws on the personal reflections of translators and teachers of translation and is based on prior R&D efforts relating to 300,000 comparable Chinese-English patents. The subsequent protocol we have developed aims to be proactive in meeting four identified practical challenges in technical translation (e.g. patents). It has broader economic implications in the Age of Big Data (Tsou et al., 2015) and Trade War, as the workload, if not the challenges, increasingly cannot be met by currently available front-line translators. We shall demonstrate how new tools can be harnessed to spearhead the application of language technology not only in language mediation but also in the “teaching” and “learning” of translation. The paper also shows how a better appreciation of translators’ needs may enhance the contributions of the technical specialists, and thus the resultant synergetic benefits.

2015

Augmented Comparative Corpora and Monitoring Corpus in Chinese: LIVAC and Sketch Search Engine Compared
Benjamin K. Tsou
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

2012

Idiomaticity and Classical Traditions in Some East Asian Languages
Benjamin K Tsou
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

2011

Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
Bin Lu | Chenhao Tan | Claire Cardie | Benjamin K. Tsou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

A Note on Pseudo-comparatives like “John is rich like X!” and “Like X, John is rich!”
Benjamin Tsou
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

Mining Large-scale Parallel Corpora from Multilingual Patents: An English-Chinese example and its application to SMT
Bin Lu | Benjamin K. Tsou | Tao Jiang | Oi Yee Kwong | Jingbo Zhu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

CityU-DAC: Disambiguating Sentiment-Ambiguous Adjectives within Context
Bin Lu | Benjamin K. Tsou
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Towards Bilingual Term Extraction in Comparable Patents
Bin Lu | Benjamin K. Tsou
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

2008

Extending a Thesaurus with Words from Pan-Chinese Sources
Oi Yee Kwong | Benjamin K. Tsou
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification
Jingbo Zhu | Huizhen Wang | Tianshun Yao | Benjamin K Tsou
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Extending a Thesaurus in the Pan-Chinese Context
Oi Yee Kwong | Benjamin K. Tsou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Court Stenography-To-Text (“STT”) in Hong Kong: A Jurilinguistic Engineering Effort
Benjamin K. Tsou | Tom B.Y. Lai | K.K. Sin | Lawrence Y.L. Cheung
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Implementation of legal bilingualism in Hong Kong after 1997 has necessitated the production of voluminous and extensive court proceedings and judgments in both Chinese and English. For the former, Cantonese, a dialect of Chinese, is the home language of more than 90% of the population in Hong Kong and so is used in the courts. To record speech in Cantonese verbatim, a Chinese Computer-Aided Transcription system has been developed. The transcription system converts stenographic codes into Chinese text, i.e. from phonetic to orthographic representation of the language. The main challenge lies in the resolution of the severe ambiguity resulting from homocode problems in the conversion process. Cantonese Chinese is typified by problematic homonymy, which presents serious challenges. The N-gram statistical model is employed to estimate the most probable character string of the input transcription codes. Domain-specific corpora have been compiled to support the statistical computation. To improve accuracy, scalable techniques such as domain-specific transcription and special encoding are used. Put together, these techniques deliver 96% transcription accuracy.

Toward a Pan-Chinese Thesaurus
Benjamin K. Tsou | Oi Yee Kwong
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we propose a corpus-based approach to the construction of a Pan-Chinese lexical resource, starting out with the aim to enrich existing Chinese thesauri in the Pan-Chinese context. The resulting thesaurus is thus expected to contain not only the core senses and usages of Chinese lexical items but also usages specific to individual Chinese speech communities. We introduce the ideas behind the construction of the resource, outline the steps to be taken, and discuss some preliminary analyses. The work is backed up by a unique and large Chinese synchronous corpus containing textual data from various Chinese speech communities including Hong Kong, Beijing, Taipei and Singapore.

Regional Variation of Domain-Specific Lexical Items: Toward a Pan-Chinese Lexical Resource
Oi Yee Kwong | Benjamin K. Tsou
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Using Multiple Discriminant Analysis Approach for Linear Text Segmentation
Jingbo Zhu | Na Ye | Xinzhi Chang | Wenliang Chen | Benjamin K Tsou
Second International Joint Conference on Natural Language Processing: Full Papers

Semantic Role Tagging for Chinese at the Lexical Level
Oi Yee Kwong | Benjamin K. Tsou
Second International Joint Conference on Natural Language Processing: Full Papers

A Synchronous Corpus-Based Study on the Usage and Perception of Judgement Terms in the Pan-Chinese Context
Oi Yee Kwong | Benjamin K. Tsou
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

Data Homogeneity and Semantic Role Tagging in Chinese
Oi Yee Kwong | Benjamin K. Tsou
Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition

2004

Morpheme-based Derivation of Bipolar Semantic Orientation of Chinese Words
Raymond W.M. Yuen | Terence Y.W. Chan | Tom B.Y. Lai | O.Y. Kwong | Benjamin K.Y. Tsou
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Categorial Fluidity in Chinese and its Implications for Part-of-speech Tagging
Oi Yee Kwong | Benjamin K. Tsou
10th Conference of the European Chapter of the Association for Computational Linguistics

A Synchronous Corpus-Based Study of Verb-Noun Fluidity in Chinese
Oi Yee Kwong | Benjamin K. Tsou
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

2002

Alignment and Extraction of Bilingual Legal Terminology from Context Profiles
Oi Yee Kwong | Benjamin K. Tsou | Tom B.Y. Lai | Robert W.P. Luk | Lawrence Y.L. Cheung | Francis C.Y. Chik
COLING-02: COMPUTERM 2002: Second International Workshop on Computational Terminology

Some Considerations on Guidelines for Bilingual Alignment and Terminology Extraction
Lawrence Cheung | Tom Lai | Robert Luk | Oi Yee Kwong | King Kui Sin | Benjamin K. Tsou
COLING-02: The First SIGHAN Workshop on Chinese Language Processing

Covering Ambiguity Resolution in Chinese Word Segmentation Based on Contextual Information
Xiao Luo | Maosong Sun | Benjamin K. Tsou
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation
Benjamin K. T’sou | Olivia O.Y. Kwong | Tom B.Y. Lai
Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation

Identification of Chinese Personal Names in Unrestricted Texts
Lawrence Cheung | Benjamin K. Tsou | Maosong Sun
Proceedings of the 16th Pacific Asia Conference on Language, Information and Computation

2000

Jurilinguistic Engineering in Cantonese Chinese: An N-gram-based Speech to Text Transcription System
B. K. T’sou | K. K. Sin | S. W. K. Chan | T. B. Y. Lai | C. Lun | K. T. Ko | G. K. K. Chan | L. Y. L. Cheung
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Mining Discourse Markers for Chinese Textual Summarization
Samuel W. K. Chan | Tom B. Y. Lai | W. J. Gao | Benjamin K. T’sou
NAACL-ANLP 2000 Workshop: Automatic Summarization

Enhancement of a Chinese Discourse Marker Tagger with C4.5
Benjamin K. T’sou | Tom B.Y. Lai | Samuel W.K. Chan | Weijun Gao | Xuegang Zhan
Second Chinese Language Processing Workshop

Textual Information Segmentation by Cohesive Ties
Samuel W.K. Chan | Benjamin K. T’sou | C.F. Choy
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation

Automatic Conversion from Phonetic to Textual Representation of Cantonese: The Case of Hong Kong Court Proceedings
Benjamin K. Tsou | K.K. Sin | Samuel W. K. Chan | Tom B. Y. Lai | Caesar Lun | K. T. Ko | Gary K. K. Chan | Lawrence Y. L. Cheung
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation

1999

Anaphora Resolution as Lexical Cohesion Identification
Samuel W.K. Chan | Benjamin K. T’sou
Proceedings of the 13th Pacific Asia Conference on Language, Information and Computation

1998

Human Judgment as a Basis for Evaluation of Discourse-Connective-Based Full-Text Abstraction in Chinese
Benjamin K. T’sou | Hing-Lung Lin | Tom B. Y. Lai | Samuel W. K. Chan
International Journal of Computational Linguistics & Chinese Language Processing, Volume 3, Number 1, February 1998: Special Issue on the 10th Research on Computational Linguistics International Conference

Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data
Maosong Sun | Dayang Shen | Benjamin K. Tsou
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data
Maosong Sun | Dayang Shen | Benjamin K. Tsou
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

Human Judgment as a Basis for Evaluation of Discourse-Connective-based Full-text Abstraction in Chinese
Benjamin K. T’sou | Hing-Lung Lin | Tom B. Y. Lai
Proceedings of the 10th Research on Computational Linguistics International Conference

Chinese Word Segmentation and Part-of-Speech Tagging in One Step
Tom B.Y. Lai | Maosong Sun | Benjamin K. T’sou | S. Caesar Lun
ROCLING 1997 Poster Papers

A Synchronous Chinese Language Corpus from Different Speech Communities: Construction and Applications
Benjamin K. T’sou | Hing-Lung Lin | Godfrey Liu | Terence Chan | Jerome Hu | Ching-hai Chew | John K.P. Tse
International Journal of Computational Linguistics & Chinese Language Processing, Volume 2, Number 1, February 1997: Special Issue on Computational Resources for Research in Chinese Linguistics

1995

Proceedings of the 10th Pacific Asia Conference on Language, Information and Computation
Benjamin K. T’sou | Tom B. Y. Lai
Proceedings of the 10th Pacific Asia Conference on Language, Information and Computation

Ambiguity Resolution in Chinese Word Segmentation
Maosong Sun | Benjamin K. T’sou
Proceedings of the 10th Pacific Asia Conference on Language, Information and Computation

1992

A Knowledge-based Machine-aided System for Chinese Text Abstraction
Benjamin K. Tsou | Hing-cheung Ho | Tom Bong-yeung Lai | Caesar Suen Lun | Hing-lung Lin
COLING 1992 Volume 3: The 15th International Conference on Computational Linguistics

1991

Automatic Chinese Text Generation Based On Inference Trees
Hing-Lung Lin | Benjamin K. T’sou | Hing-Cheung Ho | Bong-Yeung Lai | Suen Caesar Lun | Chi-Yuen Choi | Chun-yu Kit
Proceedings of Rocling IV Computational Linguistics Conference IV