Nigel Collier


2020

Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a new challenging stance detection dataset, called Will-They-Won’t-They (WT–WT), which contains 51,284 tweets in English, making it by far the largest available dataset of the type. All the annotations are carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain.

STANDER: An Expert-Annotated Dataset for News Stance Detection and Evidence Retrieval
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2020

We present a new challenging news dataset that targets both stance detection (SD) and fine-grained evidence retrieval (ER). With its 3,291 expert-annotated articles, the dataset constitutes a high-quality benchmark for future research in SD and multi-task learning. We provide a detailed description of the corpus collection methodology and carry out an extensive analysis on the sources of disagreement between annotators, observing a correlation between their disagreement and the diffusion of uncertainty around a target in the real world. Our experiments show that the dataset poses a strong challenge to recent state-of-the-art models. Notably, our dataset aligns with an existing Twitter SD dataset: their union thus addresses a key shortcoming of previous works, by providing the first dedicated resource to study multi-genre SD as well as the interplay of signals from social media and news sources in rumour verification.

COMETA: A Corpus for Medical Entity Linking in the Social Media
Marco Basaldella | Fangyu Liu | Ehsan Shareghi | Nigel Collier
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Whilst there has been growing progress in Entity Linking (EL) for general language, existing datasets fail to address the complex nature of health terminology in layman’s language. Meanwhile, there is a growing need for applications that can understand the public’s voice in the health domain. To address this we introduce a new corpus called COMETA, consisting of 20k English biomedical entity mentions from Reddit expert-annotated with links to SNOMED CT, a widely-used medical knowledge graph. Our corpus satisfies a combination of desirable properties, from scale and coverage to diversity and quality, that to the best of our knowledge has not been met by any of the existing resources in the field. Through benchmark experiments on 20 EL baselines from string- to neural-based models we shed light on the ability of these systems to perform complex inference on entities and concepts under 2 challenging evaluation scenarios. Our experimental results on COMETA illustrate that no golden bullet exists and even the best mainstream techniques still have a significant performance gap to fill, while the best solution relies on combining different views of data.

2019

On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation
Victor Prokhorov | Ehsan Shareghi | Yingzhen Li | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 3rd Workshop on Neural Generation and Translation

Variational Autoencoders (VAEs) are known to suffer from learning uninformative latent representation of the input due to issues such as approximated posterior collapse, or entanglement of the latent space. We impose an explicit constraint on the Kullback-Leibler (KL) divergence term inside the VAE objective function. While the explicit constraint naturally avoids posterior collapse, we use it to further understand the significance of the KL term in controlling the information transmitted through the VAE channel. Within this framework, we explore different properties of the estimated posterior distribution, and highlight the trade-off between the amount of information encoded in a latent code during training, and the generative capacity of the model.

BioReddit: Word Embeddings for User-Generated Biomedical NLP
Marco Basaldella | Nigel Collier
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)

Word embeddings, in their different shapes and iterations, have changed the natural language processing research landscape in recent years. The biomedical text processing field is no stranger to this revolution; however, scholars in the field largely trained their embeddings on scientific documents only, even when working on user-generated data. In this paper we show how training embeddings on a corpus of user-generated text collected from medical forums heavily influences the performance on downstream tasks, outperforming embeddings trained either on general-purpose data or on scientific papers when applied to user-generated content.

Generating Knowledge Graph Paths from Textual Definitions using Sequence-to-Sequence Models
Victor Prokhorov | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present a novel method for mapping unrestricted text to knowledge graph entities by framing the task as a sequence-to-sequence problem. Specifically, given the encoded state of an input text, our decoder directly predicts paths in the knowledge graph, starting from the root and ending at the target node following hypernym-hyponym relationships. In this way, and in contrast to other text-to-entity mapping systems, our model outputs hierarchically structured predictions that are fully interpretable in the context of the underlying ontology, in an end-to-end manner. We present a proof-of-concept experiment with encouraging results, comparable to those of state-of-the-art systems.

A Richer-but-Smarter Shortest Dependency Path with Attentive Augmentation for Relation Extraction
Duy-Cat Can | Hoang-Quynh Le | Quang-Thuy Ha | Nigel Collier
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

To extract the relationship between two entities in a sentence, two common approaches are (1) using their shortest dependency path (SDP) and (2) using an attention model to capture a context-based representation of the sentence. Each approach suffers from its own disadvantage of either missing or redundant information. In this work, we propose a novel model that combines the advantages of these two approaches. This is based on the basic information in the SDP enhanced with information selected by several attention mechanisms with kernel filters, namely RbSP (Richer-but-Smarter SDP). To exploit the representation behind the RbSP structure effectively, we develop a combined deep neural model with a LSTM network on word sequences and a CNN on RbSP. Experimental results on the SemEval-2010 dataset demonstrate improved performance over competitive baselines. The data and source code are available at https://github.com/catcd/RbSP.

2018

Card-660: Cambridge Rare Word Dataset - a Reliable Benchmark for Infrequent Word Representation Models
Mohammad Taher Pilehvar | Dimitri Kartsaklis | Victor Prokhorov | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Rare word representation has recently enjoyed a surge of interest, owing to the crucial role that effective handling of infrequent words can play in accurate semantic understanding. However, there is a paucity of reliable benchmarks for evaluation and comparison of these techniques. We show in this paper that the only existing benchmark (the Stanford Rare Word dataset) suffers from low-confidence annotations and limited vocabulary; hence, it does not constitute a solid comparison framework. In order to fill this evaluation gap, we propose Cambridge Rare word Dataset (Card-660), an expert-annotated word similarity dataset which provides a highly reliable, yet challenging, benchmark for rare word representation techniques. Through a set of experiments we show that even the best mainstream word embeddings, with millions of words in their vocabularies, are unable to achieve performances higher than 0.43 (Pearson correlation) on the dataset, compared to a human-level upperbound of 0.90. We release the dataset and the annotation materials at https://pilehvar.github.io/card-660/.

Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs
Dimitri Kartsaklis | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities. The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state of the art results.

Large-scale Exploration of Neural Relation Classification Architectures
Hoang-Quynh Le | Duy-Cat Can | Sinh T. Vu | Thanh Hai Dang | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Experimental performance on the task of relation classification has generally improved using deep neural network architectures. One major drawback of reported studies is that individual models have been evaluated on a very narrow range of datasets, raising questions about the adaptability of the architectures, while making comparisons between approaches difficult. In this work, we present a systematic large-scale analysis of neural relation classification architectures on six benchmark datasets with widely varying characteristics. We propose a novel multi-channel LSTM model combined with a CNN that takes advantage of all currently popular linguistic and architectural features. Our ‘Man for All Seasons’ approach achieves state-of-the-art performance on two datasets. More importantly, in our view, the model allowed us to obtain direct insights into the continued challenges faced by neural language models on this task.

Which Melbourne? Augmenting Geocoding with Maps
Milan Gritta | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The purpose of text geolocation is to associate geographic information contained in a document with a set (or sets) of coordinates, either implicitly by using linguistic features and/or explicitly by using geographic metadata combined with heuristics. We introduce a geocoder (location mention disambiguator) that achieves state-of-the-art (SOTA) results on three diverse datasets by exploiting the implicit lexical clues. Moreover, we propose a new method for systematic encoding of geographic metadata to generate two distinct views of the same text. To that end, we introduce the Map Vector (MapVec), a sparse representation obtained by plotting prior geographic probabilities, derived from population figures, on a World Map. We then integrate the implicit (language) and explicit (map) features to significantly improve a range of metrics. We also introduce an open-source dataset for geoparsing of news events covering global disease outbreaks and epidemics to help future evaluation in geoparsing.

Towards Automatic Fake News Detection: Cross-Level Stance Detection in News Articles
Costanza Conforti | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

In this paper, we propose to adapt the four-staged pipeline proposed by Zubiaga et al. (2018) for the Rumor Verification task to the problem of Fake News Detection. We show that the recently released FNC-1 corpus covers two of its steps, namely the Tracking and the Stance Detection task. We identify asymmetry in length in the input to be a key characteristic of the latter step, when adapted to the framework of Fake News Detection, and propose to handle it as a specific type of Cross-Level Stance Detection. Inspired by theories from the field of Journalism Studies, we implement and test two architectures to successfully model the internal structure of an article and its interactions with a claim.

2017

Vancouver Welcomes You! Minimalist Location Metonymy Resolution
Milan Gritta | Mohammad Taher Pilehvar | Nut Limsopatham | Nigel Collier
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Named entities are frequently used in a metonymic manner. They serve as references to related entities such as people and organisations. Accurate identification and interpretation of metonymy can be directly beneficial to various NLP applications, such as Named Entity Recognition and Geographical Parsing. Until now, metonymy resolution (MR) methods mainly relied on parsers, taggers, dictionaries, external word lists and other handcrafted lexical resources. We show how a minimalist neural approach combined with a novel predicate window method can achieve competitive results on the SemEval 2007 task on Metonymy Resolution. Additionally, we contribute with a new Wikipedia-based MR dataset called RelocaR, which is tailored towards locations as well as improving previous deficiencies in annotation guidelines.

Towards a Seamless Integration of Word Senses into Downstream NLP Applications
Mohammad Taher Pilehvar | Jose Camacho-Collados | Roberto Navigli | Nigel Collier
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large. Our results also point to the need for sense representation research to focus more on in vivo evaluations which target the performance in downstream NLP applications rather than artificial benchmarks.

Inducing Embeddings for Rare and Unseen Words by Leveraging Lexical Resources
Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We put forward an approach that exploits the knowledge encoded in lexical resources in order to induce representations for words that were not encountered frequently during training. Our approach provides an advantage over the past work in that it enables vocabulary expansion not only for morphological variations, but also for infrequent domain specific terms. We performed evaluations in different settings, showing that the technique can provide consistent improvements on multiple benchmarks across domains.

SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity
Jose Camacho-Collados | Mohammad Taher Pilehvar | Nigel Collier | Roberto Navigli
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, and external knowledge from lexical resources are best performers in both subtasks. More information can be found on the task website: http://alt.qcri.org/semeval2017/task2/

2016

De-Conflated Semantic Representations
Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Normalising Medical Concepts in Social Media Texts by Learning Semantic Representation
Nut Limsopatham | Nigel Collier
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Improved Semantic Representation for Domain-Specific Entities
Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Modelling the Combination of Generic and Target Domain Embeddings in a Convolutional Neural Network for Sentence Classification
Nut Limsopatham | Nigel Collier
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Bidirectional LSTM for Named Entity Recognition in Twitter Messages
Nut Limsopatham | Nigel Collier
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

In this paper, we present our approach for named entity recognition in Twitter messages that we used in our participation in the Named Entity Recognition in Twitter shared task at the COLING 2016 Workshop on Noisy User-generated text (WNUT). The main challenge that we aim to tackle in our participation is the short, noisy and colloquial nature of tweets, which makes named entity recognition in Twitter messages a challenging task. In particular, we investigate an approach for dealing with this problem by enabling bidirectional long short-term memory (LSTM) to automatically learn orthographic features without requiring feature engineering. In comparison with other systems participating in the shared task, our system achieved the most effective performance on both the ‘segmentation and categorisation’ and the ‘segmentation only’ sub-tasks.

Learning Orthographic Features in Bi-directional LSTM for Biomedical Named Entity Recognition
Nut Limsopatham | Nigel Collier
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

End-to-end neural network models for named entity recognition (NER) have been shown to achieve effective performance on general domain datasets (e.g. newswire), without requiring additional hand-crafted features. However, in the biomedical domain, recent studies have shown that hand-engineered features (e.g. orthographic features) should be used to attain effective performance, due to the complexity of biomedical terminology (e.g. the use of acronyms and complex gene names). In this work, we propose a novel approach that allows a neural network model based on a long short-term memory (LSTM) to automatically learn orthographic features and incorporate them into a model for biomedical NER. Importantly, our bi-directional LSTM model learns and leverages orthographic features on an end-to-end basis. We evaluate our approach by comparing against existing neural network models for NER using three well-established biomedical datasets. Our experimental results show that the proposed approach consistently outperforms these strong baselines across all three datasets.

NLP and Online Health Reports: What do we say and what do we mean?
Nigel Collier
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis

2015

Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages
Nut Limsopatham | Nigel Collier
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Discriminating Rhetorical Analogies in Social Media
Christoph Lofi | Christian Nieke | Nigel Collier
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

The impact of near domain transfer on biomedical named entity recognition
Nigel Collier | Mai-vu Tran | Ferdinand Paster
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

2013

Exploring a Probabilistic Earley Parser for Event Composition in Biomedical Texts
Mai-Vu Tran | Nigel Collier | Hoang-Quynh Le | Van-Thuy Phi | Thanh-Binh Pham
Proceedings of the BioNLP Shared Task 2013 Workshop

2012

An Experiment in Integrating Sentiment Features for Tech Stock Prediction in Twitter
Tien Thanh Vu | Shu Chang | Quang Thuy Ha | Nigel Collier
Proceedings of the Workshop on Information Extraction and Entity Analytics on Social Media Data

A Hybrid Approach to Finding Phenotype Candidates in Genetic Texts
Nigel Collier | Mai-Vu Tran | Hoang-Quynh Le | Anika Oellrich | Ai Kawazoe | Martin Hall-May | Dietrich Rebholz-Schuhmann
Proceedings of COLING 2012

On-line Trend Analysis with Topic Models: #twitter Trends Detection Topic Model Online
Jey Han Lau | Nigel Collier | Timothy Baldwin
Proceedings of COLING 2012

2010

An ontology-driven system for detecting global health events
Nigel Collier | Reiko Matsuda Goodwin | John McCrae | Son Doan | Ai Kawazoe | Mike Conway | Asanee Kawtrakul | Koichi Takeuchi | Dinh Dien
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Using Hedges to Enhance a Disease Outbreak Report Text Mining System
Mike Conway | Son Doan | Nigel Collier
Proceedings of the BioNLP 2009 Workshop

2008

Global Health Monitor - A Web-based System for Detecting and Mapping Infectious Diseases
Son Doan | Quoc Hung Ngo | Ai Kawazoe | Nigel Collier
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

The Choice of Features for Classification of Verbs in Biomedical Texts
Anna Korhonen | Yuval Krymolowski | Nigel Collier
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

The Role of Roles in Classifying Annotated Biomedical Text
Son Doan | Ai Kawazoe | Nigel Collier
Biological, translational, and clinical language processing

2006

Automatic Classification of Verbs in Biomedical Texts
Anna Korhonen | Yuval Krymolowski | Nigel Collier
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2004

Incorporating topic information into semantic analysis models
Tony Mullen | Nigel Collier
Proceedings of the ACL Interactive Poster and Demonstration Sessions

Annotation of Coreference Relations Among Linguistic Expressions and Images in Biological Articles
Ai Kawazoe | Asanobu Kitamoto | Nigel Collier
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

In this paper, we propose an annotation scheme which can be used not only for annotating coreference relations between linguistic expressions, but also those among linguistic expressions and images, in scientific texts such as biomedical articles. Images in the biomedical domain often contain important information for analyses and diagnoses, and we consider that linking images to textual descriptions of their semantic contents in terms of coreference relations is useful for multimodal access to the information. We present our annotation scheme and the concept of a "coreference pool," which plays a central role in the scheme. We also introduce Open Ontology Forge, a support tool for text annotation which we have already developed, and additional functions for the software to cover image annotations (ImageOF), which are now being developed.

An Annotation Scheme for a Rhetorical Analysis of Biology Articles
Yoko Mizuta | Nigel Collier
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Zone Identification in Biology Articles as a Basis for Information Extraction
Yoko Mizuta | Nigel Collier
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)

Introduction to the Bio-entity Recognition Task at JNLPBA
Nigel Collier | Jin-Dong Kim
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)

Sentiment Analysis using Support Vector Machines with Diverse Information Sources
Tony Mullen | Nigel Collier
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Bio-Medical Entity Extraction using Support Vector Machines
Koichi Takeuchi | Nigel Collier
Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine

2002

Use of Support Vector Machines in Extended Named Entity Recognition
Koichi Takeuchi | Nigel Collier
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

PIA-Core: Semantic Annotation through Example-based Learning
Nigel Collier | Koichi Takeuchi
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Progress on Multi-lingual Named Entity Annotation Guidelines using RDF (S)
Nigel Collier | Koichi Takeuchi | Chikashi Nobata | Junichi Fukumoto | Norihiro Ogata
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Extracting the Names of Genes and Gene Products with a Hidden Markov Model
Nigel Collier | Chikashi Nobata | Jun-ichi Tsujii
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Comparison between Tagged Corpora for the Named Entity Task
Chikashi Nobata | Nigel Collier | Jun’ichi Tsujii
The Workshop on Comparing Corpora

Building an Annotated Corpus in the Molecular-Biology Domain
Yuka Tateisi | Tomoko Ohta | Nigel Collier | Chikashi Nobata | Jun-ichi Tsujii
Proceedings of the COLING-2000 Workshop on Semantic Annotation and Intelligent Content

1999

The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers
Nigel Collier | Hyun Seok Park | Norihiro Ogata | Yuka Tateishi | Chikashi Nobata | Tomoko Ohta | Tateshi Sekimizu | Hisao Imai | Katsutoshi Ibushi | Jun-ichi Tsujii
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

Machine Translation vs. Dictionary Term Translation - a Comparison for English-Japanese News Article Alignment
Nigel Collier | Hideki Hirakawa | Akira Kumano
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

An Experiment in Hybrid Dictionary and Statistical Sentence Alignment
Nigel Collier | Kenji Ono | Hideki Hirakawa
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Machine Translation vs. Dictionary Term Translation - a Comparison for English-Japanese News Article Alignment
Nigel Collier | Hideki Hirakawa | Akira Kumano
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

An Experiment in Hybrid Dictionary and Statistical Sentence Alignment
Nigel Collier | Kenji Ono | Hideki Hirakawa
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1