Lin Sun


2020

RIVA: A Pre-trained Tweet Multimodal Model Based on Text-image Relation for Multimodal NER
Lin Sun | Jiquan Wang | Yindu Su | Fangsheng Weng | Yuxuan Sun | Zengwei Zheng | Yuanyi Chen
Proceedings of the 28th International Conference on Computational Linguistics

Multimodal named entity recognition (MNER) for tweets has received increasing attention recently. Most multimodal methods use attention mechanisms to capture text-related visual information. However, unrelated or weakly related text-image pairs account for a large proportion of tweets, and visual clues unrelated to the text can have uncertain or even negative effects on multimodal model learning. In this paper, we propose a novel pre-trained multimodal model based on Relationship Inference and Visual Attention (RIVA) for tweets. The RIVA model controls the attention-based visual clues with a gate that reflects the role of the image in the semantics of the text. We use a teacher-student semi-supervised paradigm to leverage a large unlabeled multimodal tweet corpus together with a labeled dataset for text-image relation classification. On the multimodal NER task, the experimental results show the importance of text-related visual features for the visual-linguistic model, and our approach achieves state-of-the-art (SOTA) performance on the MNER datasets.
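As a rough illustration of the gating idea described in the RIVA abstract, the Python sketch below scales an attention-pooled visual context by a scalar gate before adding it to a text vector. It is not the RIVA model: the plain vectors, the dot-product attention, the additive fusion, and the helper names (`gated_visual_context`, `relation_logit`) are hypothetical assumptions chosen for brevity, with `relation_logit` standing in for the output of a text-image relation classifier.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_visual_context(text: Vector, regions: List[Vector], relation_logit: float) -> Vector:
    """Hypothetical gated fusion: text + gate * attention-weighted image regions."""
    # Attention: weight each image region by its dot-product similarity to the text.
    weights = softmax([dot(text, r) for r in regions])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(len(text))]
    # Gate in (0, 1), standing in for a predicted text-image relation score (assumption).
    gate = sigmoid(relation_logit)
    # A gate near 0 (unrelated image) lets almost no visual signal through.
    return [t + gate * c for t, c in zip(text, context)]


if __name__ == "__main__":
    print(gated_visual_context([1.0, 0.0], [[2.0, 2.0], [2.0, 2.0]], 0.0))  # [2.0, 1.0]
```

With a strongly negative `relation_logit`, the gate approaches 0 and the output is essentially the text vector, which is the behaviour the abstract motivates for unrelated text-image pairs.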

2019

TOI-CNN: a Solution of Information Extraction on Chinese Insurance Policy
Lin Sun | Kai Zhang | Fule Ji | Zhenhua Yang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Contract analysis using AI techniques can significantly ease human work. This paper presents the problem of Element Tagging on Insurance Policy (ETIP) and proposes a novel Text-Of-Interest Convolutional Neural Network (TOI-CNN) to solve it. We introduce a TOI pooling layer to replace the traditional pooling layer for processing the nested phrasal or clausal elements in insurance policies. The advantage of the TOI pooling layer is that nested elements from one sentence can share computation and context in the forward and backward passes. The paper also demonstrates how backpropagation is computed through TOI pooling. We collected a large Chinese insurance contract dataset and labeled critical elements in seven categories to test the performance of the proposed method. The results show that our method performs promisingly on the ETIP problem.
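The sharing of computation among nested elements described in the TOI-CNN abstract can be illustrated with a toy Python sketch: a feature map is computed once per sentence, and each candidate span is then max-pooled from that same map, so nested spans reuse the same features. This is a simplification in the spirit of region-of-interest pooling, not the paper's TOI pooling layer; the neighbour-averaging "convolution", the single-bin max pool, and the function names (`shared_features`, `toi_max_pool`) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shared_features(embeddings: Matrix, width: int = 3) -> Matrix:
    """Toy stand-in for a convolution layer: average each token vector with its
    neighbours in a window of `width` tokens, treating out-of-range positions as
    zeros. Computed once per sentence and reused by every span."""
    n, dim = len(embeddings), len(embeddings[0])
    half = width // 2
    features = []
    for i in range(n):
        window = [embeddings[j] for j in range(i - half, i + half + 1) if 0 <= j < n]
        features.append([sum(v[d] for v in window) / width for d in range(dim)])
    return features


def toi_max_pool(features: Matrix, spans: List[Tuple[int, int]]) -> Matrix:
    """Max-pool the shared feature map over each (start, end) span, end exclusive.
    Nested spans read the same precomputed rows instead of recomputing them."""
    pooled = []
    for start, end in spans:
        if not 0 <= start < end <= len(features):
            raise ValueError(f"invalid span ({start}, {end})")
        rows = features[start:end]
        pooled.append([max(row[d] for row in rows) for d in range(len(rows[0]))])
    return pooled


if __name__ == "__main__":
    feats = shared_features([[3.0], [0.0], [3.0], [6.0]])
    # An outer span and two spans nested inside it all read from `feats`.
    print(toi_max_pool(feats, [(0, 4), (1, 3), (1, 2)]))  # [[3.0], [3.0], [2.0]]
```

In a trained network, gradients from every pooled span would flow back into the single shared feature map, which is one way to picture the forward- and backward-pass sharing the abstract mentions.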

2017

Multilingual Metaphor Processing: Experiments with Semi-Supervised and Unsupervised Learning
Ekaterina Shutova | Lin Sun | Elkin Darío Gutiérrez | Patricia Lichtenstein | Srini Narayanan
Computational Linguistics, Volume 43, Issue 1 - April 2017

Highly frequent in language and communication, metaphor represents a significant challenge for Natural Language Processing (NLP) applications. Computational work on metaphor has traditionally revolved around the use of hand-coded knowledge, making the systems hard to scale. Recent years have witnessed a rise in statistical approaches to metaphor processing. However, these approaches often require extensive human annotation effort and are predominantly evaluated within a limited domain. In contrast, we experiment with weakly supervised and unsupervised techniques, with little or no annotation, to generalize higher-level mechanisms of metaphor from distributional properties of concepts. We investigate different levels and types of supervision (learning from linguistic examples vs. learning from a given set of metaphorical mappings vs. learning without annotation) in flat and hierarchical, unconstrained and constrained clustering settings. Our aim is to identify the optimal type of supervision for a learning algorithm that discovers patterns of metaphorical association from text. In order to investigate the scalability and adaptability of our models, we applied them to data in three languages from different language groups (English, Spanish, and Russian), achieving state-of-the-art results with little supervision. Finally, we demonstrate that statistical methods can facilitate and scale up cross-linguistic research on metaphor.

2014

CRAB 2.0: A text mining tool for supporting literature review in chemical cancer risk assessment
Yufan Guo | Diarmuid Ó Séaghdha | Ilona Silins | Lin Sun | Johan Högberg | Ulla Stenius | Anna Korhonen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

Native Language Identification Using Large, Longitudinal Data
Xiao Jiang | Yufan Guo | Jeroen Geertzen | Dora Alexopoulou | Lin Sun | Anna Korhonen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Native Language Identification (NLI) is a task aimed at determining the native language (L1) of learners of a second language (L2) on the basis of their written texts. To date, research on NLI has focused on relatively small corpora. We apply NLI to the recently released EFCamDat corpus, which is not only many times larger than previous L2 corpora but also provides longitudinal data at several proficiency levels. Our investigation using accurate machine learning with a wide range of linguistic features reveals interesting patterns in the longitudinal data that are useful both for the further development of NLI and for its application to research on L2 acquisition.

2013

Diathesis alternation approximation for verb clustering
Lin Sun | Diana McCarthy | Anna Korhonen
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Unsupervised Metaphor Identification Using Hierarchical Graph Factorization Clustering
Ekaterina Shutova | Lin Sun
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Hierarchical Verb Clustering Using Graph Factorization
Lin Sun | Anna Korhonen
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes
Yufan Guo | Anna Korhonen | Maria Liakata | Ilona Silins | Lin Sun | Ulla Stenius
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

Exploring variation across biomedical subdomains
Tom Lippincott | Diarmuid Ó Séaghdha | Lin Sun | Anna Korhonen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Metaphor Identification Using Verb and Noun Clustering
Ekaterina Shutova | Lin Sun | Anna Korhonen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Investigating the cross-linguistic potential of VerbNet-style classification
Lin Sun | Thierry Poibeau | Anna Korhonen | Cédric Messiant
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
Baoxun Wang | Xiaolong Wang | Chengjie Sun | Bingquan Liu | Lin Sun
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Improving Verb Clustering with Automatically Acquired Selectional Preferences
Lin Sun | Anna Korhonen
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

User-Driven Development of Text Mining Resources for Cancer Risk Assessment
Lin Sun | Anna Korhonen | Ilona Silins | Ulla Stenius
Proceedings of the BioNLP 2009 Workshop

2008

Automatic Classification of English Verbs Using Rich Syntactic Features
Lin Sun | Anna Korhonen | Yuval Krymolowski
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II