Hongfei Lin


2020

基于多粒度语义交互理解网络的幽默等级识别(A Multi-Granularity Semantic Interaction Understanding Network for Humor Level Recognition)
Jinhui Zhang (张瑾晖) | Shaowu Zhang (张绍武) | Xiaochao Fan (樊小超) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

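Humor plays an important role in everyday communication. With the rapid development of artificial intelligence, humor level recognition has become one of the hot research problems in natural language processing. Existing work on humor level recognition tends to treat a humorous text as a whole and ignores the semantic relations within it. This paper treats humor level recognition as a natural language inference task, splits each humorous text into two parts, a "setup" and a "punchline", and models both their semantics and the semantic relation between them. We propose a multi-granularity semantic interaction understanding network that captures semantic associations and interactions in humorous text at two granularities, the word and the clause. Experiments on a public Reddit humor dataset show that the model improves accuracy by 1.3% over the previous best result. The experiments indicate that introducing information about the semantic relations inside humor improves humor recognition performance, and that the proposed model captures these semantic relations well.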

基于预训练语言模型的案件要素识别方法(A Method for Case Factor Recognition Based on Pre-trained Language Models)
Haishun Liu (刘海顺) | Lei Wang (王雷) | Yanguang Chen (陈彦光) | Shuchen Zhang (张书晨) | Yuanyuan Sun (孙媛媛) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

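Case factor recognition refers to automatically extracting the important factual descriptions from a case description and classifying them according to a factor system designed by domain experts; it is an important research topic in the field of intelligent justice. Text encoding based on traditional neural networks struggles to extract deep features, and threshold-based multi-label classification struggles to capture dependencies between labels. This paper therefore proposes a multi-label text classification model based on pre-trained language models. The model uses a language model with Layer-attentive feature fusion as the encoder and an LSTM-based sequence generation model as the decoder. In experiments on the "CAIL2019" dataset, the method improves F1 by up to 7.6% over algorithms based on recurrent neural networks, and by about 3.2% over the base language model (BERT) under the same hyperparameter settings.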

Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
Yanguang Chen | Yuanyuan Sun | Zhihao Yang | Hongfei Lin
Proceedings of the 28th International Conference on Computational Linguistics

In recent years, the plentiful information contained in Chinese legal documents has attracted a great deal of attention because of the large-scale release of judgment documents on China Judgments Online. There is a great need to enable machines to understand the semantic information stored in these documents, which are written in natural language. Information extraction provides a way of mining the valuable information implied in the unstructured judgment documents. We propose a Legal Triplet Extraction System for drug-related criminal judgment documents. The system extracts the entities and the semantic relations jointly and benefits from the proposed legal lexicon feature and multi-task learning framework. Furthermore, we manually annotate a dataset for Named Entity Recognition and Relation Extraction in the Chinese legal domain, which contributes to training supervised triplet extraction models and evaluating model performance. Our experimental results show that the legal feature introduction and multi-task learning framework are feasible and effective for the Legal Triplet Extraction System. The F1 score of triplet extraction finally reaches 0.836 on the legal dataset.

ALBERT-BiLSTM for Sequential Metaphor Detection
Shuqun Li | Jingjie Zeng | Jinhui Zhang | Tao Peng | Liang Yang | Hongfei Lin
Proceedings of the Second Workshop on Figurative Language Processing

In our daily life, metaphor is a common way of expression. To understand the meaning of a metaphor, we should recognize the metaphor words which play important roles. In the metaphor detection task, we design a sequence labeling model based on ALBERT-LSTM-softmax. By applying this model, we carry out a lot of experiments and compare the experimental results with different processing methods, such as with different input sentences and tokens, or the methods with CRF and softmax. Then, some tricks are adopted to improve the experimental results. Finally, our model achieves a 0.707 F1-score for the all POS subtask and a 0.728 F1-score for the verb subtask on the TOEFL dataset.

2019

Telling the Whole Story: A Manually Annotated Chinese Dataset for the Analysis of Humor in Jokes
Dongyu Zhang | Heting Zhang | Xikai Liu | Hongfei Lin | Feng Xia
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Humor plays an important role in human communication, which makes it an important problem for natural language processing. Prior work on the analysis of humor focuses on whether text is humorous or not, or the degree of funniness, but this is insufficient to explain why it is funny. We therefore create a dataset on humor with 9,123 manually annotated jokes in Chinese. We propose a novel annotation scheme to give scenarios of how humor arises in text. Specifically, our annotations of linguistic humor not only contain the degree of funniness, like previous work, but they also contain key words that trigger humor as well as character relationship, scene, and humor categories. We report reasonable agreement between annotators. We also conduct an analysis and exploration of the dataset. To the best of our knowledge, we are the first to approach humor annotation for exploring the underlying mechanism of the use of humor, which may contribute to a significantly deeper analysis of humor. We also contribute a scarce and valuable dataset, which we will release publicly.

Transformer-Based Capsule Network For Stock Movement Prediction
Jintao Liu | Hongfei Lin | Xikai Liu | Bo Xu | Yuqi Ren | Yufeng Diao | Liang Yang
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

2018

WECA: A WordNet-Encoded Collocation-Attention Network for Homographic Pun Recognition
Yufeng Diao | Hongfei Lin | Di Wu | Liang Yang | Kan Xu | Zhihao Yang | Jian Wang | Shaowu Zhang | Bo Xu | Dongyu Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Homographic puns have a long history in human writing and are widely used in written and spoken literature, usually occurring in a certain syntactic or stylistic structure. Recognizing homographic puns is an important research problem. However, homographic pun recognition is not solved well by existing work. In this work, we first use WordNet to understand and expand word embeddings to address the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network model (WECA), which is combined with context weights for recognizing the puns. Our experiments on the SemEval2017 Task7 and Pun of the Day demonstrate that the proposed model is able to distinguish between homographic pun and non-homographic pun texts. We show the effectiveness of the model by presenting its capability of choosing qualitatively informative words. The results show that our model achieves state-of-the-art performance on homographic pun recognition.

Construction of a Chinese Corpus for the Analysis of the Emotionality of Metaphorical Expressions
Dongyu Zhang | Hongfei Lin | Liang Yang | Shaowu Zhang | Bo Xu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Metaphors are frequently used to convey emotions. However, there is little research on the construction of metaphor corpora annotated with emotion for the analysis of the emotionality of metaphorical expressions. Furthermore, most studies of emotion analysis from metaphorical texts focus on English, and few address other languages, particularly Sino-Tibetan languages such as Chinese, although there are likely to be many differences in the emotional expressions of metaphorical usages across languages. We therefore construct a significant new corpus on metaphor, with 5,605 manually annotated sentences in Chinese. We present an annotation scheme that contains annotations of linguistic metaphors, emotional categories (joy, anger, sadness, fear, love, disgust and surprise), and intensity. The annotation agreement analyses for multiple annotators are described. We also use the corpus to explore and analyze the emotionality of metaphors. To the best of our knowledge, this is the first relatively large metaphor corpus with an annotation of emotions in Chinese.

2016

DUTIR in BioNLP-ST 2016: Utilizing Convolutional Network and Distributed Representation to Extract Complicate Relations
Honglei Li | Jianhai Zhang | Jian Wang | Hongfei Lin | Zhihao Yang
Proceedings of the 4th BioNLP Shared Task Workshop

2010

K-means and Graph-based Approaches for Chinese Word Sense Induction Task
Lisha Wang | Yanzhao Dou | Xiaoling Sun | Hongfei Lin
CIPS-SIGHAN Joint Conference on Chinese Language Processing