Bofang Li


Subword-level Composition Functions for Learning Word Embeddings
Bofang Li | Aleksandr Drozd | Tao Liu | Xiaoyong Du
Proceedings of the Second Workshop on Subword/Character LEvel Models

Subword-level information is crucial for capturing the meaning and morphology of words, especially for out-of-vocabulary entries. We propose CNN- and RNN-based subword-level composition functions for learning word embeddings, and systematically compare them with popular word-level and subword-level models (Skip-Gram and FastText). Additionally, we propose a hybrid training scheme in which a pure subword-level model is trained jointly with a conventional word-level embedding model based on lookup tables. This increases the fitness of all types of subword-level word embeddings; the word-level embeddings can be discarded after training, leaving only a compact subword-level representation with a much smaller data volume. We evaluate these embeddings on a set of intrinsic and extrinsic tasks, showing that subword-level models have an advantage on tasks related to morphology and on datasets with a high OOV rate, and can be combined with other types of embeddings.
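As a rough illustration of what a subword-level composition function does, the sketch below builds a word vector by summing the vectors of its character n-grams, in the style of the FastText baseline mentioned above. It is not the CNN- or RNN-based composition proposed in the paper. The function names, the n-gram range of 3 to 6, and the dictionary `ngram_vectors` are assumptions made for this example.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def compose_by_sum(word, ngram_vectors, dim):
    """Compose a word vector by summing the vectors of its known n-grams.

    `ngram_vectors` is a hypothetical dict mapping n-gram strings to lists of
    floats; n-grams missing from it contribute nothing to the sum.
    """
    total = [0.0] * dim
    for gram in char_ngrams(word):
        vec = ngram_vectors.get(gram)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
    return total
```

Because the vector is built only from n-grams, an out-of-vocabulary word still receives a representation as long as some of its n-grams were seen during training.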

Subcharacter Information in Japanese Embeddings: When Is It Worth It?
Marzena Karpinska | Bofang Li | Anna Rogers | Aleksandr Drozd
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging subcharacter information was recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies can be applied to Japanese, and contribute a new analogy dataset for this language.


The (too Many) Problems of Analogical Reasoning with Word Vectors
Anna Rogers | Aleksandr Drozd | Bofang Li
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

This paper explores the possibilities of analogical reasoning with vector space models. Given two pairs of words with the same relation (e.g. man:woman :: king:queen), it was proposed that the offset between one pair of the corresponding word vectors can be used to identify the unknown member of the other pair (king - man + woman = queen). We argue against such “linguistic regularities” as a model for linguistic relations in vector space models and as a benchmark, and we show that the vector offset (as well as two other, better-performing methods) suffers from dependence on vector similarity.
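The vector offset method described above can be sketched in a few lines. The toy two-dimensional vectors are made up for illustration and do not come from any trained model; the function names and the `exclude_sources` flag are likewise assumptions of this sketch. Excluding the three source words from the candidate set is common practice in implementations of this method, and turning it off is one way to see how strongly the result depends on similarity to the source vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def vector_offset(a, a_star, b, vectors, exclude_sources=True):
    """Answer a : a_star :: b : ? by finding the word closest to b - a + a_star."""
    target = [y - x + z for x, y, z in zip(vectors[a], vectors[a_star], vectors[b])]
    sources = {a, a_star, b}
    candidates = [w for w in vectors if not (exclude_sources and w in sources)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy vectors, chosen by hand so that the offset is exact.
TOY_VECTORS = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}

answer = vector_offset("man", "woman", "king", TOY_VECTORS)
```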

Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics
Zhe Zhao | Tao Liu | Shen Li | Bofang Li | Xiaoyong Du
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Existing word representation methods mostly limit their information source to word co-occurrence statistics. In this paper, we introduce ngrams into four representation methods: SGNS, GloVe, PPMI matrix, and its SVD factorization. Comprehensive experiments are conducted on word analogy and similarity tasks. The results show that improved word representations are learned from ngram co-occurrence statistics. We also demonstrate that the trained ngram representations are useful in many respects, such as finding antonyms and collocations. In addition, a novel approach to building the co-occurrence matrix is proposed to alleviate the hardware burden introduced by ngrams.
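For readers unfamiliar with the PPMI matrix named above, the following is a minimal sketch of how positive pointwise mutual information can be computed from co-occurrence counts. It is not the paper's implementation and does not include the memory-saving construction the abstract mentions. The function name `ppmi` and the input format (a dict from `(item, context)` pairs to counts) are assumptions; an item or context may be a single word or an ngram joined into one string.

```python
import math
from collections import defaultdict


def ppmi(pair_counts):
    """Compute PPMI scores from a dict of (item, context) -> co-occurrence count.

    Returns a sparse dict that keeps only pairs whose PMI is positive.
    """
    total = sum(pair_counts.values())
    item_totals = defaultdict(float)
    context_totals = defaultdict(float)
    for (item, context), count in pair_counts.items():
        item_totals[item] += count
        context_totals[context] += count

    scores = {}
    for (item, context), count in pair_counts.items():
        if count <= 0:
            continue  # unseen pairs have no positive PMI
        # PMI = log( P(item, context) / (P(item) * P(context)) )
        pmi = math.log(count * total / (item_totals[item] * context_totals[context]))
        if pmi > 0:
            scores[(item, context)] = pmi
    return scores
```

Storing only the positive entries keeps the result sparse, which matters once ngrams enlarge the set of items and contexts.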

Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings
Bofang Li | Tao Liu | Zhe Zhao | Buzhou Tang | Aleksandr Drozd | Anna Rogers | Xiaoyong Du
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The number of word embedding models is growing every year. Most of them are based on the co-occurrence information of words and their contexts. However, it is still an open question what the best definition of context is. We provide a systematic investigation of 4 different syntactic context types and context representations for learning word embeddings. Comprehensive experiments are conducted to evaluate their effectiveness on 6 extrinsic and intrinsic tasks. We hope that this paper, along with the published code, will be helpful for choosing the best context type and representation for a given task.


Weighted Neural Bag-of-n-grams Model: New Baselines for Text Classification
Bofang Li | Zhe Zhao | Tao Liu | Puwei Wang | Xiaoyong Du
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

NBSVM is one of the most popular methods for text classification and has been widely used as a baseline for various text representation approaches. It uses Naive Bayes (NB) features to weight a sparse bag-of-n-grams representation. N-grams capture word order in short contexts, and the NB features assign more weight to important words. However, NBSVM suffers from the sparsity problem and is reported to be outperformed by newly proposed distributed (dense) text representations learned by neural networks. In this paper, we transfer the n-grams and NB weighting to neural models. We train n-gram embeddings and use NB weighting to guide the neural models to focus on important words. In fact, our methods can be viewed as distributed (dense) counterparts of the sparse bag-of-n-grams in NBSVM. We find that n-grams and NB weighting are also effective in distributed representations. As a result, our models achieve new strong baselines on 9 text classification datasets; for example, on the IMDB dataset we reach 93.5% accuracy, which exceeds previous state-of-the-art results obtained by deep neural models. All source code is publicly available at
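
The NB weighting referred to above is usually defined as a smoothed log-count ratio between the two classes. The sketch below computes that ratio for a binary task. It only illustrates the weighting itself, not the neural models of the paper, and the function name, the use of raw (non-binarized) counts, and the default smoothing value `alpha=1.0` are assumptions of this example.

```python
import math
from collections import Counter


def nb_log_count_ratio(pos_docs, neg_docs, alpha=1.0):
    """Return a dict mapping each n-gram to its smoothed NB log-count ratio.

    `pos_docs` and `neg_docs` are lists of documents, each document being a
    list of n-gram strings. Positive weights mark n-grams that are relatively
    more frequent in the positive class, negative weights the opposite.
    """
    vocab = {gram for doc in pos_docs + neg_docs for gram in doc}
    p = Counter(gram for doc in pos_docs for gram in doc)
    q = Counter(gram for doc in neg_docs for gram in doc)
    # Normalizers: L1 norms of the smoothed count vectors.
    p_norm = sum(p[gram] + alpha for gram in vocab)
    q_norm = sum(q[gram] + alpha for gram in vocab)
    return {
        gram: math.log(((p[gram] + alpha) / p_norm) / ((q[gram] + alpha) / q_norm))
        for gram in vocab
    }
```

An n-gram that is equally frequent in both classes gets a weight near zero, so a model that scales n-gram features or embeddings by these weights is pushed toward the class-discriminative ones.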