2020
Information Extraction from Federal Open Market Committee Statements
Oana Frunza
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
We present a novel approach to unsupervised information extraction by identifying and extracting relevant concept-value pairs from textual data. The system’s building blocks are domain agnostic, making it universally applicable. In this paper, we describe each component of the system and how it extracts relevant economic information from U.S. Federal Open Market Committee (FOMC) statements. Our methodology achieves an impressive 96% accuracy for identifying relevant information for a set of seven economic indicators: household spending, inflation, unemployment, economic activity, fixed investment, federal funds rate, and labor market.
2010
Extraction of Disease-Treatment Semantic Relations from Biomedical Sentences
Oana Frunza | Diana Inkpen
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
Building Systematic Reviews Using Automatic Text Classification Techniques
Oana Frunza | Diana Inkpen | Stan Matwin
Coling 2010: Posters
2008
A Trainable Tokenizer, solution for multilingual texts and compound expression tokenization
Oana Frunza
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Tokenization is one of the first steps in almost any text processing task. It is not generally regarded as a challenging task for English monolingual systems, but its complexity increases rapidly for systems that apply it to different languages. This article proposes a supervised learning approach to the tokenization task. The method is based on a character-transition representation, which allows compound expressions to be recognized as a single token. Compound tokens are identified independently of the character that creates the expression. The method automatically learns tokenization rules from a pre-tokenized corpus. The results obtained with the trainable system show that, for Romanian and English, a statistically significant improvement is obtained over a baseline system that tokenizes texts on every non-alphanumeric character.
Textual Information for Predicting Functional Properties of the Genes
Oana Frunza | Diana Inkpen
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
2006
Semi-Supervised Learning of Partial Cognates Using Bilingual Bootstrapping
Oana Frunza | Diana Inkpen
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics