Oana Frunza


2020

Information Extraction from Federal Open Market Committee Statements
Oana Frunza
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

We present a novel approach to unsupervised information extraction by identifying and extracting relevant concept-value pairs from textual data. The system’s building blocks are domain agnostic, making it universally applicable. In this paper, we describe each component of the system and how it extracts relevant economic information from U.S. Federal Open Market Committee (FOMC) statements. Our methodology achieves an impressive 96% accuracy for identifying relevant information for a set of seven economic indicators: household spending, inflation, unemployment, economic activity, fixed investment, federal funds rate, and labor market.

2010

Extraction of Disease-Treatment Semantic Relations from Biomedical Sentences
Oana Frunza | Diana Inkpen
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

Building Systematic Reviews Using Automatic Text Classification Techniques
Oana Frunza | Diana Inkpen | Stan Matwin
Coling 2010: Posters

2008

A Trainable Tokenizer, solution for multilingual texts and compound expression tokenization
Oana Frunza
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Tokenization is one of the initial steps in almost any text processing task. It is not generally regarded as a challenging task for monolingual English systems, but it rapidly increases in complexity for systems that apply it to different languages. This article proposes a supervised learning approach to the tokenization task. The method is based on a character-transition representation, which allows compound expressions to be recognized as a single token. Compound tokens are identified independently of the character that creates the expression. The method automatically learns tokenization rules from a pre-tokenized corpus. The results obtained with the trainable system show that, for Romanian and English, a statistically significant improvement is obtained over a baseline system that tokenizes texts on every non-alphanumeric character.

Textual Information for Predicting Functional Properties of the Genes
Oana Frunza | Diana Inkpen
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

2006

Semi-Supervised Learning of Partial Cognates Using Bilingual Bootstrapping
Oana Frunza | Diana Inkpen
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics