Anca Dinu


2019

Linguistic classification: dealing jointly with irrelevance and inconsistency
Laura Franzoi | Andrea Sgarro | Anca Dinu | Liviu P. Dinu
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we present new methods for language classification which put to good use both syntax and fuzzy tools, and are capable of dealing with irrelevant linguistic features (i.e. features which should not contribute to the classification) and even inconsistent features (which do not make sense for specific languages). We introduce a metric distance, based on the generalized Steinhaus transform, which allows one to deal jointly with irrelevance and inconsistency. To evaluate our methods, we test them on a syntactic data set, due to the linguist G. Longobardi and his school. We obtain phylogenetic trees which sometimes outperform the ones obtained by Atkinson and Gray.

2017

On the stylistic evolution from communism to democracy: Solomon Marcus study case
Anca Dinu | Liviu P. Dinu | Bogdan Dumitru
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this article we propose a stylistic analysis of Solomon Marcus’ non-scientific published texts, gathered in six volumes, aiming to uncover some of his quantitative and qualitative fingerprints. Moreover, we compare and cluster two distinct periods of time in his writing style: 22 years of communist regime (1967-1989) and 27 years of democracy (1990-2016). The distributional analysis of Marcus’ texts reveals that the passage from the communist regime to democracy is sharply marked by two complementary changes in his writing: in the pre-democracy period, the communist norms of writing style demanded, on the one hand, long phrases, long words and clichés and, on the other hand, a short list of preferred “official” topics; in the democratic period, the tendency was towards shorter phrases and words and a broader range of topics.

Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe
Anca Dinu | Petya Osenova | Cristina Vertan
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

On the annotation of vague expressions: a case study on Romanian historical texts
Anca Dinu | Walther von Hahn | Cristina Vertan
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

Current approaches in Digital Humanities tend to ignore a central aspect of any hermeneutic introspection: the intrinsic vagueness of the analyzed texts. Especially when dealing with historical documents, neglecting vagueness has important implications for the interpretation of the results. In this paper we present current limitations of annotation approaches and describe a current methodology for annotating vagueness in historical Romanian texts.

2015

Cross-lingual Synonymy Overlap
Anca Dinu | Liviu P. Dinu | Ana Sabina Uban
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

Predicting Romanian Stress Assignment
Alina Maria Ciobanu | Anca Dinu | Liviu Dinu
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Aggregation methods for efficient collocation detection
Anca Dinu | Liviu Dinu | Ionut Sorodoc
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this article we propose a rank aggregation method for the task of collocation detection. It consists of applying some well-known methods (e.g. Dice method, chi-square test, z-test and likelihood ratio) and then aggregating the resulting collocation rankings by rank distance and Borda score. These two aggregation methods are especially well suited for the task, since the results of each individual method naturally form a ranking of collocations. Combination methods are known to usually improve the results, and indeed, the proposed aggregation method performs better than each individual method taken in isolation.
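As a rough illustration of the Borda-score half of this idea, the sketch below merges several rankings into one consensus ranking. It is a textbook Borda count, not the authors' implementation: the function name `borda_aggregate`, the tie-breaking rule, the choice to give zero points to items missing from a ranking, and the toy bigrams are all assumptions made for the example. Rank-distance aggregation is not shown.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge rankings (each a list, best item first) with a plain Borda count.

    Assumption: in a ranking of length n, the item at 0-based position p
    receives n - p points; items absent from a ranking receive nothing.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties are broken alphabetically (an arbitrary choice).
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical output of three association measures on the same candidates.
dice_ranking = ["strong tea", "heavy rain", "of the"]
chi2_ranking = ["heavy rain", "strong tea", "of the"]
llr_ranking = ["strong tea", "of the", "heavy rain"]

consensus = borda_aggregate([dice_ranking, chi2_ranking, llr_ranking])
```

Here each input list stands for the ranking produced by one individual measure, and `consensus` is the aggregated ranking.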

2013

Temporal classification for historical Romanian texts
Alina Maria Ciobanu | Anca Dinu | Liviu Dinu | Vlad Niculae | Octavia-Maria Şulea
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Alternative measures of word relatedness in distributional semantics
Anca Dinu | Alina Ciobanu
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

Temporal Text Classification for Romanian Novels set in the Past
Alina Maria Ciobanu | Liviu P. Dinu | Octavia-Maria Şulea | Anca Dinu | Vlad Niculae
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2011

A Mechanism to Restrict the Scope of Clause-Bounded Quantifiers in ‘Continuation’ Semantics
Anca Dinu
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Building a Generative Lexicon for Romanian
Anca Dinu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present ongoing research: the construction and annotation of a Romanian Generative Lexicon (RoGL). Our system follows the specifications of the CLIPS project for the Italian language. It contains a corpus, a type ontology, a graphical interface and a database from which we generate data in XML format.

2009

On the behavior of Romanian syllables related to minimum effort laws
Anca Dinu | Liviu P. Dinu
Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages

2008

On Classifying Coherent/Incoherent Romanian Short Texts
Anca Dinu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present and discuss the results of a text coherence experiment performed on a small corpus of Romanian text from a number of alternative high school manuals. During the last 10 years, an abundance of alternative manuals for high school was produced and distributed in Romania. Due to the large amount of material and to the relatively short time in which it was produced, the question of assessing the quality of this material emerged; this process relied mostly on subjective personal opinion, given the lack of automatic tools for Romanian. Debates and claims of poor quality of the alternative manuals resulted in a number of examples of incomprehensible/incoherent paragraphs extracted from such manuals. Our goal was to create an automatic tool which may be used as an indicator of poor quality in such texts. We created a small corpus of representative texts from Romanian alternative manuals. We manually classified the chosen paragraphs from such manuals into two categories: comprehensible/coherent text and incomprehensible/incoherent text. We then used different machine learning techniques to automatically classify them in a supervised manner. Our approach is rather simple, but the results are encouraging.

Authorship Identification of Romanian Texts with Controversial Paternity
Liviu Dinu | Marius Popescu | Anca Dinu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this work we propose a new strategy for the authorship identification problem and we test it on an example from Romanian literature: did Radu Albala find the continuation of Mateiu Caragiale’s novel Sub pecetea tainei, or did he write the continuation himself? The proposed strategy is based on the similarity of rankings of function words; we compare the obtained results with the results obtained by a learning method (namely Support Vector Machines (SVM) with a string kernel).

2006

On the data base of Romanian syllables and some of its quantitative and cryptographic aspects
Liviu Dinu | Anca Dinu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we argue for the need to construct a data base of Romanian syllables. We explain the reasons for our choice of the DOOM corpus which we have used. We describe the way syllabification was performed and explain how we have constructed the data base. The main quantitative aspects which we have extracted from our research are presented. We also computed the entropy of the syllables and the entropy of the syllables w.r.t. the consonant-vowel structure. The results are compared with the results of similar studies carried out for other languages.
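To make the entropy computation concrete, here is a minimal sketch of Shannon entropy estimated from syllable frequencies, both for the syllables themselves and for their consonant-vowel patterns. It assumes the standard maximum-likelihood estimate; the helper names `shannon_entropy` and `cv_pattern`, the vowel set, and the toy syllable list are hypothetical and are not taken from the paper or the DOOM corpus. Treating every vowel letter as a vowel is a simplification, since some Romanian letters can also act as semivowels.

```python
import math
from collections import Counter

# Assumed vowel letters; a deliberate simplification for this sketch.
ROMANIAN_VOWELS = set("aăâeiîou")


def shannon_entropy(items):
    """Entropy in bits of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cv_pattern(syllable):
    """Map a syllable to its consonant-vowel skeleton, e.g. C and V letters."""
    return "".join("V" if ch in ROMANIAN_VOWELS else "C" for ch in syllable.lower())


# Toy token list of syllables (hypothetical data, for illustration only).
syllables = ["ma", "să", "ca", "ma", "stra", "dă", "ma", "ca"]

syllable_entropy = shannon_entropy(syllables)
cv_entropy = shannon_entropy(cv_pattern(s) for s in syllables)
```

Because several syllables share one consonant-vowel pattern, the entropy over patterns can never exceed the entropy over the syllables themselves.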

Total Rank Distance and Scaled Total Rank Distance: Two Alternative Metrics in Computational Linguistics
Anca Dinu | Liviu P. Dinu
Proceedings of the Workshop on Linguistic Distances