Pursing power in Arabic on-line discussion forums
Marc Tomlinson | David Bracewell | Mary Draper | Zewar Almissour | Ying Shi | Jeremy Bensley
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We present a novel corpus for identifying individuals within a group setting who are attempting to gain power within the group. The corpus is entirely in Arabic and is derived from the on-line WikiTalk discussion forums. Entries on the forums were annotated at multiple levels: top-level annotations identified whether an individual was pursuing power on the forum, and low-level annotations identified linguistic indicators that signaled an individual's social intentions. An analysis of our annotations reflects a high degree of overlap between current theories on power and conflict within a group and the behavior of individuals within the transcripts. The described data source provides an appropriate means for modeling an individual's pursuit of power within an on-line discussion group and also allows for enumeration and validation of current theories on the ways in which individuals strive for power.
This paper explores how a battery of unsupervised techniques can be used to create large, high-quality corpora for textual inference applications, such as systems for recognizing textual entailment (TE) and textual contradiction (TC). We show that it is possible to automatically generate sets of positive and negative instances of textual entailment and contradiction from textual corpora with greater than 90% precision. We describe how we generated more than 1 million TE pairs, and a corresponding set of 500,000 TC pairs, from the documents found in the 2 GB AQUAINT-2 newswire corpus.