Unsupervised Event Clustering and Aggregation from Newswire and Web Articles

Swen Ribeiro, Olivier Ferret, Xavier Tannier


Abstract
In this paper, we present an unsupervised pipeline approach for clustering news articles based on the event instances identified in their content. We leverage press agency newswire and monolingual word alignment techniques to build meaningful and linguistically varied clusters of Web articles, with a view to a broader event type detection task. We validate our approach on a manually annotated corpus of Web articles.
Anthology ID: W17-4211
Volume: Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Month: September
Year: 2017
Address: Copenhagen, Denmark
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 62–67
URL: https://www.aclweb.org/anthology/W17-4211
DOI: 10.18653/v1/W17-4211
PDF: http://aclanthology.lst.uni-saarland.de/W17-4211.pdf