Didzis Gosko

Also published as: Didzis Goško


2018

pdf bib
The SUMMA Platform: A Scalable Infrastructure for Multi-lingual Multi-media Monitoring
Ulrich Germann | Renārs Liepins | Guntis Barzdins | Didzis Gosko | Sebastião Miranda | David Nogueira
Proceedings of ACL 2018, System Demonstrations

The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. The Platform offers a fully automated media ingestion pipeline capable of recording live broadcasts, detection and transcription of spoken content, translation of all text (original or transcribed) into English, recognition and linking of Named Entities, topic detection, clustering and cross-lingual multi-document summarization of related media items, and last but not least, extraction and storage of factual claims in these news items. Browser-based graphical user interfaces provide humans with aggregated information as well as structured access to individual news items stored in the Platform’s database. This paper describes the intended use cases and provides an overview over the system’s implementation.

pdf bib
Integrating Multiple NLP Technologies into an Open-source Platform for Multilingual Media Monitoring
Ulrich Germann | Renārs Liepins | Didzis Gosko | Guntis Barzdins
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. It assembles numerous state-of-the-art NLP technologies into a fully automated media ingestion pipeline that can record live broadcasts, detect and transcribe spoken content, translate from several languages (original text or transcribed speech) into English, recognize Named Entities, detect topics, cluster and summarize documents across language barriers, and extract and store factual claims in these news items. This paper describes the intended use cases and discusses the system design decisions that allowed us to integrate state-of-the-art NLP modules into an effective workflow with comparatively little effort.

2017

pdf bib
RIGOTRIO at SemEval-2017 Task 9: Combining Machine Learning and Grammar Engineering for AMR Parsing and Generation
Normunds Gruzitis | Didzis Gosko | Guntis Barzdins
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

By addressing both text-to-AMR parsing and AMR-to-text generation, SemEval-2017 Task 9 established AMR as a powerful semantic interlingua. We strengthen the interlingual aspect of AMR by applying the multilingual Grammatical Framework (GF) for AMR-to-text generation. Our current rule-based GF approach completely covered only 12.3% of the test AMRs, therefore we combined it with state-of-the-art JAMR Generator to see if the combination increases or decreases the overall performance. The combined system achieved the automatic BLEU score of 18.82 and the human Trueskill score of 107.2, to be compared to the plain JAMR Generator results. As for AMR parsing, we added NER extensions to our SemEval-2016 general-domain AMR parser to handle the biomedical genre, rich in organic compound names, achieving Smatch F1=54.0%.

2016

pdf bib
Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
Guntis Barzdins | Steve Renals | Didzis Gosko
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper steps outside the comfort-zone of the traditional NLP tasks like automatic speech recognition (ASR) and machine translation (MT) to addresses two novel problems arising in the automated multilingual news monitoring: segmentation of the TV and radio program ASR transcripts into individual stories, and clustering of the individual stories coming from various sources and languages into storylines. Storyline clustering of stories covering the same events is an essential task for inquisitorial media monitoring. We address these two problems jointly by engaging the low-dimensional semantic representation capabilities of the sequence to sequence neural translation models. To enable joint multi-task learning for multilingual neural translation of morphologically rich languages we replace the attention mechanism with the sliding-window mechanism and operate the sequence to sequence neural translation model on the character-level rather than on the word-level. The story segmentation and storyline clustering problem is tackled by examining the low-dimensional vectors produced as a side-product of the neural translation process. The results of this paper describe a novel approach to the automatic story segmentation and storyline clustering problem.

pdf bib
RIGA at SemEval-2016 Task 8: Impact of Smatch Extensions and Character-Level Neural Translation on AMR Parsing Accuracy
Guntis Barzdins | Didzis Gosko
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Riga: from FrameNet to Semantic Frames with C6.0 Rules
Guntis Barzdins | Peteris Paikens | Didzis Gosko
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
Using C5.0 and Exhaustive Search for Boosting Frame-Semantic Parsing Accuracy
Guntis Barzdins | Didzis Gosko | Laura Rituma | Peteris Paikens
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Frame-semantic parsing is a kind of automatic semantic role labeling performed according to the FrameNet paradigm. The paper reports a novel approach for boosting frame-semantic parsing accuracy through the use of the C5.0 decision tree classifier, a commercial version of the popular C4.5 decision tree classifier, and manual rule enhancement. Additionally, the possibility to replace C5.0 by an exhaustive search based algorithm (nicknamed C6.0) is described, leading to even higher frame-semantic parsing accuracy at the expense of slightly increased training time. The described approach is particularly efficient for languages with small FrameNet annotated corpora as it is for Latvian, which is used for illustration. Frame-semantic parsing accuracy achieved for Latvian through the C6.0 algorithm is on par with the state-of-the-art English frame-semantic parsers. The paper includes also a frame-semantic parsing use-case for extracting structured information from unstructured newswire texts, sometimes referred to as bridging of the semantic gap.

pdf bib
Dependency parsing representation effects on the accuracy of semantic applications — an example of an inflective language
Lauma Pretkalniņa | Artūrs Znotiņš | Laura Rituma | Didzis Goško
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we investigate how different dependency representations of a treebank influence the accuracy of the dependency parser trained on this treebank and the impact on several parser applications: named entity recognition, coreference resolution and limited semantic role labeling. For these experiments we use Latvian Treebank, whose native annotation format is dependency based hybrid augmented with phrase-like elements. We explore different representations of coordinations, complex predicates and punctuation mark attachment. Our experiments shows that parsers trained on the variously transformed treebanks vary significantly in their accuracy, but the best-performing parser as measured by attachment score not always leads to best accuracy for an end application.