WikiBank: Using Wikidata to Improve Multilingual Frame-Semantic Parsing
Cezar Sas | Meriem Beloucif | Anders Søgaard
Proceedings of the 12th Language Resources and Evaluation Conference

Frame-semantic annotations exist for a tiny fraction of the world’s languages. Wikidata, however, links knowledge base triples to texts in many languages, providing a common, distant supervision signal for semantic parsers. We present WikiBank, a multilingual resource of partial semantic structures that can be used to extend pre-existing resources rather than creating new man-made resources from scratch. We also integrate this form of supervision into an off-the-shelf frame-semantic parser and allow cross-lingual transfer. Using Google’s Sling architecture, we show significant improvements on the English and Spanish CoNLL 2009 datasets, whether training on the full available datasets or small subsamples thereof.


Naive Regularizers for Low-Resource Neural Machine Translation
Meriem Beloucif | Ana Valeria Gonzalez | Marcel Bollmann | Anders Søgaard
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. Neural models have to be trained on large amounts of data and have been shown to perform poorly when only limited data is available. We show that using naive regularization methods, based on sentence length, punctuation and word frequencies, to penalize translations that are very different from the input sentences, consistently improves the translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k and 230k sentence pairs. Our best regularizer achieves an average increase of 1.5 BLEU and 1.0 TER points across all the language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English–Vietnamese translation task simply by using relative differences in punctuation as a regularizer.
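
As a rough illustration of the idea of penalizing translations by relative differences in punctuation or sentence length, the following Python sketch may help. It is not the authors' implementation: the function names, the choice of `max` as the normalizer, the whitespace tokenization, and the `weight` parameter are all assumptions made for this example, and the abstract does not specify how the penalty enters the training objective.

```python
import string

# Assumption: "punctuation" means ASCII punctuation characters only.
PUNCTUATION = set(string.punctuation)


def count_punct(sentence: str) -> int:
    """Count ASCII punctuation characters in a sentence."""
    return sum(1 for ch in sentence if ch in PUNCTUATION)


def relative_difference(a: float, b: float) -> float:
    """Return |a - b| scaled into [0, 1] by the larger value (hypothetical choice)."""
    denom = max(a, b)
    if denom == 0:
        return 0.0  # both counts are zero, so there is no difference
    return abs(a - b) / denom


def punctuation_penalty(source: str, hypothesis: str) -> float:
    """Penalty that grows as the punctuation counts of the two sentences diverge."""
    return relative_difference(count_punct(source), count_punct(hypothesis))


def length_penalty(source: str, hypothesis: str) -> float:
    """Penalty based on whitespace-token counts of the two sentences."""
    return relative_difference(len(source.split()), len(hypothesis.split()))


def regularized_loss(base_loss: float, source: str, hypothesis: str,
                     weight: float = 0.1) -> float:
    """Add the weighted punctuation penalty to a base loss (illustrative only)."""
    return base_loss + weight * punctuation_penalty(source, hypothesis)
```

Under these assumptions, a hypothesis that keeps the comma and exclamation mark of "Hello, world!" receives no punctuation penalty, while one that drops both receives the maximum penalty of 1.0.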


Improving word alignment for low resource languages using English monolingual SRL
Meriem Beloucif | Markus Saers | Dekai Wu
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focuses on learning bilingual correlations that help translate low resource languages, by using the output language semantic structure to further narrow down ITG constraints. This approach is motivated by previous research which has shown that injecting a semantic frame based objective function while training SMT models improves the translation quality. We show that including a monolingual semantic objective function during the learning of the translation model leads towards a semantically driven alignment which is more efficient than simply tuning loglinear mixture weights against a semantic frame based evaluation metric in the final stage of statistical machine translation training. We test our approach with three different language pairs and demonstrate that our model biases the learning towards more semantically correct alignments. Both GIZA++ and ITG based techniques fail to capture meaningful bilingual constituents, which are required when trying to learn translation models for low resource languages. In contrast, our proposed model not only improves translation by injecting a monolingual objective function to learn bilingual correlations during early training of the translation model, but also helps to learn more meaningful correlations with a relatively small data set, leading to a better alignment compared to either conventional ITG or traditional GIZA++ based approaches.

Driving inversion transduction grammar induction with semantic evaluation
Meriem Beloucif | Dekai Wu
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics


XMEANT: Better semantic MT evaluation without reference translations
Chi-kiu Lo | Meriem Beloucif | Markus Saers | Dekai Wu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Better Semantic Frame Based MT Evaluation via Inversion Transduction Grammars
Dekai Wu | Chi-kiu Lo | Meriem Beloucif | Markus Saers
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation


Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
Dekai Wu | Karteek Addanki | Markus Saers | Meriem Beloucif
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing