Caroline Lavecchia


Word- and Sentence-Level Confidence Measures for Machine Translation
Sylvain Raybaud | Caroline Lavecchia | David Langlois | Kamel Smaïli
Proceedings of the 13th Annual conference of the European Association for Machine Translation


Phrase-Based Machine Translation based on Simulated Annealing
Caroline Lavecchia | David Langlois | Kamel Smaïli
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we propose a new phrase-based translation model based on inter-lingual triggers. The originality of our method is twofold: we first identify common source phrases, then use inter-lingual triggers to retrieve their translations. Furthermore, we treat the extraction of phrase translations as an optimization problem, using a simulated annealing algorithm to find the best phrase translations among all those determined by inter-lingual triggers. The best phrases are those which improve translation quality in terms of BLEU score. Experiments on movie subtitle corpora show that our phrase-based machine translation (PBMT) system outperforms a state-of-the-art PBMT system by almost 7 points.
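The selection step described in the abstract can be illustrated with a generic simulated annealing loop. This is a minimal sketch, not the authors' implementation: the candidate pairs, the toy scoring function, and all parameter values are assumptions, and a real system would score a subset by decoding a development set and computing BLEU.

```python
import math
import random

def simulated_annealing(candidates, score, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Select a subset of candidate phrase pairs maximizing `score`.

    `candidates`: list of (source_phrase, target_phrase) pairs,
    e.g. obtained from inter-lingual triggers (hypothetical input here).
    `score`: callable rating a subset of indices (a stand-in for BLEU).
    """
    rng = random.Random(seed)
    current = set(range(len(candidates)))      # start by keeping every pair
    cur_score = score(current)
    best, best_score = set(current), cur_score
    t = t0
    for _ in range(steps):
        # Neighbour move: toggle one random candidate in/out of the table.
        i = rng.randrange(len(candidates))
        neighbour = set(current)
        neighbour.symmetric_difference_update({i})
        s = score(neighbour)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp((s - cur_score) / t).
        if s > cur_score or rng.random() < math.exp((s - cur_score) / t):
            current, cur_score = neighbour, s
            if cur_score > best_score:
                best, best_score = set(current), cur_score
        t *= cooling                           # geometric cooling schedule
    return [candidates[i] for i in sorted(best)]
```

With a toy score that rewards a known-good subset, the loop converges to exactly those pairs; in the paper's setting the score would instead be the BLEU of the system built from the selected phrase table.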


Linguistic features modeling based on Partial New Cache
Kamel Smaïli | Caroline Lavecchia | Jean-Paul Haton
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

Agreement in gender and number is a critical problem in statistical language modeling. One of the main problems in speech recognition of French is the presence of misrecognized words due to incorrect agreement (in gender and number) between words. Statistical language models do not treat this phenomenon directly. This paper focuses on how to handle agreement. We introduce an original model called Features-Cache (FC) to estimate the gender and the number of the word to predict. It is a dynamic variable-length Features-Cache whose size is determined according to syntagm delimiters. This model does not require any syntactic parsing and is used like any other statistical language model. Several models were evaluated, and the best one achieves an improvement of more than 8 points in terms of perplexity.
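The cache mechanism described above can be sketched as follows. This is a toy illustration, not the authors' model: the mini-lexicon, the delimiter set, and the majority-vote prediction are assumptions; the actual Features-Cache turns such counts into probabilities that are combined with a standard statistical language model.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping French words to (gender, number)
# features; a real system would use a full morphological lexicon.
FEATURES = {
    "le": ("masc", "sing"), "la": ("fem", "sing"), "les": (None, "plur"),
    "petit": ("masc", "sing"), "petite": ("fem", "sing"),
    "chat": ("masc", "sing"), "maison": ("fem", "sing"),
    "maisons": ("fem", "plur"),
}
DELIMITERS = {",", ".", ";"}  # stand-ins for syntagm delimiters

def predict_features(history):
    """Predict (gender, number) of the next word from a dynamic cache
    of the features seen since the last syntagm delimiter."""
    genders, numbers = Counter(), Counter()
    for word in history:
        if word in DELIMITERS:          # a delimiter empties the cache
            genders.clear()
            numbers.clear()
            continue
        g, n = FEATURES.get(word, (None, None))
        if g:
            genders[g] += 1
        if n:
            numbers[n] += 1
    gender = genders.most_common(1)[0][0] if genders else None
    number = numbers.most_common(1)[0][0] if numbers else None
    return gender, number
```

For example, after "la petite" the cache predicts a feminine singular continuation, while a delimiter such as a comma resets the cache so that features from the previous syntagm no longer influence the prediction.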