Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task

Nikolay Bogoychev, Roman Grundkiewicz, Alham Fikri Aji, Maximiliana Behnke, Kenneth Heafield, Sidharth Kashyap, Emmanouil-Ioannis Farsarakis, Mateusz Chudyk


Abstract
We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multi-core setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect the trade-off between time and quality.
Anthology ID:
2020.ngt-1.26
Volume:
Proceedings of the Fourth Workshop on Neural Generation and Translation
Month:
July
Year:
2020
Address:
Online
Venues:
ACL | NGT | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
218–224
Language:
URL:
https://www.aclweb.org/anthology/2020.ngt-1.26
DOI:
10.18653/v1/2020.ngt-1.26
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2020.ngt-1.26.pdf
Dataset:
 2020.ngt-1.26.Dataset.txt
Video:
 http://slideslive.com/38929840