Extending ImageNet to Arabic using Arabic WordNet

Abdulkareem Alsudais


Abstract
ImageNet contains millions of images labeled with English WordNet synsets. This paper investigates extending ImageNet to Arabic using Arabic WordNet. The objective is to determine whether Arabic synsets can be found for the synsets used in ImageNet. The primary finding is that Arabic synsets exist for 1,219 of the 21,841 synsets used in ImageNet, covering 1.1 million images. By leveraging the parent-child structure of synsets in ImageNet, this coverage is extended to 10,462 synsets (7.1 million images) whose Arabic label is either a direct match or a direct hypernym, and to 17,438 synsets (11 million images) when a hypernym of a hypernym is also allowed. When all hypernyms of a node are considered, an Arabic synset is found for all but four synsets. This is the major contribution of this work: a dataset that provides Arabic labels for 99.9% of the images in ImageNet.
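The hypernym fallback described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `nearest_arabic_label`, the `parents` and `arabic` mappings, and the toy synset names are all hypothetical, and the sketch assumes the hierarchy is given as a child-to-parents map in which a synset may have more than one parent.

```python
from typing import Dict, List, Optional, Tuple


def nearest_arabic_label(
    synset: str,
    parents: Dict[str, List[str]],
    arabic: Dict[str, str],
    max_depth: Optional[int] = None,
) -> Optional[Tuple[str, int]]:
    """Walk upward through hypernyms, level by level, and return
    (arabic_label, distance) for the closest labeled synset.

    distance 0 = direct match, 1 = direct hypernym,
    2 = hypernym of a hypernym, and so on.
    Returns None if nothing is found within max_depth levels
    (max_depth=None means search all hypernyms).
    """
    frontier = [synset]
    seen = {synset}
    depth = 0
    while frontier:
        # Check every synset at the current level for an Arabic label.
        for node in frontier:
            if node in arabic:
                return arabic[node], depth
        if max_depth is not None and depth >= max_depth:
            return None
        # Move one level up the hierarchy, skipping visited synsets.
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
        depth += 1
    return None


# Toy hierarchy (hypothetical names, not real ImageNet synset IDs).
toy_parents = {
    "husky": ["dog"],
    "dog": ["canine"],
    "canine": ["animal"],
}
# Toy Arabic labels: only "dog" and "animal" have an Arabic synset here.
toy_arabic = {"dog": "كلب", "animal": "حيوان"}

if __name__ == "__main__":
    for depth_limit in (0, 1, 2, None):
        print(depth_limit, nearest_arabic_label("husky", toy_parents, toy_arabic, depth_limit))
```

Under these assumptions, `max_depth=0` corresponds to counting only direct matches, `max_depth=1` adds direct hypernyms, `max_depth=2` adds a hypernym of a hypernym, and `max_depth=None` considers all hypernyms of a node.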
Anthology ID:
2020.alvr-1.1
Volume:
Proceedings of the First Workshop on Advances in Language and Vision Research
Month:
July
Year:
2020
Address:
Online
Venues:
ACL | ALVR | WS
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://www.aclweb.org/anthology/2020.alvr-1.1
DOI:
10.18653/v1/2020.alvr-1.1
PDF:
http://aclanthology.lst.uni-saarland.de/2020.alvr-1.1.pdf
Video:
http://slideslive.com/38929757