Khaled Shaalan


2014

A Survey of Arabic Named Entity Recognition and Classification
Khaled Shaalan
Computational Linguistics, Volume 40, Issue 2 - June 2014

2012

Arabic Word Generation and Modelling for Spell Checking
Khaled Shaalan | Mohammed Attia | Pavel Pecina | Younes Samih | Josef van Genabith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Arabic is a language known for its rich and complex morphology. Although many research projects have focused on the problem of Arabic morphological analysis using different techniques and approaches, very few have addressed the issue of generation of fully inflected words for the purpose of text authoring. Available open-source spell checking resources for Arabic are too small and inadequate. Ayaspell, for example, the official resource used with OpenOffice applications, contains only 300,000 fully inflected words. We try to bridge this critical gap by creating an adequate, open-source and large-coverage word list for Arabic containing 9,000,000 fully inflected surface words. Furthermore, from a large list of valid forms and invalid forms we create a character-based tri-gram language model to approximate knowledge about permissible character clusters in Arabic, creating a novel method for detecting spelling errors. Testing of this language model gives a precision of 98.2% at a recall of 100%. We take our research a step further by creating a context-independent spelling correction tool using a finite-state automaton that measures the edit distance between input words and candidate corrections, the Noisy Channel Model, and knowledge-based rules. Our system performs significantly better than Hunspell in choosing the best solution, but it is still below the MS Spell Checker.
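The detection idea in the abstract above, flagging words that contain character clusters not attested in valid forms, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a tri-gram language model built from both valid and invalid forms, whereas this sketch only counts trigrams seen in a toy list of valid words. The function names, the `#` boundary symbol, the `min_count` threshold, and the transliterated word list are all assumptions made for illustration.

```python
from collections import Counter

BOUNDARY = "#"  # hypothetical padding symbol marking word edges


def char_trigrams(word):
    """Return the character trigrams of a word, padded at both edges."""
    padded = BOUNDARY * 2 + word + BOUNDARY
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(valid_words):
    """Count every character trigram observed in the valid word list."""
    counts = Counter()
    for word in valid_words:
        counts.update(char_trigrams(word))
    return counts


def is_suspicious(word, counts, min_count=1):
    """Flag a word if any of its trigrams occurs fewer than min_count times."""
    return any(counts[t] < min_count for t in char_trigrams(word))


# Toy transliterated word list (illustrative only, not from the paper).
valid = ["kitab", "katib", "maktab", "kutub"]
model = train(valid)

print(is_suspicious("kitab", model))  # word from the training list
print(is_suspicious("kitqb", model))  # contains the unseen cluster "itq"
```

A real system would need a far larger word list and a smoothed probabilistic model rather than a hard count threshold; the sketch only shows why unattested character clusters are a useful error signal.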

Automatic Extraction and Evaluation of Arabic LFG Resources
Mohammed Attia | Khaled Shaalan | Lamia Tounsi | Josef van Genabith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents the results of an approach to automatically acquire large-scale, probabilistic Lexical-Functional Grammar (LFG) resources for Arabic from the Penn Arabic Treebank (ATB). Our starting point is the earlier work of Tounsi et al. (2009) on automatic LFG f(eature)-structure annotation for Arabic using the ATB. They exploit tree configuration, POS categories, functional tags, local heads and trace information to annotate nodes with LFG feature-structure equations. We utilize this annotation to automatically acquire grammatical function (dependency) based subcategorization frames and paths linking long-distance dependencies (LDDs). Many state-of-the-art treebank-based probabilistic parsing approaches are scalable and robust but often also shallow: they do not capture LDDs and represent only local information. Subcategorization frames and LDD paths can be used to recover LDDs from such parser output to capture deep linguistic information. Automatic acquisition of language resources from existing treebanks saves time and effort involved in creating such resources by hand. Moreover, data-driven automatic acquisition naturally associates probabilistic information with subcategorization frames and LDD paths. Finally, based on the statistical distribution of LDD path types, we propose empirical bounds on traditional regular expression based functional uncertainty equations used to handle LDDs in LFG.

Handling Unknown Words in Arabic FST Morphology
Khaled Shaalan | Mohammed Attia
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing

The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the Detection and Lemmatization of Unknown Words
Mohammed Attia | Younes Samih | Khaled Shaalan | Josef van Genabith
Proceedings of COLING 2012

A Pipeline Arabic Named Entity Recognition using a Hybrid Approach
Mai Oudah | Khaled Shaalan
Proceedings of COLING 2012

Improved Spelling Error Detection and Correction for Arabic
Mohammed Attia | Pavel Pecina | Younes Samih | Khaled Shaalan | Josef van Genabith
Proceedings of COLING 2012: Posters

2011

Adaptive Feedback Message Generation for Second Language Learners of Arabic
Khaled Shaalan | Marwa Magdy
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2009

A Hybrid Approach for Building Arabic Diacritizer
Khaled Shaalan | Hitham M. Abo Bakr | Ibrahim Ziedan
Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages

2007

Person Name Entity Recognition for Arabic
Khaled Shaalan | Hafsa Raza
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources