Hanna Suominen


To compress or not to compress? A Finite-State approach to Nen verbal morphology
Saliha Muradoglu | Nicholas Evans | Hanna Suominen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

This paper describes the development of a verbal morphological parser for an under-resourced Papuan language, Nen. Nen verbal morphology is particularly complex, with a transitive verb taking up to 1,740 unique features. The structural properties exhibited by Nen verbs raise interesting choices for analysis. Here we compare two possible methods of analysis: ‘Chunking’ and decomposition. ‘Chunking’ refers to the concept of collating morphological segments into one, whereas the decomposition model follows a more classical linguistic approach. Both models are built using the Finite-State Transducer toolkit foma. The resulting architectures differ in size and structural clarity. While the ‘Chunking’ model is under half the size of its fully decomposed counterpart, the decomposition model displays higher structural order. In this paper, we describe the challenges encountered when modelling a language exhibiting distributed exponence and present the first morphological analyser for Nen, with an overall accuracy of 80.3%.

Applications of Natural Language Processing in Bilingual Language Teaching: An Indonesian-English Case Study
Zara Maxwell-Smith | Simón González Ochoa | Ben Foley | Hanna Suominen
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Multilingual corpora are difficult to compile and a classroom setting adds pedagogy to the mix of factors which make this data so rich and problematic to classify. In this paper, we set out methodological considerations of using automated speech recognition to build a corpus of teacher speech in an Indonesian language classroom. Our preliminary results (64% word error rate) suggest these tools have the potential to speed data collection in this context. We provide practical examples of our data structure, details of our piloted computer-assisted processes, and fine-grained error analysis. Our study is informed and directed by genuine research questions and discussion in both the education and computational linguistics fields. We highlight some of the benefits and risks of using these emerging technologies to analyze the complex work of language teachers and in education more generally.


PostAc: A Visual Interactive Search, Exploration, and Analysis Platform for PhD Intensive Job Postings
Chenchen Xu | Inger Mewburn | Will J Grant | Hanna Suominen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Over 60% of Australian PhD graduates land their first job after graduation outside academia, but this job market remains largely hidden to these job seekers. Employers’ low awareness of, and interest in, attracting PhD graduates means that the term “PhD” is rarely used as a keyword in job advertisements; 80% of companies looking to employ similar researchers do not specifically ask for a PhD qualification. As a result, typing “PhD” into a job search engine tends to return mostly academic jobs. We set out to make the market for advanced research skills more visible to job seekers. In this paper, we present PostAc, an online platform of authentic job postings that helps PhD graduates sharpen their career thinking. The platform is underpinned by research on the key factors that identify what an employer is looking for when they want to hire a highly skilled researcher. Its ranking model leverages the free-form text embedded in the job description to quantify the most sought-after PhD skills and educate information seekers about the Australian job-market appetite for PhD skills. The platform makes visible the geographic location, industry sector, job title, working hours, continuity, and wage of research-intensive jobs. This is the first data-driven exploration in this field. We present both the empirical results and the online platform.


EPUTION at SemEval-2018 Task 2: Emoji Prediction with User Adaption
Liyuan Zhou | Qiongkai Xu | Hanna Suominen | Tom Gedeon
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes our approach, called EPUTION, for the open trial of the SemEval-2018 Task 2, Multilingual Emoji Prediction. The task concerns social media (more precisely, Twitter) and aims to predict the emoji most likely associated with a tweet. Our solution for this text classification problem explores the idea of transfer learning for adapting the classifier based on users’ tweeting history. Our experiments show that our user-adaption method improves classification results by more than 6 per cent in macro-averaged F1. Thus, our paper provides evidence for the rationality of enriching the original corpus longitudinally with user behaviors and transferring the lessons learned from corresponding users to specific instances.

The Importance of Recommender and Feedback Features in a Pronunciation Learning Aid
Dzikri Fudholi | Hanna Suominen
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Verbal communication, and pronunciation as part of it, is a core skill that can be developed through guided learning. An artificial intelligence system can support such guided learning by enabling a pronunciation-learning application with a recommender system to guide language learners through exercises and a feedback system to correct their pronunciation. In this paper, we report on a user study of language learners’ perceived usefulness of the application. Sixteen international students who were non-native English speakers and lived in Australia participated. Thirteen of them said they needed to improve their pronunciation skills in English because of their foreign accent. The feedback system, with features for pronunciation scoring, speech replay, and giving a pronunciation example, was deemed essential by most of the respondents. In contrast, opinions on the recommender system were clearly split between useful and useless; the system had features to prompt new common words or old poorly-scored words. These results can be used to target research and development, ranging from information retrieval and reinforcement learning for progressively better recommendations to speech recognition and speech analytics for accent acquisition.


Pairwise FastText Classifier for Entity Disambiguation
Cheng Yu | Bing Chu | Rohit Ram | James Aichinger | Lizhen Qu | Hanna Suominen
Proceedings of the Australasian Language Technology Association Workshop 2016


Segmentation of patent claims for improving their readability
Gabriela Ferraro | Hanna Suominen | Jaume Nualart
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)


Noise in Speech-to-Text Voice: Analysis of Errors and Feasibility of Phonetic Similarity for Their Correction
Hanna Suominen | Gabriela Ferraro
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)


Characteristics and Analysis of Finnish and Swedish Clinical Intensive Care Nursing Narratives
Helen Allvin | Elin Carlsson | Hercules Dalianis | Riitta Danielsson-Ojala | Vidas Daudaravicius | Martin Hassel | Dimitrios Kokkinakis | Heljä Lundgren-Laine | Gunnar Nilsson | Øystein Nytrø | Sanna Salanterä | Maria Skeppstedt | Hanna Suominen | Sumithra Velupillai
Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents