Mahsa Yarmohammadi


2020

CopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models
Abhinav Singh | Patrick Xia | Guanghui Qin | Mahsa Yarmohammadi | Benjamin Van Durme
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Copy mechanisms are employed in sequence to sequence (seq2seq) models to generate reproductions of words from the input to the output. These frameworks, operating at the lexical type level, fail to provide an explicit alignment that records where each token was copied from. Further, they require contiguous token sequences from the input (spans) to be copied individually. We present a model with an explicit token-level copy operation and extend it to copying entire spans. Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction. We demonstrate the approach on Nested Named Entity Recognition, achieving near state-of-the-art accuracy with an order of magnitude increase in decoding speed.

Collecting Verified COVID-19 Question Answer Pairs
Adam Poliak | Max Fleming | Cash Costello | Kenton W Murray | Mahsa Yarmohammadi | Shivani Pandya | Darius Irani | Milind Agarwal | Udit Sharma | Shuo Sun | Nicola Ivanov | Lingxi Shang | Kaushik Srinivasan | Seolhwa Lee | Xu Han | Smisha Agarwal | João Sedoc
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

We release a dataset of over 2,100 COVID-19-related frequently asked question-answer pairs scraped from over 40 trusted websites. We include an additional 24,000 questions pulled from online sources that have been aligned by experts with existing answered questions from our dataset. This paper describes our efforts in collecting the dataset and summarizes the resulting data. Our dataset is automatically updated daily and available at https://github.com/JHU-COVID-QA/scraping-qas. So far, this data has been used to develop a chatbot providing users information about COVID-19. We encourage others to build analytics and tools upon this dataset as well.

2019

Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings
Mahsa Yarmohammadi | Xutai Ma | Sorami Hisamoto | Muhammad Rahman | Yiming Wang | Hainan Xu | Daniel Povey | Philipp Koehn | Kevin Duh
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

2014

Applications of Lexicographic Semirings to Problems in Speech and Language Processing
Richard Sproat | Mahsa Yarmohammadi | Izhak Shafran | Brian Roark
Computational Linguistics, Volume 40, Issue 4 - December 2014

Transforming trees into hedges and parsing with “hedgebank” grammars
Mahsa Yarmohammadi | Aaron Dunlop | Brian Roark
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Incremental Segmentation and Decoding Strategies for Simultaneous Translation
Mahsa Yarmohammadi | Vivek Kumar Rangarajan Sridhar | Srinivas Bangalore | Baskaran Sankaran
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Harvesting Parallel Text in Multiple Languages with Limited Supervision
Luciano Barbosa | Vivek Kumar Rangarajan Sridhar | Mahsa Yarmohammadi | Srinivas Bangalore
Proceedings of COLING 2012