Abhishek Singh


pdf bib
Clinical XLNet: Modeling Sequential Clinical Notes and Predicting Prolonged Mechanical Ventilation
Kexin Huang | Abhishek Singh | Sitong Chen | Edward Moseley | Chih-Ying Deng | Naomi George | Charolotta Lindvall
Proceedings of the 3rd Clinical Natural Language Processing Workshop

Clinical notes contain rich information, which is relatively unexploited in predictive modeling compared to structured data. In this work, we developed a new clinical text representation Clinical XLNet that leverages the temporal information of the sequence of the notes. We evaluated our models on prolonged mechanical ventilation prediction problem and our experiments demonstrated that Clinical XLNet outperforms the best baselines consistently. The models and scripts are made publicly available.

pdf bib
Voice@SRIB at SemEval-2020 Tasks 9 and 12: Stacked Ensemblingmethod for Sentiment and Offensiveness detection in Social Media
Abhishek Singh | Surya Pratap Singh Parmar
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In social-media platforms such as Twitter, Facebook, and Reddit, people prefer to use code-mixed language such as Spanish-English, Hindi-English to express their opinions. In this paper, we describe different models we used, using the external dataset to train embeddings, ensembling methods for Sentimix, and OffensEval tasks. The use of pre-trained embeddings usually helps in multiple tasks such as sentence classification, and machine translation. In this experiment, we have used our trained code-mixed embeddings and twitter pre-trained embeddings to SemEval tasks. We evaluate our models on macro F1-score, precision, accuracy, and recall on the datasets. We intend to show that hyper-parameter tuning and data pre-processing steps help a lot in improving the scores. In our experiments, we are able to achieve 0.886 F1-Macro on OffenEval Greek language subtask post-evaluation, whereas the highest is 0.852 during the Evaluation Period. We stood third in Spanglish competition with our best F1-score of 0.756. Codalab username is asking28.

pdf bib
PoinT-5: Pointer Network and T-5 based Financial Narrative Summarisation
Abhishek Singh
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

Companies provide annual reports to their shareholders at the end of the financial year that de-scribes their operations and financial conditions. The average length of these reports is 80, andit may extend up to 250 pages long. In this paper, we propose our methodology PoinT-5 (thecombination of Pointer Network and T-5 (Test-to-text transfer Transformer) algorithms) that weused in the Financial Narrative Summarisation (FNS) 2020 task. The proposed method usesPointer networks to extract important narrative sentences from the report, and then T-5 is used toparaphrase extracted sentences into a concise yet informative sentence. We evaluate our methodusing Rouge-N (1,2), L, and SU4. The proposed method achieves the highest precision scores inall the metrics and highest F1 scores in three out of four evaluation metrics that are Rouge 1, 2,and LCS and only solution to cross MUSE solution baseline in Rouge-LCS metrics.


pdf bib
Incorporating Emoji Descriptions Improves Tweet Classification
Abhishek Singh | Eduardo Blanco | Wei Jin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Tweets are short messages that often include specialized language such as hashtags and emojis. In this paper, we present a simple strategy to process emojis: replace them with their natural language description and use pretrained word embeddings as normally done with standard words. We show that this strategy is more effective than using pretrained emoji embeddings for tweet classification. Specifically, we obtain new state-of-the-art results in irony detection and sentiment analysis despite our neural network is simpler than previous proposals.