Text segmentation aims to uncover latent structure by dividing text from a document into coherent sections. Where previous work on text segmentation considers the tasks of document segmentation and segment labeling separately, we show that the tasks contain complementary information and are best addressed jointly. We introduce Segment Pooling LSTM (S-LSTM), which is capable of jointly segmenting a document and labeling segments. In support of joint training, we develop a method for teaching the model to recover from errors by aligning the predicted and ground truth segments. We show that S-LSTM reduces segmentation error by 30% on average, while also improving segment labeling.
Trust is implicit in many online text conversations—striking up new friendships, or asking for tech support. But trust can be betrayed through deception. We study the language and dynamics of deception in the negotiation-based game Diplomacy, where seven players compete for world domination by forging and breaking alliances with each other. Our study with players from the Diplomacy community gathers 17,289 messages annotated by the sender for their intended truthfulness and by the receiver for their perceived truthfulness. Unlike existing datasets, this captures deception in long-lasting relationships, where the interlocutors strategically combine truth with lies to advance objectives. A model that uses power dynamics and conversational contexts can predict when a lie occurs nearly as well as human players.
At about the midpoint of the IARPA MATERIAL program in October 2019, an evaluation was conducted on systems’ abilities to find Lithuanian documents based on English queries. Subsequently, both the Lithuanian test collection and results from all three teams were made available for detailed analysis. This paper capitalizes on that opportunity to begin to look at what’s working well at this stage of the program, and to identify some promising directions for future work.
We describe a modified shared-LSTM network for the Semantic Textual Similarity (STS) task at SemEval-2017. The network builds on previously explored Siamese network architectures. We treat max sentence length as an additional hyperparameter to be tuned (beyond learning rate, regularization, and dropout). Our results demonstrate that hand-tuning max sentence training length significantly improves final accuracy. After optimizing hyperparameters, we train the network on the multilingual semantic similarity task using pre-translated sentences. We achieved a correlation of 0.4792 for all the subtasks. We achieved the fourth highest team correlation for Task 4b, which was our best relative placement.