The semantic similarity of words forms the basis of many natural language processing methods. These computational similarity measures are often based on a mathematical comparison of vector representations of word meanings, while human judgments of similarity differ in lacking geometrical properties, e.g., symmetric similarity and triangular similarity. In this study, we propose a novel task design to further explore human behavior by asking whether a pair of words is deemed more similar depending on an immediately preceding judgment. Results from a crowdsourcing experiment show that people consistently judge words as more similar when primed by a judgment that evokes a relevant relationship. Our analysis further shows that word2vec similarity correlated significantly better with the out-of-context judgments, thus confirming the methodological differences in human-computer judgments, and offering a new testbed for probing the differences.