AbstractWe model the production of quantified referring expressions (QREs) that identity collections of visual items. A previous approach, called Perceptual Cost Pruning, modeled human QRE production using a preference-based referring expression generation algorithm, first removing facts from the input knowledge base based on a model of perceptual cost. In this paper, we present an alternative model that incrementally constructs a symbolic knowledge base through simulating human visual attention/perception from raw images. We demonstrate that this model produces the same output as Perceptual Cost Pruning. We argue that this is a more extensible approach and a step toward developing a wider range of process-level models of human visual description.