We present an evaluation framework that enables developers of information-seeking, transaction-based spoken dialogue systems to compare the robustness of natural language understanding (NLU) approaches across varying levels of word error rate and across contrasting domains. We develop statistical and semantic-parsing-based approaches to dialogue act identification and concept retrieval. In each approach, voice search is ultimately used to query the database. The framework includes a method for developers to bootstrap a representative pseudo-corpus, which is used to estimate NLU performance in a new domain. We illustrate the relative merits of these NLU techniques by comparing our statistical NLU approach with a semantic parsing method on two contrasting applications, our CheckItOut library system and the deployed Let's Go Public! system, across four levels of word error rate. We find that, for both dialogue act identification and concept retrieval, our statistical NLU approach is more likely than semantic parsing to robustly accommodate the freer-form, less constrained utterances of CheckItOut at higher word error rates.