John W. Ayers
The ability to track mental health conditions via social media opened the doors for large-scale, automated, mental health surveillance. However, inferring accurate population-level trends requires representative samples of the underlying population, which can be challenging given the biases inherent in social media data. While previous work has adjusted samples based on demographic estimates, the populations were selected based on specific outcomes, e.g. specific mental health conditions. We depart from these methods, by conducting analyses over demographically representative digital cohorts of social media users. To validated this approach, we constructed a cohort of US based Twitter users to measure the prevalence of depression and PTSD, and investigate how these illnesses manifest across demographic subpopulations. The analysis demonstrates that cohort-based studies can help control for sampling biases, contextualize outcomes, and provide deeper insights into the data.