Online Seminar Talk
A Critical Perspective on Measurement in Digital Trace Data and Machine Learning, and Implications for Demography
Momin M. Malik
Laboratory of Digital and Computational Demography, April 26, 2022
Momin M. Malik (Mayo Clinic's Center for Digital Health) talked about measurement in digital trace data and machine learning, and implications for demography.
In the rush to make use of digital trace data and to leverage powerful new high-dimensional and nonparametric methods emerging from machine learning, some classic lessons from survey design, social science, and statistical theory have been pushed aside or forgotten, only to return with a vengeance. While new forms of data do not necessarily suffer from exactly the same problems as traditional forms, practically every traditional issue has some under-appreciated modern analogue with equally large implications for validity. In this talk, I merge my prior work on ways to interrogate social media-based demographic measurement with my current work showing the relevance to machine learning of fundamental issues of measurement and modeling, such as Campbell's/Goodhart's Law, measurement error, latent constructs, sampling bias, confounders, and dependencies between observations. Some of these issues can be alleviated through long-standing frameworks; others need to be addressed through careful and clever study design; and some problems arise only from how results are interpreted and communicated, so they can be mitigated simply by emphasizing the correlational, post-hoc, and contingent nature of machine learning "predictions". Together, these results provide the methodological and conceptual clarifications necessary to get meaningful and reliable results from new forms of data and modeling.
Momin is Senior Data Science Analyst for AI Ethics at Mayo Clinic's Center for Digital Health, an instructor at the School of Social Policy & Practice at the University of Pennsylvania, and a fellow at the Institute in Critical Quantitative, Computational, & Mixed Methodologies. He holds a Bachelor's in History of Science from Harvard University, a Master's from the Oxford Internet Institute, and a Master's in Machine Learning and a PhD in Societal Computing from the School of Computer Science, Carnegie Mellon University. His work brings statistics and machine learning together with critical perspectives from social science to consider when, how, and why data and modeling succeed in their aims, and when, how, and why they can fail, in areas such as policy, public health, medicine, law, education, government, journalism, social science, the tech industry, and civil society.