Modeling the bias of digital data: an approach to combining digital with official statistics to estimate and predict migration trends
Sociological Methods and Research, 1–39 (2023)
Obtaining reliable and timely estimates of migration flows is critical for advancing the migration theory and guiding policy decisions, but it remains a challenge. Digital data provide granular information on time and space, but do not draw from representative samples of the population, leading to biased estimates. We propose a method for combining digital data and official statistics by using the official statistics to model the spatial and temporal dependence structure of the biases of digital data. We use simulations to demonstrate the validity of the model, then empirically illustrate our approach by combining geo-located Twitter data with data from the American Community Survey (ACS) to estimate state-level out-migration probabilities in the United States. We show that our model, which combines unbiased and biased data, produces predictions that are more accurate than predictions based solely on unbiased data. Our approach demonstrates how digital data can be used to complement, rather than replace, official statistics.