Big Data | June 22, 2018

Workshop: Making Sense of Online Data for Population Research

Boy contributing to the production of big data for potential population research.

© 2Design, photocase.com

MPIDR researchers are helping to organize the workshop Making Sense of Online Data for Population Research at the International Conference on Web and Social Media (ICWSM) in Stanford, USA, on June 25, 2018. Its goal is to provide a space for researchers from academia and industry, with backgrounds in the social sciences, quantitative methods, and computational approaches, to come together, discuss how to appropriately interpret online data, and share creative ways of grappling with issues related to bias.

MPIDR director Emilio Zagheni is leading the development of Digital and Computational Demography, a field that uses our digital breadcrumbs to measure and predict demographic change and evaluates the implications of the digital revolution for demographic behavior. Zagheni also co-authored two papers on Digital Demography that will be presented at ICWSM:

Professional Gender Gaps Across US Cities
by Karri Haranko, Emilio Zagheni, Kiran Garimella, and Ingmar Weber

Mater Certa Est, Pater Numquam: What Can Facebook Advertising Data Tell Us about Male Fertility Rates?
by Francesco Rampazzo, Emilio Zagheni, Ingmar Weber, Maria Rita Testa, Francesco Billari

Main Themes of the Workshop

  1. understanding underlying populations and demographic processes,
  2. understanding data-generating behaviors, and
  3. leveraging scale to ask new questions.

Workshop Rationale: Demography, Online Data, and Bias

The global spread of social media has generated new opportunities for demographic research, as individuals leave increasing quantities of traces online that can be aggregated and mined for population research. However, whether population research is carried out on traditional data or online data, the key questions remain: What are we aggregating? And how are we aggregating it? Understanding population processes requires thoughtful consideration of populations and behaviors as they are manifested in data.

We see growing opportunities for collaboration between computer scientists and social scientists specifically centered on population research methodology. Demography has been a data-driven discipline since its birth, and data collection and the development of formal, i.e. mathematical, methods have sustained most of the major advances in our understanding of population processes.

All data come with some type of bias, and an essential responsibility of scientists in any field is to understand the nature of the bias manifested in the data from which they draw inference and test hypotheses. In this respect, data generated from human interaction with digital technology are no different from data traditionally used in social science research, but they do pose new and interesting problems.


The workshop takes place in room LK 208 at the Li Ka Shing Conference Center at the Stanford University School of Medicine.


08:30 - 08:50 -- Introductions

08:50 - 09:40 -- Keynote 1: Joshua Blumenstock (University of California, Berkeley), Real-Time Measures of Poverty and Vulnerability

09:40 - 10:30 -- Paper Session 1: Inference from Online Data

  1. Lingzi Hong (University of Maryland) -- Accuracy and Bias in the Identification of Internal Migrants using Cell Phone Data
  2. Jenna Nobles (University of Wisconsin) -- Detecting Unreported Fertility Outcomes with Health Tracking Applications
  3. Cody Buntain (University of Maryland) -- Studying National Trends in Drug Usage with Social Media

10:30 - 10:45 -- Coffee Break

10:45 - 11:35 -- Keynote 2: Ingmar Weber (Qatar Computing Research Institute), Tapping Into Online Advertising Portals for Global Demographic Data

11:35 - 12:25 -- Paper Session 2: Addressing Bias in Online Data

  1. Nina Cesare (Institute for Health Metrics and Evaluation) -- How well can machine learning predict demographics of social media users?
  2. Kristen Harknett (University of California, San Francisco) -- What’s (Not) to Like? Facebook as a Tool for Survey Data Collection
  3. Elissa Redmiles (University of Maryland) -- Who Are You Talking to Anyway? Why sample design matters

12:25 - 12:30 -- Concluding remarks

Speakers (confirmed)


Ingmar Weber is a senior scientist in the Social Computing group at the Qatar Computing Research Institute (QCRI). His interdisciplinary research uses large amounts of online data from social media and other sources to study human behavior at scale.

Particular topics of interest include studying lifestyle diseases and population health, quantifying international migration using digital methods, and looking at political polarization and extremism.



Joshua Blumenstock is an Assistant Professor at the U.C. Berkeley School of Information, and the Director of the Data-Intensive Development Lab. His research lies at the intersection of machine learning and development economics, and focuses on using novel data and methods to better understand the causes and consequences of global poverty.

Dr. Blumenstock's research projects have spanned a variety of substantive topics from analyzing the response to armed conflict and natural disasters to mapping poverty using mobile phone data.



Program Committee and Panelists

Lee Fiorio (University of Washington) is a PhD student in geography whose primary research agenda is to use computational and statistical methods to bring together demographic and geographic perspectives on migration and urban development. He has a keen interest in developing methods that use large datasets from social media and other sources to achieve new understandings of population processes. Most of his research deals with measuring flows of people at multiple scales with the goal of studying systemic disadvantage.

Emilio Zagheni (Max Planck Institute for Demographic Research/University of Washington) is a demographer who uses mathematical, statistical, and computationally-intensive approaches to study the causes and consequences of population dynamics. Motivated by the ambition to improve people's lives through the scientific study of our societies, he is consolidating a portfolio that leverages interdisciplinary approaches to monitor demographic change, to explain population processes, and to predict future demographic outcomes.

More specifically, his research addresses three main inter-related topics:

  1. combining large social media data with traditional sources to track and understand migrations,
  2. evaluating the consequences of population aging on intergenerational transfers, and
  3. modeling the relationships between population dynamics, the environment and infectious diseases.

A common thread across his substantive interests is a consistent drive to develop methods and to analyze data in creative ways that further advance our understanding of social phenomena.

Afra Mashhadi (University of Washington) is an Affiliate Assistant Professor of Sociology at the University of Washington and was formerly a senior research scientist at Bell Laboratories, Nokia. She is interested in developing mathematical and computational models that leverage the proliferation of sensors and breakthroughs in machine learning to

  1. understand societies and social phenomena at different spatial scales, and
  2. model social dynamics of human behavior.

More specifically, her research focus is on sensing, modeling, understanding and predicting human behavior using the digital traces that are generated daily in online and offline lives.

Bogdan State (Stanford University/Facebook) is a computational social scientist and an MS candidate in Computer Science at Stanford. Bogdan has been working on the Facebook Core Data Science team for the past four years. His industry contributions have ranged from developing large-scale business intelligence systems to improving the performance of ranking models. Academically, he is interested in using Internet data to decipher the basic mechanisms of human social interaction.

Dennis Feehan (University of California, Berkeley) is a demographer and quantitative social scientist. His research interests lie at the intersection of networks, demography, and quantitative methodology. He is currently Assistant Professor of Demography at the University of California, Berkeley. Previously, he worked as a Research Scientist at Facebook.



This workshop is organized in partnership with the IUSSP Panel on Big Data and Population Processes with support from the Max Planck Institute for Demographic Research (MPIDR).


Questions, comments or suggestions? Please contact us by email at odfpr18@googlegroups.com