Representativeness is crucial for inferring demographic processes from online genealogies: evidence from lifespan dynamics
Proceedings of the National Academy of Sciences of the United States of America, 119:10, e2120455119 (2022)
Crowdsourced online genealogies have an unprecedented potential to shed light on long-run population dynamics, if analyzed properly. We investigate, in two territories, whether the historical mortality dynamics of males in FamiLinx, a popular genealogical dataset, are representative of the general population or closer to those of an elite subpopulation. The first territory is the German Empire, where genealogical coverage is low relative to total population size; the second is the Netherlands, where coverage is higher. We find that, for the period around the turn of the 20th century (for which benchmark national life tables are available), mortality is consistently lower and more homogeneous in FamiLinx than in the general population. For that period, mortality levels in FamiLinx resemble those of elites in the German Empire, while in the Netherlands they are closer to those in national life tables. For the period before the 19th century, mortality levels in FamiLinx mirror those of the elites in both territories. We identify the low coverage of the total population and the oversampling of elites in online genealogies as potential explanations for these findings. Emerging digital data may revolutionize our knowledge of historical demographic dynamics, but only if we understand their potential uses and limitations.
Keywords: German Empire, Germany, Netherlands, digital demography, historical demography, inequality, life expectancy