Journal Article
Using online genealogical data for demographic research: an empirical examination of the FamiLinx database
Demographic Research, 51:41, 1299–1350 (2024)
Abstract
Background: Online genealogies are promising data sources for demographic research, but their limitations are understudied. This paper takes a critical approach to evaluating the potential strengths and weaknesses of using online genealogical data for population studies. We focus on the FamiLinx dataset, which contains demographic information and kinship ties across multiple countries and centuries.
Objective: We propose novel measures to assess the completeness and the quality of demographic variables in the FamiLinx data at both the individual and the familial level over the 1600–1900 period. Utilizing Sweden as a test country, we investigate how the age–sex distribution and the mortality levels of the digital population extracted from FamiLinx diverge from the registered population.
Methods: We employ descriptive statistics, negative binomial regression modeling, and standard life table techniques for our measures of completeness and quality.
Results: Both the missingness and the accuracy of demographic information in FamiLinx are selective. When one demographic variable is available, researchers can effectively anticipate the availability of other demographic information. The completeness and quality of demographic variables within kinship networks are markedly higher for individuals with more complete and accurate demographic information. Populations from FamiLinx display lower mortality levels than the registered population, and their representativeness improves towards the end of the 19th century.
Contribution: This study sheds new light on the opportunities and challenges of harnessing online genealogies for demographic research. Although this data source offers much promise, its usability in population studies depends on the quality and completeness of its recorded demographic information and on how selective these are.
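As an illustration of the "standard life table techniques" named under Methods, the following is a minimal sketch of an abridged period life table built from age-specific death rates. It is not the authors' implementation: the function name `life_table`, the radix of 1.0, and the assumption that deaths fall on average at the midpoint of each closed age interval are all choices made here for illustration, and the rates in the usage example are invented rather than taken from FamiLinx or Swedish registers.

```python
def life_table(ages, rates):
    """Build a simple period life table (hypothetical helper, for illustration).

    ages  -- ascending start ages of the intervals; the last interval is open-ended
    rates -- age-specific death rates m_x, one per interval
    Returns a list of dicts with keys: age, lx, Lx, Tx, ex.
    """
    if not ages or len(ages) != len(rates):
        raise ValueError("ages and rates must be non-empty and of equal length")
    if rates[-1] <= 0:
        raise ValueError("the open-ended interval needs a positive death rate")

    last = len(ages) - 1
    lx = [1.0]  # survivors at the start of each interval (radix 1.0, an assumption)
    Lx = []     # person-years lived in each interval
    for i, m in enumerate(rates):
        if i == last:
            # Open-ended interval: everyone still alive dies, so L = l / m.
            Lx.append(lx[i] / m)
            break
        n = ages[i + 1] - ages[i]  # interval width
        a = n / 2.0                # assumed mean years lived by those who die
        # Convert the death rate into a probability of dying in the interval.
        q = min(1.0, n * m / (1.0 + (n - a) * m))
        d = lx[i] * q              # deaths in the interval
        lx.append(lx[i] - d)
        Lx.append(n * lx[i + 1] + a * d)

    rows = []
    T = 0.0
    # Accumulate person-years from the oldest age downwards to obtain T_x.
    for i in range(last, -1, -1):
        T += Lx[i]
        ex = T / lx[i] if lx[i] > 0 else 0.0  # remaining life expectancy
        rows.append({"age": ages[i], "lx": lx[i], "Lx": Lx[i], "Tx": T, "ex": ex})
    rows.reverse()
    return rows


# Usage with invented rates (not estimates from any real population):
table = life_table([0, 1, 5, 15, 50, 80], [0.15, 0.03, 0.006, 0.009, 0.035, 0.18])
for row in table:
    print(row["age"], round(row["ex"], 2))
```

Comparing life expectancy computed this way for a genealogical population and for the registered population is one way the mortality gap described under Results could be quantified. A fuller treatment would typically use age-specific values for the mean years lived by those who die, particularly in infancy.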