Using online genealogical data for demographic research: an empirical examination of the FamiLinx Database
submitted: 24 January 2024 / last edited: 25 January 2024 (2024), unpublished
Online genealogies are promising data sources for demographic research, but their limitations are understudied. This paper critically evaluates the strengths and weaknesses of using online genealogical data for population studies. We propose novel measures to assess the completeness and the quality of demographic variables in the FamiLinx data at both the individual and the familial level over the 1600-1900 period. Using Sweden as a test country, we investigate how the age-sex distribution and the mortality levels of the digital population extracted from FamiLinx diverge from those of the registered population. We find that the availability of one demographic variable is a good predictor of the availability of other demographic information. The completeness and the quality of the demographic variables within kinship networks are markedly higher for individuals whose own demographic information is more complete and accurate. Populations drawn from FamiLinx show lower mortality levels than the registered population, which may reflect selection bias in favor of individuals who experienced more favorable demographic conditions. However, the representativeness of the genealogical populations improves toward the end of the 19th century, especially when the selection is restricted to individuals with more accurate birth and death dates. FamiLinx offers new opportunities for demographic research because of its vast amount of individual-level information from various historical populations and their recorded kinship ties. Nonetheless, both the missingness and the accuracy of its demographic information are selective, and this selectivity needs to be addressed.
Keywords: Europe, Sweden, USA, genealogy, historical demography, kinship, quality of data