A demographic scaling model for estimating the total number of COVID-19 infections

Originally posted on: 26 May 2020, unpublished


Understanding how widely COVID-19 has spread is critical for examining the pandemic's progression. Despite efforts to carefully monitor the pandemic, the number of confirmed cases may underestimate the total number of infections. We introduce a demographic scaling model to estimate COVID-19 infections using a broadly applicable approach with minimal data requirements: COVID-19 related deaths, infection fatality rates (IFRs), and life tables. As many countries lack reliable estimates of age-specific IFRs, we scale IFRs between countries using remaining life expectancy as a marker to account for differences in age structures, health conditions, and medical services. Across the 10 countries with the most COVID-19 deaths as of May 13, 2020, the number of infections is estimated to be four [95% prediction interval: 2-11] times higher than the number of confirmed cases. Cross-country variation is high. The estimated number of infections is 1.4 million (six times the number of confirmed cases) for Italy; 3.1 million (2.2 times the number of confirmed cases) for the U.S.; and 1.8 times the number of confirmed cases for Germany, where testing has been comparatively extensive. Our prevalence estimates, however, are markedly lower than most others based on local seroprevalence studies. We introduce formulas for quantifying the bias that would be required in our data on deaths in order to reproduce estimates published elsewhere. This bias analysis shows that either COVID-19 deaths are severely underestimated, by a factor of two or more, or the seroprevalence-based results are overestimates and not representative of the total population.
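The core idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that infections in each age group equal deaths divided by the IFR, and that a reference country's IFR schedule is transferred to a target country by matching ages with equal remaining life expectancy. The bias factor is simplified to a single ratio applied uniformly across ages. All function names and input numbers are hypothetical placeholders, not data from the study.

```python
import numpy as np


def scale_ifr(ages, ex_ref, ex_target, ifr_ref):
    """Transfer a reference IFR schedule to a target country.

    Assumption: a person in the target country faces the IFR of the
    reference-country age that has the same remaining life expectancy.
    """
    ages = np.asarray(ages, dtype=float)
    ex_ref = np.asarray(ex_ref, dtype=float)
    # Remaining life expectancy falls with age, so reverse the arrays to
    # give np.interp the increasing x-coordinates it requires.
    matched_ages = np.interp(ex_target, ex_ref[::-1], ages[::-1])
    # Read the reference IFR at the matched (possibly fractional) ages.
    return np.interp(matched_ages, ages, ifr_ref)


def estimate_infections(deaths, ifr):
    """Age-specific infections implied by deaths and IFRs."""
    return np.asarray(deaths, dtype=float) / np.asarray(ifr, dtype=float)


def required_death_bias(estimated_total, external_total):
    """Factor by which deaths would need to be scaled (uniformly across
    ages, a simplifying assumption) to reproduce an external estimate."""
    return external_total / estimated_total


# Hypothetical inputs for 10-year age groups starting at ages 0..80.
ages = np.arange(0, 90, 10)
ex_ref = np.array([80, 70, 60, 50, 40, 31, 22, 14, 8], dtype=float)
ex_target = np.array([78, 68, 58, 48, 38, 29, 20, 12, 7], dtype=float)
ifr_ref = np.array([2e-5, 6e-5, 3e-4, 8e-4, 1.5e-3, 6e-3, 2.2e-2, 5.1e-2, 9.3e-2])
deaths = np.array([0, 1, 5, 20, 60, 250, 900, 2500, 4000], dtype=float)

ifr_target = scale_ifr(ages, ex_ref, ex_target, ifr_ref)
infections = estimate_infections(deaths, ifr_target)
total_infections = infections.sum()
# Bias needed to match a hypothetical external estimate twice as large.
bias = required_death_bias(total_infections, 2 * total_infections)
```

In this sketch, a target country with lower remaining life expectancy at every age is assigned IFRs at least as high as the reference schedule, so the same death counts imply fewer infections. Ages whose remaining life expectancy falls outside the reference range are clamped to the nearest reference age, which is a simplification.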

