Smoothing Demographic Data: Flexible Models in Population Studies
Carlo Giovanni Camarda
Start: 25 May 2020
End: 29 May 2020
Location: Online course. Link TBA.
This course provides an applied introduction to modern, flexible statistical techniques for modeling demographic data. Traditional demographic methods tend either to use a large number of parameters or to impose strong parametric assumptions. In this course you will learn to use flexible models that extract the most from your data with the fewest assumptions.
Smoothing the relationship between two variables (e.g. life expectancy and GDP per capita) is the simplest case in which no prior knowledge of the form of their relationship is assumed. However, more complex settings are frequent in demography. Examples include the pattern of mortality across ages and/or over time by sex and cause of death; the fertility pattern across ages, cohorts and parity; spatial patterns of demographic phenomena; and non-linear effects of age or income on specific health outcomes. Moreover, many population patterns are intrinsically continuous, so it seems natural to model them with smooth functions, which can then be treated as continuous curves in, for instance, decomposition and rate-of-change calculations.
The course will start with an overview of generalized linear models (log-linear and logistic models). P-splines will then be presented as the most suitable and clear-cut smoothing approach for demographic data. This class of models can easily be generalized to more complex data structures (multi-dimensional and spatial data) and adapted to specific needs (forecasting and specialized smoothing).
While the lectures focus on the few theoretical concepts that underpin the more detailed literature, handouts for reproducing the results presented in class will be provided.
These handouts emphasize the use of modern software such as R to implement the presented approaches on relevant demographic datasets. By the end of the course, smoothing should no longer look like a black box, but like a modern statistical tool for exploring and modeling population data to their full potential.
Each of the five course days will consist of two one-hour lectures:
- First one-hour lecture from 9:30-10:30 CET (Central European Time)
- Second one-hour lecture from 14:30-15:30 CET
Monday, May 25: Introduction to Generalized Linear Models
- Reminder on linear models
- Generalized Linear Models
- Poisson GLMs
- Including exposures
Tuesday, May 26: Discrete Smoothing
- Discrete Smoothing
- Generalized Linear Smoothing
- Optimal amount of smoothing
- Histogram smoothing
- Incorporating exposures
Wednesday, May 27: P-splines: an introduction for demography
- Non-linear relationships
- P-splines for Gaussian data
- P-splines for Poisson data
- Including exposures
Thursday, May 28: Extending P-splines
- Extrapolating with P-splines
- P-splines with more covariates
- Smoothing spatial data
- P-GAM for smoothing demographic data
Friday, May 29: More about P-splines
- Tensor P-splines
- Extrapolation in two dimensions
- Shape constraints
- Calculus with smooth data
The course is aimed at non-statisticians and introduces all concepts from the basics. However, elementary knowledge of demographic analysis (e.g. construction of a life table) and statistics (e.g. regression) is required. Familiarity with basic concepts of matrix algebra (transposing and inverting a matrix) is helpful but not essential. Participants are expected to have a working knowledge of R, because the handouts require it. Before each class, participants are expected to review the slides and work through the handouts in R, using an editor such as RStudio.
Students will be evaluated on the basis of class participation.
A reading list will be provided as well as slides from the lectures and handouts for reproducing all examples.