A statistical approach to address the problem of heaping in self-reported income data
Journal of Applied Statistics, 43:4, 682–703 (2016)
Self-reported income information particularly suffers from an intentional coarsening of the data, which is called heaping or rounding. If it does not occur completely at random – which is usually the case – heaping and rounding have detrimental effects on the results of statistical analysis. Conventional statistical methods do not consider this kind of reporting bias, and thus might produce invalid inference. We describe a novel statistical modeling approach that allows us to deal with self-reported heaped income data in an adequate and flexible way. We suggest modeling heaping mechanisms and the true underlying model in combination. To describe the true net income distribution, we use the zero-inflated log-normal distribution. Heaping points are identified from the data by applying a heuristic procedure comparing a hypothetical income distribution and the empirical one. To determine heaping behavior, we employ two distinct models: either we assume piecewise constant heaping probabilities, or heaping probabilities are considered to increase steadily with proximity to a heaping point. We validate our approach by some examples. To illustrate the capacity of the proposed method, we conduct a case study using income data from the German National Educational Panel Study.