MPIDR Technical Report

compareFinRaw.r – an R program to measure the difference between datasets

Walke, R., Müller, A.

MPIDR Technical Report TR-2012-003, 7 pages (July 2012).
Rostock, Max Planck Institute for Demographic Research

"tr-2012-003-files.zip" contains the R script, a Sweave script, the related control file, and a folder holding all example output files ("Output_ready").

Keywords: data analysis, data comparability, data evaluation, data processing, software


In any data-related research, it is essential to know about potential disparities in the data in use. Differences may exist between modification stages of a single dataset or between distinct datasets. Either way, the researcher has to be aware of these differences, because differing data properties can affect the conclusions drawn. This report describes an adaptable solution to this problem using the statistical software R [R 2011]. The program compareFinRaw.r is an automatic tool that measures the differences between two datasets by computing distances for all relevant variable (column) pairs of the datasets on two levels. Two excerpts of the R-internal dataset Seatbelts [R 2011, Harvey1986] serve as an illustrative data example.
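
To make the idea of column-wise distances concrete, the following minimal sketch compares two excerpts of Seatbelts. It is not the code of compareFinRaw.r. The names raw, fin, col_distance and na_mismatch, the simulated modifications, and the choice of the mean absolute difference as a distance are all assumptions made for illustration, and the sketch does not reproduce the two levels used by the actual program.

```r
# Hypothetical sketch, not the code of compareFinRaw.r.
# Seatbelts ships with R (package 'datasets') as a multivariate time series.
sb  <- as.data.frame(Seatbelts)
raw <- sb[1:24, c("DriversKilled", "drivers", "front", "rear")]

# Simulate a modified ("final") version of the raw excerpt (assumed changes).
fin <- raw
fin$drivers[c(3, 10)] <- NA          # two values set to missing
fin$front <- round(fin$front, -1)    # values rounded to the nearest ten

# Assumed distance: mean absolute difference of a column pair,
# ignoring positions where either value is missing.
col_distance <- function(x, y) mean(abs(x - y), na.rm = TRUE)

# Assumed companion measure: number of positions where only one side is NA.
na_mismatch <- function(x, y) sum(is.na(x) != is.na(y))

# Compare every column that occurs in both excerpts.
shared <- intersect(names(raw), names(fin))
distances  <- vapply(shared, function(v) col_distance(raw[[v]], fin[[v]]), numeric(1))
mismatches <- vapply(shared, function(v) na_mismatch(raw[[v]], fin[[v]]), numeric(1))

print(data.frame(distance = distances, na_mismatch = mismatches))
```

In this sketch an unchanged column yields a distance of zero, while the missing values introduced in drivers only show up in the second measure. This is one reason a single number per column pair may not be enough to describe how two datasets differ.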