MPIDR Technical Report
compareFinRaw.r – an R program to measure the difference between datasets
MPIDR Technical Report TR-2012-003, 7 pages.
Rostock, Max Planck Institute for Demographic Research (July 2012)
"tr-2012-003-files.zip" contains the R script, a Sweave script, the related control file, and a folder holding all example output files ("Output_ready").
Abstract
Any data-driven research requires knowledge of potential disparities in the data used. Differences may arise between modification stages of a single dataset or between distinct datasets. In either case, the researcher must be aware of them, because differing data properties can affect the conclusions drawn. This report describes an adaptable solution to this problem based on the statistical software R [R 2011]. The program compareFinRaw.r automatically measures the differences between two datasets by computing distances for all relevant pairs of variables (columns) on two levels. Two excerpts of the R built-in dataset Seatbelts [R 2011, Harvey1986] serve as an illustrative example.
Keywords: data analysis, data comparability, data evaluation, data processing, software
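The idea of comparing two datasets column by column can be illustrated with a minimal R sketch. It is not the code of compareFinRaw.r: the two excerpts of Seatbelts, the helper column_distance, and the choice of the mean absolute difference as the distance are assumptions made only for illustration, and the sketch does not reproduce the two levels of comparison mentioned in the abstract.

```r
# Seatbelts is a multivariate time series shipped with R (package 'datasets').
full <- as.data.frame(Seatbelts)

# Hypothetical excerpts: first and second half of the monthly observations.
a <- full[1:96, ]
b <- full[97:192, ]

# Hypothetical helper: mean absolute difference between two numeric
# columns of equal length (an assumed distance, not the report's).
column_distance <- function(x, y) {
  stopifnot(length(x) == length(y))
  mean(abs(x - y))
}

# Compare every column that occurs in both datasets.
shared <- intersect(names(a), names(b))
distances <- vapply(
  shared,
  function(v) column_distance(a[[v]], b[[v]]),
  numeric(1)
)

print(round(distances, 3))
```

A distance of 0 for a column means that the two excerpts agree on it row by row. Larger values indicate stronger disparity, though they are only comparable across columns after some form of scaling.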