MPIDR Technical Report

compareFinRaw.r – an R program to measure the difference between datasets

Walke, R., Müller, A.
MPIDR Technical Report TR-2012-003, 7 pages.
Rostock, Max Planck Institute for Demographic Research (July 2012)
"tr-2012-003-files.zip" contains the R script, a Sweave script, the related control file, and a folder holding all example output files ("Output_ready").

Abstract

In any data-driven research it is essential to know how the datasets in use may differ. Differences can arise between modification stages of a single dataset or between distinct datasets. Either way, the researcher has to be aware of them, because differing data properties can affect the conclusions drawn. This report describes an adaptable solution to this problem using the statistical software R [R 2011]. The program compareFinRaw.r automatically measures the differences between two datasets by computing distances for all relevant variable (column) pairs on two levels. Two excerpts of the R-internal dataset Seatbelts [R 2011, Harvey1986] serve as an illustrative example.
Keywords: data analysis, data comparability, data evaluation, data processing, software
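
To make the idea of a column-wise distance concrete, the following is a minimal sketch in R and not the code of compareFinRaw.r. It splits the built-in Seatbelts dataset into two excerpts and computes one distance per shared column. The choice of rows, the helper name column_distances, and the root mean squared difference used as the distance are assumptions made for illustration only. The report's own distance measures and its two levels of comparison are not reproduced here.

```r
# Illustrative sketch only; NOT the implementation used in compareFinRaw.r.
seat <- as.data.frame(Seatbelts)  # built-in dataset from package 'datasets'

# Two excerpts of equal length (the row ranges are an assumption).
a <- seat[1:96, ]
b <- seat[97:192, ]

# Hypothetical helper: one distance per column shared by both datasets.
column_distances <- function(x, y) {
  stopifnot(nrow(x) == nrow(y))              # values are compared row by row
  shared <- intersect(names(x), names(y))    # only columns present in both
  sapply(shared, function(v) {
    # Root mean squared difference between row-aligned values (assumed metric).
    sqrt(mean((x[[v]] - y[[v]])^2))
  })
}

d <- column_distances(a, b)
print(round(d, 3))  # named vector: one distance per column
```

A distance of zero means the two excerpts agree in that column. Because the columns are on different scales, the values are not comparable across columns without standardizing them first.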