June 25, 2012 | Press Release
You are where you e-mail: Global migration trends discovered in e-mail data
U.S. emigration made visible for the first time.
For the first time, comparable migration data is available for almost every country in the world. Until now, records were incompatible between nations and, especially when broken down by gender and age, nonexistent. By compiling the global flow of millions of e-mails, Emilio Zagheni from the Max Planck Institute for Demographic Research (MPIDR) in Rostock, Germany, now provides a rich migration database.
It's true for old letter boxes as well as modern internet mailboxes: they indicate where the sender of messages lives. This information can be used to compile migration data.
© Photo: fult / photocase.com
Rostock, Germany. “Where estimates of demographic flows exist, they are often outdated and largely inconsistent,” says MPIDR researcher Emilio Zagheni. Official records are difficult to use for various reasons. Emigrants tend not to register after they move to a new country or do so very late. There is also no clear agreement between nations on how to actually define a migrant.
Official migration data is outdated and inconsistent
“Global internet data does not have these drawbacks,” says Zagheni. “You are where you e-mail.” Together with Ingmar Weber from Yahoo! Research, he traced e-mails sent from Yahoo! accounts around the world to infer the residence of their senders. Every device that sends e-mail can be located, at least at the country level, by its IP address, an internationally standardized code. Zagheni and Weber analyzed the countries derived from IP addresses for a set of messages sent by 43 million anonymous Yahoo! account holders between September 2009 and June 2011.
In addition to the date and geographical origin of each message, they compiled the self-reported birthday and gender of the sender. When a person started sending e-mail from a new location permanently, it was assumed that he or she had changed residence. This way, they were able to calculate rates of migration from and to almost every country in the world. Only anonymized data was used, so identifying individuals was impossible, and no information about the recipients, subject, or content of a message was accessed. The findings have now been published in the ACM Web Science Conference Proceedings.
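The residence rule described above can be illustrated with a short sketch. It is not the researchers' code: the record layout, the function names, the two observation windows, and the strict-majority threshold are all assumptions made for illustration. The idea is simply that a sender is treated as having moved if most messages come from one country in a first period and from a different country in a later period.

```python
from collections import Counter
from datetime import date

# Hypothetical record layout: a list of (send_date, country_code) pairs
# for one anonymized sender, where country_code is derived from the IP address.

def modal_country(messages, start, end):
    """Return the country that accounts for a strict majority of the
    messages sent between start and end (inclusive), or None."""
    counts = Counter(country for sent, country in messages if start <= sent <= end)
    if not counts:
        return None
    country, n = counts.most_common(1)[0]
    # Assumption: require more than half of the messages in the window.
    return country if n > sum(counts.values()) / 2 else None

def infer_move(messages, first_period, second_period):
    """Return (origin, destination) if the majority country differs
    between the two periods, otherwise None."""
    origin = modal_country(messages, *first_period)
    destination = modal_country(messages, *second_period)
    if origin and destination and origin != destination:
        return (origin, destination)
    return None

# Illustrative windows (assumed, not taken from the researchers' code).
FIRST = (date(2009, 9, 1), date(2010, 6, 30))
SECOND = (date(2010, 7, 1), date(2011, 6, 30))

# Made-up example sender: mostly "US" in the first window, "MX" afterwards.
example = [
    (date(2009, 10, 1), "US"),
    (date(2010, 1, 5), "US"),
    (date(2010, 3, 2), "MX"),
    (date(2010, 8, 1), "MX"),
    (date(2011, 1, 1), "MX"),
]
move = infer_move(example, FIRST, SECOND)
```

Requiring a majority in each window, rather than reacting to a single message from abroad, is one way to keep short trips from being counted as a change of residence.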
The results are more than a proof of concept. They also reveal international migration characteristics never seen before. For the USA, Zagheni and Weber were able to produce the first-ever curve of emigration by age and sex. “In the U.S. many statistics are collected about people who move into the country, but there is no system that keeps track of people who move out,” says Emilio Zagheni.
U.S. emigration unveiled: analyzing millions of e-mails made the first consistent figure of emigration from the USA possible. The curves show those who sent most of their e-mails from the U.S. between September 2009 and June 2010 but consistently wrote the majority of their messages from abroad between July 2010 and June 2011.
The potential of the e-mail statistics goes far beyond calculating gross country profiles. For instance, the researchers also looked into Mexico-US cross-border mobility. The data reveals how strongly both countries are demographically integrated: most people who moved from Mexico to the United States either spent time in the USA before emigrating north, or went back to visit Mexico soon after moving to the United States. Those in their 30s have the highest rate of mobility across the Mexico-US border, while the least mobile are those 50 and older.
Only the tip of the iceberg
The strength of Zagheni and Weber’s migration data comes not only from the vast number of e-mails available, but also from a mathematical model they set up to adjust for a typical shortcoming of e-mail statistics: those who send e-mail are not representative of the entire population. Some groups, like the elderly, use e-mail less or not at all and are thus underrepresented. But the researchers managed to calculate adjustment factors for such groups by gauging their e-mail data against migration numbers from European countries, where official data is fairly reliable.
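To make the idea of adjustment factors concrete, here is a deliberately simplified sketch. It is not the model published by the researchers: it assumes one plain ratio per age group (official rate divided by e-mail-based rate) calibrated where official data is reliable and then applied elsewhere. The function names are hypothetical and all numbers are made up for illustration.

```python
def adjustment_factors(email_rates, official_rates):
    """Per age group, the ratio of the official migration rate to the
    e-mail-based rate in a calibration country (simplifying assumption)."""
    return {
        group: official_rates[group] / rate
        for group, rate in email_rates.items()
        if rate > 0 and group in official_rates
    }

def adjust(raw_rates, factors):
    """Scale e-mail-based rates by the calibrated factors; groups without
    a factor are left unchanged."""
    return {group: rate * factors.get(group, 1.0) for group, rate in raw_rates.items()}

# Made-up calibration values, not real data.
calibration_email = {"20-29": 0.020, "60+": 0.001}
calibration_official = {"20-29": 0.020, "60+": 0.004}
factors = adjustment_factors(calibration_email, calibration_official)

# Made-up e-mail-based rates for a country without reliable official data.
adjusted = adjust({"20-29": 0.030, "60+": 0.002}, factors)
```

In this toy example the factor for the oldest group is larger than one, which reflects the point made above: groups that rarely use e-mail are underrepresented, so their raw rates are scaled up.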
“What we have addressed so far is only the tip of the iceberg,” says Emilio Zagheni. With further fine-tuning of the adjustment factors and mining of more digital data, such as Twitter messages, more difficult questions could be tackled. For instance, one could track short- and long-term mobility patterns before and after a crisis like that of the Japanese Fukushima reactors. Unquestionably, digital records give demographers the chance to gain a more accurate picture of population dynamics in regions they can so far only guess about, says Zagheni. “This research has the most potential in developing countries, where the Internet spreads much faster than registration programs develop.”
About the MPIDR
The Max Planck Institute for Demographic Research (MPIDR) in Rostock investigates the structure and dynamics of populations. The Institute’s researchers explore issues of political relevance, such as demographic change, aging, fertility, and the redistribution of work over the life course, as well as digitization and the use of new data sources for the estimation of migration flows. The MPIDR is one of the largest demographic research bodies in Europe and is a worldwide leader in the study of populations. The Institute is part of the Max Planck Society, the internationally renowned German research organization.
Associated Information for Download
This Press Release (PDF File, 232 kB)
Figure “U.S. Emigration 2009 - 2011” in high resolution (JPG File, 260 kB)
Zagheni, E. and I. Weber: You are where you e-mail: using e-mail data to estimate international migration rates. Proceedings of the 4th Annual ACM Web Science Conference. Evanston, Illinois, USA (June 22 - 24, 2012), 348-351. DOI:10.1145/2380718.2380764