Data Science (For Development)


No one knows when data science began. Some would say that it is a recent phenomenon, born of a confluence of Moore's Law and Chinese manufacturing in the last decade. Others would argue that industry pioneers such as Walmart and Bloomberg can stake a claim to being the first adopters. Others might go even further back, to the statistical linguistics pioneer George Kingsley Zipf, whose analysis of word frequency predated digital computers and so relied on students as 'human computers'. The more perverse might even cite the bold explorers of the 19th century, such as the ill-fated scientists on board the Jeannette, who painstakingly recorded weather conditions in the Arctic every hour for nearly two years while trapped in ice, later sacrificing their lives in order to preserve the mammoth log books. Yet on one thing there is agreement: this revolution is changing the way that we live and make decisions.

A representation of how native speakers of one language use other languages on social media

In the 13 years since I wrote my first program (in BASIC) to extract the signal from a noisy channel in a badly lit physics laboratory, I have developed a deep fascination with data, science and data science. I have spent countless hours wrangling, munging and scraping raw data, transforming ancient SPSS files into something useful and screwing servers into racks. I have turned my hand to census data from the Swiss census bureau, social media content, Costa Rican postal records, the text of national constitutions and (anonymised) mobile phone records. In that time, the data landscape has been transformed, not just in terms of the three Vs of big data, but also in the ubiquity and expectation of data-driven decision making: the democratisation of data science.

Like many theoretical scientists who came of age in the early years of the 21st century, I am intrigued by the current golden age of data: how it came to be, its possibilities and limitations, what these mean for society and business, and the threat it poses to the scientific method. Some of my informal thoughts on the subject can be found below.

Since leaving academia, I have applied data science to problems of development and humanitarian response. The integration of modern data science techniques into low-income countries is at once thrilling, challenging and pressing. The UN has called for a data revolution and traditional organisations have begun to recruit data science teams, while the Sustainable Development Goals raise astonishing challenges for the measurement of human behaviour.