The promise and the pitfalls of harnessing big data

Management consultants are busy using big data to drum up business, but issues are beginning to arise concerning access, privacy, analysis and interpretation.

Photo: Shutterstock

Maths, stats and econometrics are suddenly sexy subjects – and well paid. Careercast says the best job of 2016 is data scientist, paying $170,000 a year. The role combines information technology, statistical analysis and interdisciplinary skills to interpret trends from data.

Google has shown there's money in mining big data. It became the most valuable company in the world by facilitating a billion data searches a day – and sticking a personalised ad on the results.

At UNSW, the schools of mathematics and statistics and of computer science and engineering will join forces with UNSW Business School to offer an interdisciplinary degree – a bachelor of data science and decisions – in 2017. It's an expanding field.

"The amount of created data is growing exponentially and is already exceeding the amount of available storage, which creates challenges and also great opportunities," says Valentyn Panchenko, an associate professor of economics at UNSW Business School and convenor of next month's UNSW Business School Roundtable: Big Data in Business and Research.

"Biggish data has been around for a long time, such as retail scanning data, loyalty programs, bank, tax and health records, but what is different now is the proliferation from new data sources: internet searches, Facebook, Twitter, wearable measurement apps and phone location data. Linking these various data-sources becomes possible and opens up new opportunities," Panchenko says.

Information explosion

Awareness of big data dates back to the 1940s, when the term 'information explosion' first appeared, but only in this century, with rapid advances in computing power and digital technologies, has big data entered the business mainstream.

There was early optimism that simply matching up the sheer volume of data would throw up explanations in a new era of understanding the world.

"Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all," claimed Wired magazine's Chris Anderson in 2008.

Another notable moment was The New York Times report in 2012 on how the Target department store worked out that a teenager was pregnant – before she had told her family. Based on an analysis of her shopping, set against purchasing trends derived from big data, Target sent her maternity catalogues, which prompted her father to complain to the store.

He later famously apologised: "It turns out there's been some activities in my house I haven't been completely aware of. She's due in August."

It showed how mathematicians, statisticians, predictive analysts, economists, psychologists and neurologists were mining big data on shopping patterns – and exploiting insights such as the finding that habits, rather than conscious decision-making, shape 45% of the choices we make daily.

Google told Nature magazine in 2008 it could predict flu trends by processing hundreds of billions of web searches about flu across five years. Although the predictions were initially successful, in 2013 they were wildly wrong – double the real rate as reported by doctors.

There followed a backlash and warnings about the traps in big data analysis. The Economist labelled this the classic hype cycle, in which a technology's early proponents make grandiose claims but under-deliver, yet the technology eventually transforms the world, as did the internet.

Spurious correlations

The technology is maturing but Panchenko notes there are big data issues with storage/access, privacy, analysis and interpretation.

"The key methodological challenge is how to combine traditional fundamental models, which reflect causal relationships and typically use well-structured data, with methods currently used for big data, such as machine-learning methods, which are great for finding correlation patterns," Panchenko says.

Economists and statisticians know big numbers can throw up some spurious correlations, as a finding posted on The Economist website hilariously illustrates: There is a 95% correlation between per capita cheese consumption and people being strangled in their bedsheets. Yet causal relations, where a change in one variable causes a change in the other, cannot be found in big data without an explanation that can be tested.
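A minimal sketch can show why sheer volume produces such coincidences. It does not use the cheese or bedsheet figures; instead it assumes 200 purely random, unrelated series of 10 yearly observations each (all invented for illustration), compares every pair, and reports the strongest correlation it finds. The function names are hypothetical.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """A drifting series built from independent random steps."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def strongest_pair(n_series=200, length=10, seed=0):
    """Return (|r|, i, j) for the most correlated pair of unrelated series."""
    rng = random.Random(seed)
    series = [random_walk(rng, length) for _ in range(n_series)]
    return max(
        (abs(pearson(series[i], series[j])), i, j)
        for i, j in itertools.combinations(range(n_series), 2)
    )


best_r, i, j = strongest_pair()
print(f"strongest |r| among unrelated series: {best_r:.3f} (series {i} and {j})")
```

With 200 series there are 19,900 pairs to compare, so a near-perfect match between two series that have nothing to do with each other is to be expected. The more variables a data set holds, the more such pairs a search will turn up.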

Management consultants are busy using big data to drum up business. A recent PwC report claimed one-third of Australian businesses are embracing data – compared with one-half internationally. 

The Australian Bureau of Statistics is trying to make more data available for researchers, and government departments are interested in using big data to improve efficiencies, as envisaged by the 2013 Australian Public Service Big Data Strategy.

According to Stuart Black, an expert on data analytics at management consultancy Deloitte, it is still early days.

"My role is to work with 600 Deloitte partners to put data into their offerings to improve company efficiency, such as with the Australian Food and Grocery Council's index we developed based on the demand for Chep pallets that predicts retail trade trends three months in advance. In the lead-up to Christmas you can see what will be in short supply because of what has been shipped around Australia on Chep pallets," Black says.

Predictive problems

Econometrician Denzil Fiebig, an economics professor at UNSW Business School who will be speaking at the big data roundtable, is concerned about the theoretical problems of using past big data to predict the future.

"Companies can do market segment studies to a fine level, and big data is fantastic for that," Fiebig says.

"Yet data mining and machine algorithms learning about interesting patterns in data that happened in the past can strike problems answering 'what if' questions. What if I change some part of the business, such as the way I deal with my customer base, prices, incentives or the way patients deal with the health system, such as by charging them?

"It is much more difficult to infer what will happen then. With machine learning, answers do not fall straight out of the data. Just because you have more data, it does not mean it is better or more suited to answering 'what if' questions."
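Fiebig's distinction can be illustrated with a toy simulation. This is a sketch under invented assumptions, not a model of any real business: sales rise in busy seasons and fall by two units for every extra dollar of price, and in the historical data managers happened to charge more when the season was busy. The second data set assumes an experiment in which prices were set at random. All names and numbers are hypothetical.

```python
import random

TRUE_PRICE_EFFECT = -2.0  # invented: each extra dollar of price costs two sales


def sales(season, price, rng):
    """Sales depend on the season and on price, plus a little noise."""
    return 100 + 10 * season + TRUE_PRICE_EFFECT * price + rng.gauss(0, 1)


def slope(xs, ys):
    """Least-squares slope of ys on xs (the pattern found in the data)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n=5000, seed=1):
    """Return (slope in historical data, slope in randomised experiment)."""
    rng = random.Random(seed)
    hist_prices, hist_sales = [], []
    test_prices, test_sales = [], []
    for _ in range(n):
        # Historical records: prices were raised in busy seasons.
        season = rng.gauss(0, 1)
        price = 10 + 2 * season + rng.gauss(0, 0.5)
        hist_prices.append(price)
        hist_sales.append(sales(season, price, rng))

        # Experiment: prices set at random, unrelated to the season.
        season = rng.gauss(0, 1)
        price = 10 + rng.gauss(0, 2)
        test_prices.append(price)
        test_sales.append(sales(season, price, rng))
    return slope(hist_prices, hist_sales), slope(test_prices, test_sales)


historical_slope, experimental_slope = simulate()
print(f"pattern in past data:    {historical_slope:+.2f} sales per dollar")
print(f"randomised experiment:   {experimental_slope:+.2f} sales per dollar")
print(f"true effect (by design): {TRUE_PRICE_EFFECT:+.2f} sales per dollar")
```

In this setup the historical records show higher prices going hand in hand with higher sales, because both rise in busy seasons. A pattern-finder trained on those records would answer the question "what if I raise prices?" with the wrong sign, however many rows it was given. Only the randomised prices recover the effect that was built into the simulation.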

This article was originally published on UNSW's Business Think.