Big Data and Predictive Analytics Need More People, Not More Data
September 17, 2013
- Machine learning, data mining, and advanced analytics coupled with big data seem poised to reshape the way we make business decisions, automating them and making them more effective.
- However, if our early work with stocks, recruitment and credit scoring is any indication, our algorithmic innovations need human oversight now more than ever.
My Google Nexus tablet knows when I should leave the house in order to keep my daily routine humming along smoothly. Using historical geolocation data, search results and email traffic, it knows, for instance, where I like to dine and how long it will take me to get there at the prescribed hour on the customary day of the week.
That’s both terrifying and thrilling at the same time. I want my personal computing device to understand me, my habits and my preferences. But I also understand that its output is a best guess at best, an approximate calculation drawn from the sea of data I generate each and every day. And I understand that in order for my tablet to reach these conclusions, I need to give it full access to that data. I also need to take its prescriptions with a considerable grain (if not shaker) of salt.
Although Google Now is smart enough to alter its predictions when I’m on the road, its assumptions about what I like to do while traveling are not always accurate, whether that means dining, meeting with customers or even making flights. There are always exceptions to the rule.
It is the same with big data and predictive analytics. With machine learning, access to substantial amounts of data in motion (not just at rest) and the right algorithm, we should be able to predict, for example, when to stand up a new server farm in order to meet a coming tsunami of user activity. Better to risk some revenue by acting ahead of an expected outcome than to lose that much or more by responding to the outcome after the fact, right?
I’m not so sure. At least, I’m not sure we should blindly put our faith in predictive analytics, even if its underlying algorithm has been thoroughly vetted in the field, or even if it has access to a substantial amount of current and historical data from a wide array of sources. The lessons taught during the 2010 Flash Crash and the numerous overly optimistic initial public offering predictions since then prove that the best of intentions and the smartest technologies are not enough to ensure success.
What we need is oversight. What we need are people inserted into the process of predictive analytics to provide a check and balance, a separation of powers to at least slow down what could become a runaway train, should our carefully crafted algorithms fail to account for the unaccountable. That’s why I’m glad most business intelligence vendors are adopting enterprise social networking features such as tagging, commenting, rating and voting, and building those into the process of modeling and building out such complex predictive algorithms.
Such capabilities will provide an effective front line of defense against potentially harmful algorithms. Now if only we could build such oversight into our models as they run in the wild. As with Google Now, which builds in user feedback, the more users point out the exceptions to the rule, the more effective those rules become.
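For readers who think in code, here is one rough way that kind of oversight might look at run time. It is only a sketch under my own assumptions: the `Prediction` and `ReviewGate` names, the confidence score and the 0.9 threshold are all hypothetical, not features of Google Now or of any business intelligence product.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    action: str        # what the model recommends, e.g. "add servers"
    confidence: float  # the model's own confidence, from 0.0 to 1.0


@dataclass
class ReviewGate:
    """Hypothetical gate that holds uncertain predictions for a person to review."""
    min_confidence: float = 0.9  # assumed threshold, chosen for illustration
    pending: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)

    def submit(self, prediction: Prediction) -> str:
        # Confident predictions proceed automatically; the rest wait for a human.
        if prediction.confidence >= self.min_confidence:
            return "auto-approved"
        self.pending.append(prediction)
        return "needs human review"

    def flag_exception(self, prediction: Prediction, note: str) -> None:
        # Record a user-reported exception so the rule can be revisited later.
        self.exceptions.append((prediction.action, note))


gate = ReviewGate(min_confidence=0.9)
print(gate.submit(Prediction("add servers", 0.95)))
print(gate.submit(Prediction("add servers", 0.60)))
```

The point is less the threshold than the two lists: one keeps a person in the loop before an uncertain prediction is acted on, and the other collects the exceptions that users point out afterward.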