Humans: Both the Problem and the Solution

Steven Hill
Summary Bullets:

  • Increasing automation in the data center can be one of the best ways to reduce errors in a dynamic production environment.
  • Automation can also be a source of problems on a much greater scale because of the number of processes that can be affected by errors within a large and complex environment.

It’s highly unlikely that American sociologist Robert Merton was thinking about cloud computing when he proposed his “Law of Unintended Consequences” in 1936, but it seems particularly apt in light of Microsoft’s revelations regarding the major Azure cloud storage outage of November 2014. Just this week, Microsoft released its root cause analysis, which pointed to simple human error as the cause of the 11-hour storage outage that also took down any associated VMs, some of which took more than a day to get back online. Now, I’m not here to pile on Microsoft; its response in fixing such a massive system crash can’t really be faulted. What does interest me is how vulnerable our complex and automated systems can still be after years of automation designed to remove human error from the equation.