• How did a computer algorithm like Google’s AlphaZero manage to learn, master and then dominate the game of chess in just four hours?
• AlphaZero’s mastery of chess stemmed from the sheer, brute force of Google’s AI-specific Tensor Processing Units (TPUs) – 5,000 of them to be exact.
“How about a nice game of chess?” With that iconic line of dialog from one of my favorite films, the 1983 Cold War sci-fi thriller WarGames, nuclear war was narrowly averted by a machine (named Joshua) capable of teaching itself how to play a game. This week another machine, AlphaZero, one of Google’s DeepMind AI offspring, did something similar: it took four hours to teach itself how to play chess and then proceeded to demolish the best, highest-rated chess computer, Stockfish. After 100 games, AlphaZero had racked up 28 wins and zero losses. So much for more than a millennium of human chess knowledge, painstakingly taught to computers. But how was this possible? Was this a fair match? How did a computer algorithm like AlphaZero manage to learn, master and then dominate the game of chess in just four hours?
Spoiler alert: it’s all about the chips. But more on that in a moment. First, let’s start with a quick rundown of some interesting parameters from this truly historic tournament:
How did AlphaZero learn chess? Google’s algorithm used self-play reinforcement learning, starting at a chess rating of ten and taking 700,000 iterative training steps over four hours before facing Stockfish. During its training phase, the algorithm had no access to opening books or endgame tables. It simply played a large number of games against itself, learning from each one.
Where did they play? The training session used 5,000 first-generation Google TPUs to generate the self-play games and 64 second-generation TPUs to train the neural networks on those games. During the match, AlphaZero ran on a single machine with four TPUs. Stockfish ran on a single machine with 64 threads and a hash size of 1 GB.
How did they play? During the 100-game closed-room tournament, both algorithms were given one minute per move. During play, Stockfish searched 70 million positions per second, while AlphaZero searched only 80,000 positions per second.
What was the actual score? The full result of the 100-game match gave AlphaZero 28 wins and zero losses, along with 72 draws. Of those 28 wins, 25 came as white and only three as black.
What was the best opening? Interestingly, the Queen’s Gambit (ECO code D06) became the top opening for AlphaZero, case closed. Apparently we can all just forget about the French, Sicilian and King’s Indian Defense.
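To make the self-play idea from the rundown above a little more concrete, here is a toy sketch in Python. It is emphatically not AlphaZero's method: there is no neural network and no tree search, just a lookup table learning a tiny stone-taking game (take one to three stones from a pile of ten; whoever takes the last stone wins) by playing both sides against itself. The game, the function names and every parameter are assumptions chosen purely for illustration.

```python
import random


def train_self_play(pile=10, max_take=3, episodes=20_000,
                    alpha=0.5, epsilon=0.3, seed=0):
    """Learn a toy stone-taking game by self-play (illustrative only)."""
    rng = random.Random(seed)
    # q[s][a]: estimated result (+1 win, -1 loss) for the player to move
    # with s stones left who takes a stones. Starts with no knowledge.
    q = {s: {a: 0.0 for a in range(1, min(max_take, s) + 1)}
         for s in range(1, pile + 1)}
    for _ in range(episodes):
        s = pile
        while s > 0:
            moves = list(q[s])
            if rng.random() < epsilon:
                a = rng.choice(moves)             # explore a random move
            else:
                a = max(moves, key=q[s].get)      # play the current best guess
            nxt = s - a
            # Taking the last stone wins. Otherwise the same table plays the
            # opponent next, so our value is the negative of their best value.
            target = 1.0 if nxt == 0 else -max(q[nxt].values())
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
    return q


def best_move(q, stones):
    """Return the move the learned table currently rates highest."""
    return max(q[stones], key=q[stones].get)


q = train_self_play()
print({s: best_move(q, s) for s in (5, 6, 7, 9, 10)})
```

Starting from a table of zeros and no strategy hints, the table should settle on always leaving the opponent a multiple of four stones, which is the known winning strategy for this little game. Chess is astronomically larger, which is where the neural networks and the hardware come in.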
What can we learn from this? First and foremost, and in defense of humankind, AlphaZero didn’t completely demolish Stockfish. A match with 72 draws and only three wins as black is not that far outside of what you might see with two human combatants. Moreover, Stockfish had already been beaten by another computer algorithm – Komodo – earlier in the year. And because AlphaZero made up (discovered?) its own opening book, it could be argued that this really wasn’t a fair fight as Stockfish could not make use of its considerable opening preparation. Ironically, it should be noted that one of the strengths of our current World Chess Champion, Magnus Carlsen, is his propensity to toss out the opening book and force his opponents to try to beat him positionally.
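For a rough sense of how lopsided the result really was, the match score can be converted into a rating gap. The sketch below uses the standard logistic Elo expectation formula, which is an assumption made here for illustration and not a figure reported from the match; the function name is hypothetical.

```python
import math


def implied_elo_gap(wins, draws, losses):
    """Return (score fraction, Elo gap implied by that score)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games   # a draw counts as half a point
    # Invert the Elo expected-score formula: E = 1 / (1 + 10 ** (-gap / 400))
    gap = 400 * math.log10(score / (1 - score))
    return score, gap


score, gap = implied_elo_gap(wins=28, draws=72, losses=0)
print(f"score: {score:.0%}, implied Elo gap: {gap:.0f}")
```

A 64 percent score works out to a gap of roughly 100 rating points: clearly better, but hardly a demolition.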
In short, I think we humans deserve a rematch of our best human-trained machine versus Google’s best machine-trained machine.
But that’s not the most important takeaway here. The answer to the question of how AlphaZero was able to achieve mastery of chess in only four hours isn’t down to the unusual brilliance of its engineers or the superiority of Google’s TensorFlow framework for building machine learning (ML) and deep learning (DL) models. Those elements explain how it beat Stockfish using only 80,000 positional evaluations per second. Rather, AlphaZero’s mastery of chess stemmed from the sheer, brute force of Google’s AI-specific TPUs – 5,000 of them to be exact.
You see, the real magic behind this achievement is the speed with which AlphaZero (and by extension any AI algorithm) can iterate and refine its neural network and learning models. Google estimates that each TPU is capable of delivering up to 225,000 predictions per second. A regular old CPU can muster just over 5,000. That’s a lot of potential outcomes to consider.
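A quick back-of-the-envelope calculation shows what those two figures mean at scale. This is only a sketch: it treats the quoted per-chip rates as sustained for the whole four-hour run and as applicable to AlphaZero's workload, both of which are assumptions, and it rounds the CPU figure down to an even 5,000. The constant names are made up for the example.

```python
TPU_PREDICTIONS_PER_SEC = 225_000   # quoted "up to" figure per TPU
CPU_PREDICTIONS_PER_SEC = 5_000     # "just over 5,000", rounded down
SELF_PLAY_TPUS = 5_000              # first-generation TPUs used for self-play
TRAINING_SECONDS = 4 * 60 * 60      # four hours

# How many CPUs one TPU replaces under these assumptions.
speedup = TPU_PREDICTIONS_PER_SEC // CPU_PREDICTIONS_PER_SEC

# Predictions the whole TPU fleet could make over the training run.
fleet_predictions = (TPU_PREDICTIONS_PER_SEC * SELF_PLAY_TPUS
                     * TRAINING_SECONDS)

# CPUs needed to keep pace with the fleet.
cpus_to_match = SELF_PLAY_TPUS * speedup

print(f"one TPU ~ {speedup} CPUs")
print(f"fleet predictions in four hours: {fleet_predictions:,}")
print(f"CPUs needed to match the fleet: {cpus_to_match:,}")
```

Under those assumptions one TPU does the work of 45 CPUs, and matching the 5,000-chip fleet would take 225,000 of them.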
It is this hardware-driven ability to iteratively learn at speed that unlocks the door to AI’s potential. That is why not just Google, but Intel, NVIDIA, Samsung, Apple, et al. are investing so heavily in AI-specific chips across both servers and personal devices. And that’s where we’ll see the most innovation and competition over the coming year as vendors speed up AI through purpose-built hardware. As WarGames’ Joshua would say, “Shall we play a game?”