AlphaGo under a Magnifying Glass
The deep-learning program AlphaGo was developed by the Google DeepMind team specifically to play the board game Go. DeepMind was founded in London in 2010 by Demis Hassabis, Shane Legg and Mustafa Suleyman. Its mission, which doubles as an advertising slogan, is to solve intelligence and use it to make the world a better place. Backed from the start by prominent tech entrepreneurs and investors with a clear-cut vision of the future, DeepMind was acquired by Google in early 2014 for $400 million: Google's biggest European acquisition ever at the time.
Under a small magnifying glass
To finish a game from a given go position, the fast policy version rapidly chooses a plausible move for both players. This efficient 'rollout policy' is trained in the same manner as the basic policy version but has a deliberately much simpler design, making it about 1,000 times faster.
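The idea of a rollout can be sketched in a few lines. The snippet below is a toy illustration, not AlphaGo's implementation: the helper callbacks (`legal_moves`, `play`, `winner`) and the Nim-like stand-in game are hypothetical, and a uniformly random move choice stands in for the trained fast policy.

```python
import random

def rollout(position, legal_moves, play, winner, rng, max_moves=400):
    """Finish a game quickly from `position` with a cheap move policy.

    `legal_moves`, `play` and `winner` are hypothetical callbacks; a real
    rollout policy would use a small trained model, not random choice.
    """
    for _ in range(max_moves):
        moves = legal_moves(position)
        if not moves:
            break
        position = play(position, rng.choice(moves))
    return winner(position)

# Toy stand-in for Go: take 1 or 2 stones from a pile; whoever takes
# the last stone wins. A position is (stones_left, player_to_move).
def legal(p): return [1, 2] if p[0] > 0 else []
def play(p, m): return (p[0] - m, 1 - p[1])
def winner(p): return 1 - p[1]  # the player who made the last move

# Monte Carlo estimate of player 0's winning chance from 7 stones.
rng = random.Random(0)
wins = sum(rollout((7, 0), legal, play, winner, rng) == 0
           for _ in range(1000))
win_rate = wins / 1000
```

Averaging many such cheap, imperfect playouts is what turns a fast policy into a usable position evaluator: each rollout is noisy, but the mean converges on a meaningful win estimate.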
Until now, stand-alone MCTS has been one of the most widely applied techniques in Go programs, but it often proved too computation-intensive and too inaccurate to beat a Go professional in an even game. Because of these serious drawbacks of MCTS, experts thought it would take at least another decade before a computer program could defeat a professional in a no-handicap game on a 19x19 board.
By cleverly combining the three versions of the 'policy network' (the basic, fast, and improved policy versions) with both the 'value network' (the evaluation version) and real-time Monte Carlo Tree Search rollout computations, AlphaGo comes up with the move that gives the highest estimated probability of winning the game, in an ultra-efficient, fast, and reliable manner.
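The published AlphaGo search combines these ingredients in its move-selection rule: during tree search it picks the move maximising Q(s,a) + u(s,a), where Q blends value-network and rollout estimates and the bonus u is proportional to the policy network's prior P(s,a) divided by 1 + the visit count. The sketch below implements that selection step only; the constant `c_puct` and the example numbers are hypothetical.

```python
import math

def select_move(stats, c_puct=5.0):
    """Pick the move maximising Q(a) + u(a), AlphaGo-style.

    `stats` maps move -> (Q, P, N): mean action value (from the value
    network and rollouts), prior probability (from the policy network),
    and visit count. The bonus u shrinks as a move gets visited more,
    so search gradually shifts from the prior to measured results.
    """
    total_n = sum(n for _, _, n in stats.values())

    def score(item):
        move, (q, p, n) = item
        u = c_puct * p * math.sqrt(total_n) / (1 + n)
        return q + u

    return max(stats.items(), key=score)[0]

# A barely-visited move with a strong prior can outrank a well-visited
# one with a slightly better Q -- that is the exploration bonus at work.
choice = select_move({'A': (0.5, 0.1, 100), 'B': (0.4, 0.6, 10)})
```

With enough visits the bonus fades and the best-measured move wins out, which is why AlphaGo ultimately plays the most-explored, highest-value move rather than blindly following the policy network.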
Under a large magnifying glass
Moreover, every network layer acts as a filter for the presence of specific features or patterns. For such a filter it is irrelevant where exactly in the original image a feature or pattern appears: the filter only detects whether or not the image contains it. The filter is shifted across the image and applied at every position until the entire image has been covered in detail. This sliding makes detection inherently insensitive to where a feature sits, and with suitable training the network can also learn to tolerate deviations in, for example, scale, rotation angle, colour, opacity, or focus of the features in the original image.
The DeepMind team employed a network architecture similar to such convolutional neural nets when designing AlphaGo. Here the board position is represented as an image of 19x19 board points and offered as input, after which the convolutional network layers build a handy, compact, and precise representation of that position. These layers then act as powerful filters that detect any specific local pattern (or feature) of a go position, and can be used, for instance, for detailed classification, analysis, or review of the board position.
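The sliding-filter idea is easy to make concrete. Below is a minimal, dependency-free sketch of a single 2-D convolution over a 19x19 board (the kernel and stone placements are made up for the illustration): the same small filter is applied at every position, so a pattern produces the same strong response wherever it appears on the board.

```python
def convolve(board, kernel):
    """Slide a k x k filter over a square board ('valid' positions only)
    and return the response map; a high response marks places where the
    pattern encoded in `kernel` is present."""
    k = len(kernel)
    n = len(board)
    out = []
    for i in range(n - k + 1):
        row = []
        for j in range(n - k + 1):
            row.append(sum(board[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

# Empty 19x19 board with two 2x2 blocks of stones at different places.
board = [[0] * 19 for _ in range(19)]
for r, c in [(2, 3), (15, 10)]:
    for a in (0, 1):
        for b in (0, 1):
            board[r + a][c + b] = 1

# A filter tuned to the 2x2 block: response 4 wherever the block sits.
response = convolve(board, [[1, 1], [1, 1]])
```

A real network stacks many such filters per layer, with the kernel weights learned rather than hand-written; this sketch only shows the mechanics of the sliding window and its translation invariance.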
As successive layers process the go-position image, the specific features detected by each layer are combined in order to distil an increasingly detailed and distinctive representation of the original go position.
By repeating this process, very specific and detailed features of a go position (such as eye shape or number of liberties) can be used to assign the position to multiple, highly specific categories with great accuracy.
In practice, the remarkable power of a convolutional network is that, by smartly combining different feature (filter) maps, a go position can be categorised and reviewed on the basis of hundreds (if not thousands) of features and patterns simultaneously. In this manner, any go position can be classified very quickly and precisely: first a crude, clever characterisation in terms of, say, influence, a corner fight, or a life-and-death problem, and then the designation of increasingly fine characteristics of the position.
An important advantage of such convolutional networks is their near-complete independence from prior knowledge and hand-designed features when learning to assign tens of thousands of ultra-detailed features to a go position. Such networks require relatively little pre-processing of an input board position: the network is perfectly able to learn these ultra-distinctive feature filters, sometimes millions of them, entirely by itself, autonomously.
Each layer of the network thereby has its own task of detecting particular features of the position, for instance keeping track of the number of liberties (atari), ladders, or a 'pattern book' of all possible 3x3 patterns. By back-propagating the final result of the game to each position, AlphaGo has learned to assign winning potential to every possible move in a position and to select promising moves quickly and efficiently (see above: Under a small magnifying glass).
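Liberty counting, one of the features just mentioned, is simple enough to spell out. The sketch below is an illustrative flood-fill over a small board representation of my own choosing (0 = empty, 1 = black, 2 = white), not AlphaGo's actual feature-plane code.

```python
def liberties(board, row, col):
    """Count the liberties of the stone group containing (row, col).

    A liberty is an empty point adjacent to the group; a group with
    exactly one liberty left is in atari. Uses a flood fill over
    same-coloured neighbouring stones.
    """
    color = board[row][col]
    assert color != 0, "no stone at the given point"
    n = len(board)
    group, libs, stack = {(row, col)}, set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n:
                if board[rr][cc] == 0:
                    libs.add((rr, cc))
                elif board[rr][cc] == color and (rr, cc) not in group:
                    group.add((rr, cc))
                    stack.append((rr, cc))
    return len(libs)

# A lone black stone in the middle of an empty 5x5 board: 4 liberties.
board = [[0] * 5 for _ in range(5)]
board[2][2] = 1
```

Computed for every point, such counts form one input feature plane; AlphaGo's input stacks dozens of planes of this kind alongside the raw stone positions.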
Schematically, this image illustrates the basic ingredients and function of AlphaGo's convolutional neural networks: in a nutshell, the successive layers convolve (filter and encode by transformation) the original 19 x 19 go position, detecting increasingly small and abstract go features.
In the example diagram, the network might classify the go position as: border fight, attack, centre ko, nobi, split move. In practice, the final classification by a convolutional network may involve thousands (or even tens of thousands) of filters and features that may be present in the original data, such as the go position in this example.
Both neural networks take as input 48 feature planes of the 19 x 19 board image (represented in this diagram by a single image) and are configured specifically to optimise the efficiency and accuracy of their task. During the Fan Hui match, AlphaGo's 'policy' network used 13 layers with 192 filters each; the 'value' network used 15 layers (of which layers 2 - 11 are identical to those of the 'policy' network).
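To give a feel for the size of such a network, here is a back-of-the-envelope parameter count for the 13-layer, 192-filter policy network. The kernel sizes (a 5x5 first layer, 3x3 middle layers, a 1x1 output layer) follow the description in DeepMind's Nature paper as I understand it; treat the exact total as an estimate, since details such as the output layer's position-specific biases are simplified away here.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus one bias per filter for a k x k convolutional layer."""
    return k * k * in_ch * out_ch + out_ch

# Policy-network layout: 48 input planes, a 5x5 first layer, eleven 3x3
# layers (all 192 filters), and a 1x1 layer producing one plane of
# per-point move scores. Zero padding keeps every layer at 19x19.
layers = [conv_params(48, 192, 5)]           # layer 1
layers += [conv_params(192, 192, 3)] * 11    # layers 2-12
layers += [conv_params(192, 1, 1)]           # layer 13 (move logits)
total = sum(layers)                          # roughly 3.9 million
```

A few million parameters is modest by today's standards; the expense in AlphaGo lies less in the network size than in evaluating it millions of times inside the tree search.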
In the last layer, the convolutional network classifies the go position through a precise combination of all the specific go features detected. In exactly the same manner, once the deep-learning neural network has been trained thoroughly, new, never-before-seen go positions can be recognised and classified accurately.
Where a human can play at most some 10,000 serious games in a lifetime, AlphaGo plays hundreds of millions of games in a couple of months: training a new version of AlphaGo from scratch takes only about 4 - 6 weeks. Thereafter, the phenomenally well-trained AlphaGo can compute and judge tens of thousands of go positions per second accurately (for the distributed version of AlphaGo running on multiple first-class computers).
[Part 3: Lee Sedol about the Go board in his head]