There is every chance that the billions of us handing behavioural data to giants like Google and Facebook are sleepwalking straight through the single most pivotal period in technological and industrial advancement. We only ever needed to look inside ourselves for the tools, quite literally, by modelling the brain's highly complex and often unpredictable neural networks.
Google are acutely aware of this, which is why they reportedly paid around $400 million for the London-based artificial intelligence startup DeepMind in early 2014. DeepMind might just be Google's most important test laboratory in anticipation of the next, and biggest ever, revolution in technology, and could be the key to unlocking the power of Google's Boston Dynamics robots. Co-founder Demis Hassabis offered a glimpse of what the group had been up to at last year's First Day of Tomorrow conference in Paris, where he showed how unleashing deep learning algorithms on a problem can produce astonishing results.
Not only had the algorithm DeepMind created taught itself to play Atari games with no prior training, it also, astonishingly, went on to figure out that in the game Breakout the most efficient way of racking up points was to dig a tunnel through to the top of the wall so the ball could bounce around behind it, breaking up as many bricks as quickly as possible for maximum reward.
Fast forward nearly a year and DeepMind have published a paper in Nature, "Human-level control through deep reinforcement learning", which explains how they set out to create one of the next generation of deep learning algorithms. The big difference with the so-called deep Q-network (DQN) is that it can be applied to a whole range of tasks rather than a single one: in this case it teaches itself to play forty-nine different Atari games at the level of a professional human games tester, in some cases exceeding human performance by a ridiculous margin.
Whether by design or as a product of Hassabis's background, the DeepMind team have broken the long creep towards full artificial intelligence into manageable pieces by using the virtual world of gaming as a test bed for these industrious algorithms, skipping the hardware failures researchers would otherwise have to endure by embedding and testing them in robots. Algorithms first, hardware second. With the purchase of DeepMind, Google are deconstructing the challenge.
In an interview with Google machine learning expert Blaise Agüera y Arcas at WIRED2014 last year, Hassabis explained, “What we’re interested in at DeepMind is only the self learning type of AI. What we work on is what we call general learning algorithms, so the idea is that they learn for themselves how to master tasks directly from sensory experience or inputs, then there is the second word ‘general’ in there, which is this idea that a single system or set of learning algorithms can potentially master a whole wide range of tasks out of the box, without any pre-programming.”
What makes DQN so much more successful than previous attempts to feed raw data into reinforcement learning is how it addresses a long-standing instability problem: consecutive observations are highly correlated, so small updates to the network can significantly change its behaviour, and with it the distribution of data the algorithm sees, knocking the whole learning trajectory off course.
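In the paper this boils down to a regression problem: the network's estimate Q(s, a) of the value of taking action a in state s is repeatedly nudged towards a one-step target built from the reward it actually received. With D the replay memory described below, γ the discount factor and θ⁻ an older, periodically frozen copy of the network weights used to keep the target stable, the loss being minimised is roughly

    L(θ) = E[ ( r + γ max_a′ Q(s′, a′; θ⁻) − Q(s, a; θ) )² ],

with the expectation taken over transitions (s, a, r, s′) drawn at random from D.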
By using convolutional neural networks, whose overlapping receptive fields are modelled loosely on the visual cortex, DQN takes an iterative approach to learning, repeatedly updating its Q estimates as data flows through the network. These biological parallels are central to DQN's success, and it also uses something called experience replay, which stores past transitions and samples them at random, smoothing out changes in the data distribution while the reward signal continuously shapes the results.
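To make that concrete, here is a minimal sketch in Python of the store-then-sample rhythm of experience replay. It is an illustration only: the tabular Q stand-in, the transition format and every hyperparameter are assumptions made for readability, not DeepMind's actual code, which learns Q with a convolutional network over raw screen frames.

    import random
    from collections import deque, defaultdict

    GAMMA = 0.99       # discount factor (illustrative value)
    ALPHA = 0.1        # learning rate (illustrative value)
    BATCH = 32         # minibatch size
    CAPACITY = 10_000  # how many past transitions the replay memory holds

    replay = deque(maxlen=CAPACITY)  # experience replay memory D
    Q = defaultdict(float)           # toy tabular stand-in for the deep Q-network

    def store(state, action, reward, next_state):
        """Record one transition of actual play in the replay memory."""
        replay.append((state, action, reward, next_state))

    def train_step(actions):
        """Sample a random minibatch of old transitions and nudge Q towards
        the one-step target r + GAMMA * max_a' Q(s', a').

        Sampling at random, rather than learning from frames in the order
        they were played, breaks up the correlations between consecutive
        observations that destabilised earlier attempts."""
        if len(replay) < BATCH:
            return
        for i in random.sample(range(len(replay)), BATCH):
            state, action, reward, next_state = replay[i]
            target = reward + GAMMA * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (target - Q[(state, action)])

In the real system the table is replaced by a convolutional network and the target is computed with a periodically frozen copy of its weights, but the store-then-sample loop is the same idea.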
Whilst Hassabis graduated from Cambridge with a degree in computer science, he later obtained a PhD in cognitive neuroscience from University College London (UCL), which likely injected many of the thought patterns into what DeepMind are doing. One of the ways DQN works is to mimic an attribute of the hippocampus in the mammalian brain, which reactivates recently stored experiences to inform decision making and behaviour, something DeepMind may in future be able to manipulate to highlight and weight more important events when training behaviour.
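Nothing in the Nature paper does this yet, but one crude way to picture such weighting, building on the toy sketch above, would be to score each stored transition by how surprising its outcome was and replay the surprising ones more often. The scoring rule and function names below are purely hypothetical illustrations of the idea:

    def surprise(transition, Q, actions, gamma=0.99):
        """Hypothetical 'importance' score: how far this transition's outcome
        deviated from what Q currently predicts, with a small floor so no
        transition is ignored entirely."""
        state, action, reward, next_state = transition
        target = reward + gamma * max(Q[(next_state, a)] for a in actions)
        return abs(target - Q[(state, action)]) + 1e-3

    def sample_important(replay, Q, actions, batch_size=32):
        """Draw transitions with probability proportional to their surprise
        score instead of uniformly at random."""
        weights = [surprise(t, Q, actions) for t in replay]
        return random.choices(list(replay), weights=weights, k=batch_size)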
On top of this, a significant efficiency gain was made by having the algorithm skip frames to make better use of its computing power. The agent only observes and chooses an action on every fourth frame, for example, repeating its last chosen action on the frames in between, which lets it play roughly four times as many frames, and so learn roughly four times faster, for the same amount of decision-making effort.
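In code, the trick is little more than a wrapper around the game loop. The env.step interface below is an assumed stand-in for whichever emulator is in use, not DeepMind's own code:

    def step_with_skip(env, action, skip=4):
        """Act only on every `skip`-th frame, repeating the chosen action on
        the frames in between and summing the reward collected.

        Assumes env.step(action) returns (frame, reward, done)."""
        total_reward, frame, done = 0.0, None, False
        for _ in range(skip):
            frame, reward, done = env.step(action)  # same action repeated
            total_reward += reward
            if done:
                break
        return frame, total_reward, done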
The paper in Nature probably only provides a snapshot of what DeepMind were doing more than a year ago, and it is conceivable that the algorithm used to master retro games may now be slightly retro itself, at least by DeepMind standards. The question is, where are they now? The Nature release may only have shown us what was happening inside DeepMind a year ago, but since the Google acquisition DeepMind have recruited heavily to advance beyond DQN towards 3D and more complex games and simulators.
If Google are working towards fusing Boston Dynamics' robot-building efforts with DeepMind algorithms that let those robots learn across a wide array of tasks using as few algorithms as possible, then an announcement on how the planned AI Ethics Board will be formed, a condition of DeepMind's acquisition by Google, is just as important to the wider community as the release in Nature. At the very least it might help allay fears about abuse of such technology, important in any era when rogue humans still exist.