Baidu ImageNet Result Part of a Much Larger Picture

Deep Learning has been in our palms for a while now: we all see its results in the responses to the requests we send to the tech companies we rely on for kicks and flicks. Behind the scenes, the exodus of AI experts from academia to the largest commercial research laboratories is well under way.

The momentum generated over recent years means that Joe Public with an interest in Deep Learning usually doesn’t have to wait long before a story emerges about how one of the big Deep Learning shops has broken new ground in bringing order to the chaos of unstructured data.

The Wall Street Journal gave some insight today into how Baidu’s supercomputer Minwa has posted the world’s best results to date for image recognition. Minwa is a custom-built system used for image recognition at Baidu. It uses 72 Intel Xeon E5-2620 processors and 144 Nvidia Tesla K40m GPUs, giving it the ability to process around 0.6 quadrillion floating point operations per second (0.6 petaflops).

This piece of kit has allowed Baidu to outperform Google at recognising images from ImageNet, an image dataset which was put together to provide a wholesale supply of images for tasks such as indexing, retrieving, organising and annotating visual data. The WSJ reports:

With practice, humans correctly identify all but about 5 percent of the ImageNet photos. Microsoft’s software had a 4.94% error rate; Google achieved 4.8%. Baidu said that it had reduced the error rate further to 4.58%.
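To put those figures in perspective, here is a quick back-of-the-envelope sketch in Python. The error rates are the ones quoted by the WSJ; the `relative_reduction` helper and the choice to express the gaps as relative percentages are my own illustration, not something taken from the report.

```python
# Error rates (percent) as quoted in the WSJ excerpt.
# The human figure is only "about 5 percent", so treat it as approximate.
error_rates = {
    "Human (approx.)": 5.0,
    "Microsoft": 4.94,
    "Google": 4.8,
    "Baidu": 4.58,
}


def relative_reduction(old, new):
    """Percentage by which `new` lowers the error rate relative to `old`."""
    return (old - new) / old * 100.0


baidu = error_rates["Baidu"]
for name, rate in error_rates.items():
    if name == "Baidu":
        continue
    gap = rate - baidu  # absolute gap in percentage points
    rel = relative_reduction(rate, baidu)
    print(f"{name:16s} {rate:.2f}% -> Baidu {baidu:.2f}%: "
          f"{gap:.2f} points, {rel:.1f}% relative reduction")
```

The gaps are small in absolute terms: Baidu's lead over Google is 0.22 percentage points, or roughly a 4.6% relative reduction in errors.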

Google’s entry won the 2014 ImageNet competition, but I would like to know more about the distributed hardware they used for it (I had no success with Google, but if anyone can shed light, please tell all).

The playing field may not be level when it comes to the equipment used, but I also don’t believe the secret weapons are necessarily the supercomputers themselves. Back in January, Facebook AI Research open sourced the work it has been doing to improve the performance of convolutional neural networks under Torch, a collaborative scientific computing framework which supports machine learning algorithms. The accompanying paper was submitted to the International Conference on Learning Representations in December 2014, and there have been two further iterations of it since.

On top of using Minwa for performance, Baidu would have had to create the algorithms to achieve these results. In the case of Facebook, they were able to achieve better performance with CNNs for their specific domain by customising an algorithm called the Fast Fourier Transform (FFT). The FFT re-expresses a signal as a set of frequency coefficients, and in that representation the costly convolution at the heart of a CNN layer becomes a simple element-by-element multiplication.
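To make the idea of FFT-based convolution concrete, here is a minimal sketch in plain Python. It is a toy 1D illustration of the general technique, not Facebook's GPU code: the function names (`fft`, `direct_convolve`, `fft_convolve`) and the sample data are my own, and a real CNN layer would apply the same idea to batches of 2D images on a GPU.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    With inverse=True the result is left unscaled (the caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_convolve(signal, kernel):
    """Textbook linear convolution: one multiply per (signal, kernel) pair."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft_convolve(signal, kernel):
    """Same result via the convolution theorem: transform, multiply, invert."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # zero-pad up to the next power of two
        size *= 2
    a = fft(list(signal) + [0.0] * (size - len(signal)))
    b = fft(list(kernel) + [0.0] * (size - len(kernel)))
    product = [p * q for p, q in zip(a, b)]  # pointwise multiplication
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:out_len]]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    kernel = [1.0, 0.0, -1.0]
    direct = direct_convolve(signal, kernel)
    via_fft = fft_convolve(signal, kernel)
    print("direct :", direct)
    print("via FFT:", [round(v, 6) for v in via_fft])
    print("max difference:", max(abs(d - f) for d, f in zip(direct, via_fft)))
```

Both routes give the same answer up to floating point rounding. The payoff is in the cost: the direct method grows with the product of the signal and kernel sizes, while the FFT route grows roughly as n log n, which is why it tends to win for larger kernels.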

The Facebook AI Research implementation of FFT, known as fbfft, achieves a speedup of more than 1.5x over NVIDIA’s cuFFT library, which in turn is said to compute 10x faster than standard FFTs. Facebook AI Research say they will release faster implementations as they become available, which should bring reduced training times with them.

Baidu’s hugely impressive results are exciting, but I always wonder how long we’ll wait before the next performance report shows Baidu outperforming themselves, or being outperformed, on their latest efforts.

As the WSJ noted, Yann LeCun from Facebook AI Research commented that the ImageNet test is starting to become ‘passe as a benchmark.’



About Gary Donovan

Machine Learning and Data Science blogger, hacker, consultant living in Melbourne, Australia. Passionate about the people and communities that drive forward the evolution of technology.