For a startup in a hot technical field, staying in touch with the cutting edge of research is vital. Since I found my way into machine learning via the realisation that the optimisation techniques I’d grown up with in molecular physics were the bread and butter of modern ML, the idea of crashing someone else’s academic conference seemed almost reasonable.
So earlier this year, we had our website translated into Mandarin, booked some cheap flights with horrendous connections, and went along to the International Conference on Machine Learning.
Being in Beijing was an added incentive: Australia has boomed over the last two decades by digging up dirt and shipping it to China, so could we do the same with tech? The subway turned out to be much easier for a newcomer than I’d expected; I only discovered the English signage after spending half an hour committing to memory the characters for my hotel’s stop!
The conference was big: about 700 foreign and 500 local participants. The talks were intense, 20 minutes each, with questions from world experts awaiting each speaker. But people were friendly and surprisingly accepting of a lapsed quantum mechanic trying to wing it in their field. I unexpectedly found myself sitting next to one of my heroes, a postdoc from Cambridge whose papers I’d spent weeks poring over the year before. John Langford, the man behind the under-appreciated Vowpal Wabbit, was almost identical to the person I’d imagined from his writings.
There was an amusing scene one evening watching the World Cup with some European blokes at an outdoor bar next to the venue. An extroverted guy was expounding on deep learning, and I asked pointedly whether he could help me if I didn’t have millions of samples of well-structured data (there had been several presentations on breakthroughs in voice and image analysis using deep nets). We got into a bit of a technical discussion about optimisation techniques, and he conceded that maybe deep learning wasn’t the answer to everything, but asserted it was the future. The next day I discovered I’d picked a fight with one of the authors of Word2Vec, one of the biggest developments of 2013. Beer and jetlag are my only defence.
Deep learning was definitely the dominant theme of ICML 2014. Andrew Ng was mobbed like a rockstar when he first arrived – you don’t realise from his Stanford online courses quite how tall he is – and every deep learning session was standing room only. At the conference dinner, he gave an impassioned pitch as to why deep learning was the future and how Baidu Research was the place to do it.
The big idea that was being discussed was combining deep networks, say one trained on images with one trained on text, to produce a system that could model or learn something resembling the semantic content of images. So it was both shocking and unsurprising to see the Neural Image Caption Generator when it was published six months later.
The other mind-blowing announcement was Skype’s introduction of real-time translation, which is finally being rolled out.
Meanwhile, Gaussian processes are closer to my heart than neural nets, simply because I started out solving forecasting problems on smallish data sets, and the maths is both elegant and fairly easy. A nice application was presented by Jasper Snoek and friends from Harvard, using Gaussian processes for function optimisation, such as maximising the likelihood for a learning algorithm. The story goes that they discovered Netflix was using the results of one of their papers, and so they decided to sell this as a service. It’s nice to see Whetlab launched with full functionality and some impressive results.
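To give a flavour of the idea, here is a minimal sketch of my own of GP-based function optimisation (fit a GP to the evaluations so far, then pick the next point by expected improvement). This is not Whetlab’s actual code or API; the kernel choice, the toy objective, and all the names are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def rbf_kernel(a, b, length=0.2):
    """Squared-exponential covariance between two 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_obs, y_obs, x_test, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at the test points."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Expected-improvement acquisition function (maximisation)."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):
    """Stand-in for an expensive likelihood surface; peak at x = 0.7."""
    return np.exp(-((x - 0.7) ** 2) / 0.02)

rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, 3)   # a few random initial evaluations
y_obs = objective(x_obs)
grid = np.linspace(0.0, 1.0, 200)  # candidate points to propose from

for _ in range(10):  # propose, evaluate, update
    mu, sigma = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.max()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print(x_obs[np.argmax(y_obs)])  # homes in on the peak near 0.7
```

The appeal for hyperparameter tuning is that each "evaluation" might be a full training run, so spending a little linear algebra to choose the next point wisely is a bargain.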
The other cool technical breakthrough I discovered at ICML was “tensor factorisation”. Very roughly, methods like non-negative matrix factorisation (NMF), which have been so successful in topic modelling and other problems, are effectively working with the second moments of the underlying distribution. To make use of the information in the higher moments (and there are many reasons why second moments are insufficient), you end up constructing higher-dimensional analogues of matrices, i.e. tensors. The technical difficulties begin with visualising these objects (cubes of frequency counts, anyone?) and extend through to the discovery that your usual linear algebra tricks do not all apply. For example, the notion of “diagonalising” a tensor is somewhat more complicated than for a matrix. Animashree Anandkumar gave an inspirational workshop presentation on orthogonal decompositions of tensors using a system developed at Microsoft Research called REEF. Deep learning might not be the end of the game after all.
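The jump from matrices to tensors can be made concrete in a few lines. The sketch below is a toy illustration of my own, not Anandkumar’s decomposition algorithm: it builds empirical third-moment tensors with numpy and shows that symmetric data leaves them essentially empty, while skewed data does not; that is exactly the extra information a second-moment method never sees.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100_000, 4

def third_moment(x):
    """Empirical E[x ⊗ x ⊗ x], a d x d x d tensor built from n samples."""
    return np.einsum('ni,nj,nk->ijk', x, x, x) / len(x)

# Symmetric data: every third moment vanishes, so the tensor adds
# nothing beyond the covariance matrix.
sym = rng.normal(size=(n, d))

# Skewed data (centred exponentials): same zero mean, but the
# third-moment tensor is far from zero.
skew = rng.exponential(size=(n, d)) - 1.0

print(np.abs(third_moment(sym)).max())   # near 0
print(np.abs(third_moment(skew)).max())  # ≈ 2 on the diagonal
```

The `(i, i, i)` entries here play the role of a tensor’s “diagonal”, and already for this cube the familiar matrix machinery (eigendecompositions and friends) needs rethinking.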