Netflix invests considerable resources in deep learning to build a service that gives users what they want, ideally before they know they want it. It does so economically, using resources available through AWS rather than an entire data centre full of dedicated machines to train its neural networks.
For these networks, multiple models are usually trained on different datasets across a distributed network. For each configuration, Netflix performs hyperparameter optimisation: each combination of hyperparameters is trained and its performance is measured on an independent dataset, using GPU-based parallelism.
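To make the idea concrete, here is a minimal sketch of hyperparameter optimisation over a grid, where every configuration is scored on a held-out validation set. Everything in it is an assumption for illustration: the toy one-dimensional ridge model, the synthetic data, the function names, and the thread pool standing in for the GPU-based parallelism mentioned above. It is not Netflix's implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def make_data(n, seed):
    """Synthetic data (illustrative only): y = 3x plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


def fit_ridge(xs, ys, lam):
    """Closed-form 1D ridge regression without an intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evaluate(config, train, valid):
    """Train on one configuration and score it on the independent set."""
    lam, n_train = config
    xs, ys = train
    w = fit_ridge(xs[:n_train], ys[:n_train], lam)
    return config, mse(w, *valid)


def grid_search(train, valid, lams, sizes):
    """Score every (lambda, training-subset size) pair in parallel."""
    configs = list(product(lams, sizes))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda c: evaluate(c, train, valid), configs))
    best = min(results, key=lambda r: r[1])
    return best, results


if __name__ == "__main__":
    train = make_data(200, seed=0)
    valid = make_data(100, seed=1)
    best, _ = grid_search(train, valid, [0.0, 0.01, 1.0, 100.0], [50, 200])
    print("best (lambda, n_train):", best[0], "validation MSE:", best[1])
```

Each configuration is independent of the others, which is what makes this kind of search easy to spread across many workers.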
Xavier Amatriain guided Netflix through a considerable programme of research to make sure the most appropriate and efficient techniques were being used to personalise the Netflix product for users. In this Slideshare, he talks about some of the lessons they learned from building ML systems and concludes that choosing the right metric and understanding the dependencies between data and models are vital.
In summary, Amatriain’s ten lessons from building machine learning systems are:
1. More data vs & better models – challenging the notion that more data always gives you better models, since beyond a certain amount of data a model reaches its limit and stops improving.
2. You might not need all of your big data – being economical with big data can sometimes produce better results, provided the sample is carefully selected for quality.
3. The fact that a more complex model does not improve things does not mean you don’t need one – a more complex model may not show improvements in a test where the sample set is too simple.
4. Be thoughtful about your training data – training data can show unforeseen behaviours and contradictions, such as movies that were abandoned but still rated, or watched to the end yet rated poorly.
5. Learn to deal with (the curse of) presentation bias – the data is a result of the films Netflix has decided to show, which is in turn a result of what the model predicts is good. Positioning in the UI can also bias what users choose.
6. The UI is the algorithm’s only communication channel with that which matters most: the users – the UI is the sole channel for generating data for the algorithms, which means that a change to the UI may require a change to the algorithm.
7. Data and models are great. You know what’s even better? The right evaluation approach – measuring differences in metrics across statistically identical populations that each experience a different algorithm.
8. Distributing algorithms is important, but knowing at what level to do it is even more important – distribution can happen across different populations, across combinations of hyperparameters, or across subsets of the training data, and it pays to know which level is appropriate.
9. It pays off to be smart about choosing your hyperparameters – automate hyperparameter optimisation, and choose the right metric to optimise against.
10. There are things you can do offline and there are things you can’t... and there is nearline for everything in between – nearline computations are performed like online ones but do not need to be served in real time, which avoids both the stale data of offline batch jobs and the cost of complex computation in real time.
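The evaluation approach in lesson 7, comparing a metric across statistically identical populations that each experience a different algorithm, can be sketched as a two-proportion z-test. This is a hedged illustration only: the function name, the choice of test, and the retention counts are assumptions made up for the example, not figures or methods from the talk.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare the success rates of two populations (e.g. A/B test cells).

    Returns the z statistic and the two-sided p-value under the null
    hypothesis that both populations share the same underlying rate.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: users retained out of users exposed, per algorithm.
    z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two algorithms is unlikely to be due to chance alone, but the conclusion is only as good as the metric being compared, which is the point of the lesson.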