Spotify’s recommendable catalogue comprises more than 4 million tracks, with each ‘play event’ registered once a song has finished playing. In this talk Sloane explains the Discover page, which is rebuilt daily from data processed through Hadoop, dropped into Cassandra and served to the user (go to 6:30).
Spotify uses collaborative filtering for automated suggestions: very broadly, two users who have liked the same music are likely to share the same tastes (go to 8:00). Sloane describes how recommendations are made using vectors and implicit matrix factorization. For this chunk of the talk you’ll want to wear your math boots.
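To make the vector idea concrete, here is a minimal sketch of the final scoring step only, assuming a factorization has already produced one vector per user and one per track. The function names, the two-dimensional vectors and the track ids are all made up for illustration; this is not Spotify's code, and it skips the factorization itself.

```python
from typing import Dict, List, Set


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend(user_vec: List[float],
              item_vecs: Dict[str, List[float]],
              already_played: Set[str],
              k: int) -> List[str]:
    """Rank unplayed tracks by dot product with the user's vector."""
    scores = {
        track: dot(user_vec, vec)
        for track, vec in item_vecs.items()
        if track not in already_played
    }
    # Highest score first, keep the top k.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: a user who leans entirely on the first latent factor.
user = [1.0, 0.0]
tracks = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
top = recommend(user, tracks, already_played={"a"}, k=2)  # ['c', 'b']
```

A higher dot product means the track points in roughly the same direction as the user's taste vector, which is the sense in which "similar listeners get similar suggestions" falls out of the maths.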
One problem Spotify faces is comparing item vectors against item vectors economically (go to 21:25), so the matrix of vectors is split up into chunks, and this section describes how all the vectors are put into Hadoop's DistributedCache with Luigi. For recommendations they use best-fit local suggestions (locality sensitive hashing), via a library called annoy (GitHub), rather than sifting through the entire catalogue.
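The idea of looking only at nearby candidates rather than the whole catalogue can be sketched with a generic random-hyperplane hashing scheme. This is an illustration of locality sensitive hashing in plain Python under assumed names and parameters, not annoy's actual implementation or API: each vector gets a short bit signature, vectors with the same signature share a bucket, and a query only ranks the tracks in its own bucket.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[Vector]:
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: Vector, planes: List[Vector]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(1 if dot(vec, p) >= 0 else 0 for p in planes)


def build_index(item_vecs: Dict[str, Vector],
                planes: List[Vector]) -> Dict[Tuple[int, ...], List[str]]:
    """Group item ids into buckets keyed by their signature."""
    index = defaultdict(list)
    for item, vec in item_vecs.items():
        index[signature(vec, planes)].append(item)
    return dict(index)


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def query(vec: Vector, index, planes, item_vecs, k: int) -> List[str]:
    """Rank only the items sharing the query's bucket, by cosine similarity."""
    candidates = index.get(signature(vec, planes), [])
    ranked = sorted(candidates, key=lambda i: cosine(vec, item_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The trade-off is that a true neighbour can land in a different bucket and be missed; approximate schemes usually soften this by using several independent hash tables or trees and merging the candidates.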
There are great demonstrations in this talk too for those who have not used Spotify, where Sloane shows the vectors working and making recommendations. From about 40 minutes onward the talk drifts into more of a Q&A and covers some of the limitations, such as unintentional false user input creating crooked recommendations!