Machine Learning at Spotify, Madison Big Data Meetup, Andy Sloane

Andy Sloane from Spotify spoke at the Madison Big Data Meetup on January 27th about how machine learning helps the music service recommend related artists and radio to listeners.

Spotify’s recommendable catalogue comprises more than 4 million tracks, with each ‘play event’ registered once a song has finished playing. In this talk Sloane explains the Discover page, which is rebuilt daily from data processed through Hadoop, dropped into Cassandra and served to the user (go to 6:30).

Spotify uses collaborative filtering for automated suggestions: very broadly, a user is likely to share the tastes of other people who have liked the same music (go to 8:00). Sloane describes how recommendations are made using vectors and implicit matrix factorization. For this chunk of the talk you’ll want to wear your math boots.
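To make the vector idea a little more concrete, here is a minimal sketch in Python. It assumes the factorization has already produced one latent vector per artist and per user; the vectors, the names (artist_a, user_vector and so on) and the choice of cosine similarity for related artists are illustrative assumptions, not details confirmed by the talk.

```python
import math

# Hypothetical 3-dimensional latent vectors. A real model would learn these
# from play counts and use many more dimensions.
artist_vectors = {
    "artist_a": [0.9, 0.1, 0.0],
    "artist_b": [0.8, 0.2, 0.1],
    "artist_c": [0.0, 0.1, 0.9],
}
user_vector = [1.0, 0.3, 0.0]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: the dot product normalised by both vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def related_artists(name, vectors, top_n=2):
    """Rank every other artist by cosine similarity to the named artist."""
    query = vectors[name]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != name]
    return [other for _, other in sorted(scored, reverse=True)[:top_n]]


def recommend(user, vectors, top_n=2):
    """Rank artists for a user by the dot product of user and artist vectors."""
    scored = [(dot(user, vec), name) for name, vec in vectors.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_n]]
```

Calling related_artists("artist_a", artist_vectors) ranks artist_b ahead of artist_c, because the first two vectors point in nearly the same direction. Note that both functions scan every artist, which is exactly the cost that becomes a problem at catalogue scale.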

One problem Spotify faces is comparing item vectors against item vectors economically (go to 21:25), so the vectors are split into chunks of the matrix, and this section describes how all the vectors are put into DistributedCache in Hadoop with Luigi. For recommendations they use best-fit local suggestions (locality sensitive hashing) from a library called annoy (on GitHub), rather than sifting through the entire catalogue.
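The idea of looking only at nearby candidates instead of the whole catalogue can be sketched with random-hyperplane hashing, one common form of locality sensitive hashing. This is a toy illustration in plain Python, not annoy's API or implementation (annoy builds trees of random projections); the function names and parameters below are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes through the origin, one normal vector each."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(a * b for a, b in zip(vec, plane)) >= 0
                 for plane in planes)


def build_index(vectors, planes):
    """Group item names into buckets keyed by their hash signature."""
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[signature(vec, planes)].append(name)
    return buckets


def candidates(query, planes, buckets):
    """Return only the items that share the query's bucket."""
    return buckets.get(signature(query, planes), [])


planes = make_hyperplanes(num_planes=8, dim=3)
index = build_index({"x": [1.0, 2.0, 3.0], "y": [-1.0, -2.0, -3.0]}, planes)
```

Vectors pointing in similar directions tend to land on the same side of most hyperplanes and so share a bucket, which means a query only has to be compared against a small candidate list. The trade-off is that a true neighbour can occasionally fall into a different bucket and be missed.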

There are great demonstrations in this talk too for those who have not used Spotify, where Sloane shows the vectors at work making recommendations. From about 40 minutes onward the talk drifts into more of a Q&A and covers some of the limitations, such as unintentional false user input creating crooked recommendations!




About Gary Donovan

Machine Learning and Data Science blogger, hacker, consultant living in Melbourne, Australia. Passionate about the people and communities that drive forward the evolution of technology.