How Spotify ran the largest Google Dataflow job ever for Wrapped 2019

Frederic Lardinois

18 February 2020 at 12:30 pm

LOS ANGELES, CALIFORNIA - JANUARY 23: (L-R) Billie Eilish and Finneas O'Connell perform onstage during Spotify Hosts "Best New Artist" Party at The Lot Studios on January 23, 2020 in Los Angeles, California. (Photo by Frazer Harrison/Getty Images for Spotify)

In early December, Spotify launched its annual personalized Wrapped playlist with its users' most-streamed sounds of 2019. That has become a bit of a tradition and isn't necessarily anything new, but for 2019, it also gave users a look back at how they used Spotify over the last decade. Because this was quite a large job, Spotify gave us a bit of a look under the covers of how it generated these lists for its ever-growing number of free and paid subscribers.

It's no secret that Spotify is a big Google Cloud Platform user. After all, the music streaming service publicly announced in 2016 that it was moving to Google Cloud, and in 2018 it disclosed that it would spend at least $450 million on Google Cloud infrastructure over the following three years.

It was also back in 2018, for that year's Wrapped, that Spotify ran what was then the largest job ever executed on Google Cloud Dataflow, a service the company had started experimenting with a few years earlier. "Back in 2015, we built and open-sourced a big data processing Scala API for Apache Beam and Google Cloud Dataflow called Scio," Spotify's VP of Engineering Tyson Singer told me. "We chose Dataflow over Dataproc because it scales with less operational overhead and Dataflow fit with our expected needs for streaming processing. Now we have a great open-source toolset designed and optimized for Dataflow, which in addition to being used by most internal teams, is also used outside of Spotify."

For Wrapped 2019, which includes the annual and decadal lists, Spotify ran a job that was five times larger than in 2018 -- but it did so at three-quarters of the cost. Singer attributes this to his team's familiarity with the platform. "With this type of global scale, complexity is a natural consequence. By working closely with Google Cloud’s engineering teams and specialists and drawing learnings from previous years, we were able to run one of the most sophisticated Dataflow jobs ever written."

Still, even with this expertise, the team couldn't just iterate on the full data set as it figured out how to best analyze the data and use it to tell the most interesting stories to its users. "Our jobs to process this would be large and complex; we needed to decouple the complexity and processing in order to not overwhelm Google Cloud Dataflow," Singer said. "This meant that we had to get more creative when it came to going from idea, to data analysis, to producing unique stories per user, and we would have to scale this in time and at or below cost. If we weren’t careful, we risked being wasteful with resources and slowing down downstream teams."

To handle this workload, Spotify not only split its internal teams into three groups (data processing, client-facing and design, and backend systems), but also split the data processing jobs into smaller pieces. That marked a very different approach for the team. "Last year Spotify had one huge job that used a specific feature within Dataflow called 'Shuffle.' The idea here was that having a lot of data, we needed to sort through it, in order to understand who did what. While this is quite powerful, it can be costly if you have large amounts of data."
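To make that cost concrete, here is a toy model of what a shuffle (a group-by-key) has to do. It is a plain-Python sketch, not Dataflow's implementation, and the (user_id, track_id) event shape and the helpers shuffle_by_user and top_tracks are hypothetical. The point it illustrates: every record for a given user must be routed to the same worker before anything can be computed for that user, so the entire dataset moves across the cluster.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical play event: (user_id, track_id). Not Spotify's real schema.
PlayEvent = Tuple[str, str]


def shuffle_by_user(
    events: Iterable[PlayEvent], num_workers: int
) -> List[Dict[str, List[str]]]:
    """Route each event to the worker that owns its user key.

    This mimics the data movement of a group-by-key: all events for one
    user must land on the same worker before they can be aggregated.
    """
    workers: List[Dict[str, List[str]]] = [
        defaultdict(list) for _ in range(num_workers)
    ]
    for user_id, track_id in events:
        # Python's str hash is stable within one process, which is all we need here.
        owner = hash(user_id) % num_workers
        workers[owner][user_id].append(track_id)
    return workers


def top_tracks(
    workers: List[Dict[str, List[str]]], n: int = 1
) -> Dict[str, List[str]]:
    """After the shuffle, each worker aggregates its own users independently."""
    result: Dict[str, List[str]] = {}
    for partition in workers:
        for user_id, tracks in partition.items():
            result[user_id] = [track for track, _ in Counter(tracks).most_common(n)]
    return result


events = [
    ("alice", "song_a"), ("bob", "song_b"), ("alice", "song_a"),
    ("alice", "song_c"), ("bob", "song_b"),
]
print(top_tracks(shuffle_by_user(events, num_workers=4)))
```

The aggregation step is cheap; the expensive part at Wrapped scale is the routing loop, which in a real cluster means serializing and sending every event over the network to its owning worker.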

This year, the company's engineers minimized the use of Shuffle by using Google Cloud's Bigtable as an intermediate storage layer. "Bigtable was used as a remediation tool between Dataflow jobs in order for them to process and store more data in a parallel way, rather than the need to always regroup the data," said Singer. "By breaking down our Dataflow jobs into smaller components -- and reusing core functionality -- we were able to speed up our jobs and make them more resilient."
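The idea of an intermediate keyed store between jobs can be sketched as follows, under clearly stated assumptions: InMemoryKeyedStore is a hypothetical dictionary-backed stand-in, not the Bigtable client API, and the 'user#year' row-key scheme is invented for illustration. One job writes per-user, per-year partial counts under predictable row keys; a later job reads a single user's rows with a prefix scan instead of regrouping the raw events.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical play event: (user_id, year, track_id).
Event = Tuple[str, int, str]


class InMemoryKeyedStore:
    """Dictionary-backed stand-in for a wide-column store such as Bigtable.

    NOT the Bigtable client API: rows are addressed by a string key and each
    row holds a Counter of track plays.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Counter] = {}

    def merge(self, row_key: str, counts: Counter) -> None:
        self._rows.setdefault(row_key, Counter()).update(counts)

    def scan_prefix(self, prefix: str) -> Iterator[Tuple[str, Counter]]:
        # Bigtable keeps rows sorted by key; sorting here imitates that order.
        for key in sorted(self._rows):
            if key.startswith(prefix):
                yield key, self._rows[key]


def collect_job(events: Iterable[Event], store: InMemoryKeyedStore) -> None:
    """Job 1: write partial play counts under row keys of the form 'user#year'."""
    per_row: Dict[str, Counter] = {}
    for user_id, year, track_id in events:
        per_row.setdefault(f"{user_id}#{year}", Counter())[track_id] += 1
    for row_key, counts in per_row.items():
        store.merge(row_key, counts)


def decade_job(store: InMemoryKeyedStore, user_id: str) -> Counter:
    """Job 2: build one user's multi-year totals from that user's rows only,
    without regrouping the raw event stream."""
    total: Counter = Counter()
    for _, counts in store.scan_prefix(f"{user_id}#"):
        total.update(counts)
    return total


store = InMemoryKeyedStore()
collect_job(
    [("alice", 2018, "a"), ("alice", 2019, "a"), ("alice", 2019, "b"), ("bob", 2019, "c")],
    store,
)
print(decade_job(store, "alice"))  # Counter({'a': 2, 'b': 1})
```

Because Bigtable stores rows sorted by row key, a key design like this turns "everything for one user" into a contiguous range read; the '#' delimiter keeps a prefix such as "alice#" from also matching a user named "alice2".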

Singer attributes at least a part of the cost savings to this technique of using Bigtable, but he also noted that the team decomposed the problem into data collection, aggregation and data transformation jobs, which it then split into multiple separate jobs. "This way, we were not only able to process more data in parallel, but be more selective about which jobs to rerun, keeping our costs down."
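A minimal sketch of that decomposition, assuming each stage's output is persisted between runs and the input itself does not change. StagedPipeline and the collect, aggregate and transform functions are hypothetical names, and the "rerun from the first invalidated stage onward" rule is one simple policy chosen for illustration, not necessarily what Spotify used.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

StageFn = Callable[[object], object]


class StagedPipeline:
    """Runs named stages in order, caching each stage's output so that a
    rerun can skip stages whose results are still valid."""

    def __init__(self, stages: List[Tuple[str, StageFn]]) -> None:
        self.stages = stages
        self.outputs: Dict[str, object] = {}
        self.executed: List[str] = []  # log of stages that actually ran

    def run(self, source: object, invalidate: Iterable[str] = ()) -> object:
        stale = set(invalidate)
        data = source
        rerun_rest = False
        for name, fn in self.stages:
            # Once one stage reruns, everything downstream must rerun too.
            rerun_rest = rerun_rest or name in stale or name not in self.outputs
            if rerun_rest:
                data = fn(data)
                self.outputs[name] = data
                self.executed.append(name)
            else:
                data = self.outputs[name]
        return data


def collect(events):
    # Data collection: drop malformed records (hypothetical validity rule).
    return [(user, track) for user, track in events if user and track]


def aggregate(events):
    # Aggregation: per-user play counts.
    counts: Dict[str, Counter] = {}
    for user, track in events:
        counts.setdefault(user, Counter())[track] += 1
    return counts


def transform(counts):
    # Transformation: turn aggregates into a per-user "story" (top track).
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


pipeline = StagedPipeline(
    [("collect", collect), ("aggregate", aggregate), ("transform", transform)]
)
events = [("alice", "a"), ("alice", "a"), ("alice", "b"), ("bob", "c"), ("", "x")]
print(pipeline.run(events))  # {'alice': 'a', 'bob': 'c'}
pipeline.run(events, invalidate={"transform"})
print(pipeline.executed)  # ['collect', 'aggregate', 'transform', 'transform']
```

The second call reuses the cached collection and aggregation outputs and reruns only the transformation step, which is the kind of selective rerun that keeps compute costs down when only the presentation of the data changes.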

Many of the techniques the engineers on Singer's teams developed are currently in use across Spotify. "The great thing about how Wrapped works is that we are able to build out more tools to understand a user, while building a great product for them," he said. "Our specialized techniques and expertise of Scio, Dataflow and big data processing, in general, is widely used to power Spotify’s portfolio of products."
