
Spotify is investigating after hackers allegedly accessed millions of tracks

The Spotify music library scrape claimed by Anna’s Archive has triggered a new round of concerns about piracy and copyright enforcement in streaming. The group says it obtained 256 million rows of track metadata and accessed around 86 million audio files, a cache it describes as a “preservation archive” intended for distribution via torrents. Spotify says it has identified and disabled accounts involved in the unlawful scraping and is investigating what was accessed.

What Spotify has confirmed so far

Spotify says a third party scraped public metadata and used illicit tactics to circumvent DRM and reach some audio files. The company says it has disabled the “nefarious user accounts” tied to the activity, added safeguards against similar “anti-copyright attacks”, and is monitoring for suspicious behaviour.

Spotify has also signalled that the incident does not amount to a full copy of its catalogue. The platform’s library is larger than the figures cited by Anna’s Archive, and Spotify has framed the event as unauthorised access rather than a compromise of user accounts.

What is known about the released data

Reports circulating since Sunday, 21 December 2025, suggest that the first material made publicly available relates to metadata, not full music files. The group’s claim that audio files will follow has not been independently verified, and Spotify has said it does not believe the music itself has been released yet.

The scale matters even if the claims are partially overstated. A dataset described as hundreds of terabytes would be among the largest single collections of mainstream recorded music ever assembled for redistribution, with an obvious potential to be mirrored across multiple networks.


How the Spotify music library scrape was described

Anna’s Archive is primarily known for indexing and linking to pirated books and research papers, but it has presented the Spotify project as an extension of its preservation mission. In the material reported by media outlets, the group says it used Spotify’s popularity metrics to prioritise downloads, arguing this would capture the music that is actually listened to most.

Because Spotify streams DRM-protected audio, the most consequential part of the allegation is not metadata scraping, but the claim that audio files were accessed at scale. Spotify’s statement indicates that DRM circumvention is part of what it is investigating.

Why rights-holders and AI companies are watching

For labels and publishers, the immediate risk is straightforward: a torrent-based release could enable large-scale redistribution of copyrighted recordings outside licensing frameworks.

A second concern is the downstream use of such a collection as training data for AI systems. Even without publishing full tracks, rich metadata (and any audio that leaks) can help replicate catalogue structure, map listening patterns, or support model training in ways that rights-holders may challenge.

How the Stockholm-based platform fits into EU copyright enforcement

Spotify is headquartered in Stockholm, and the alleged scrape lands at a moment when Europe is tightening its approach to digital rights and transparency. EU rules on copyright in the Digital Single Market, alongside newer requirements linked to general-purpose AI compliance, are increasingly shaping how companies document and defend the lawful use of creative works.

For Nordic governments and creative industries, the episode is likely to feed into a familiar policy debate: how to balance broad access to culture with enforceable protections for creators—especially when distribution and archiving technologies make large collections easy to replicate.

What happens next

Spotify’s investigation and any follow-on enforcement will determine whether the claims translate into a material piracy wave, or remain a metadata-heavy incident with limited practical impact. Either way, the episode underlines a structural issue for streaming platforms: the same scale that makes global catalogues useful also makes them attractive targets for automated extraction—and difficult to contain once copied.
