Looking for an online article from 2013? It may have disappeared, new study says

If you’re looking for an online article from 2013, there’s a chance it has disappeared.

New research from the Pew Research Center in the US concludes that 38 per cent of web pages that existed in 2013 are no longer accessible, a phenomenon the study calls “digital decay”.

When researchers extended the time frame, they found that roughly a quarter of all web pages that existed at some point between 2013 and 2023 are no longer accessible. Even among pages that existed in 2023, 8 per cent had already disappeared.

“The internet is an unimaginably vast repository of modern life… but even as users across the world rely on the web to access books, images, news articles and other resources, this content sometimes disappears from view,” the study reads.

The research defined an inaccessible page as one that no longer exists on its host server (for example, one that returns a 404 Not Found message).

Researchers collected a random sample of web pages from Common Crawl, an internet archive that periodically takes snapshots of what the web looks like at a given time.

The team sampled roughly 90,000 web pages per year from 2013 to 2023 and checked whether each one still existed.

On Wikipedia, they found that some 54 per cent of the pages analysed had at least one broken link in their references section.

Around 23 per cent of news web pages contained at least one broken link, as did 21 per cent of government web pages.

The researchers took a closer look at government sites and found that the average page had 50 links on it, often pointing to secure (HTTPS) pages with more information.

City government sites were the most likely of the four levels of government studied to have broken links: 29 per cent of the pages examined had at least one.

For the news industry, they found roughly the same share of pages with broken links on high-traffic and low-traffic sites. As with government sites, most of the links pointed to secure (HTTPS) external websites.

Decay is also happening on social media. In a real-time random sample of 4.8 million posts on X (formerly Twitter), just under one in five were no longer available on the site only a few months after being posted, either because the user’s account had been deleted or because the individual post had been removed.

A post on X was more likely to disappear if it was written in Turkish or Arabic, or if it came from an account with “default settings” such as a generic profile picture or bio.