How long does a web page live: what the research says about link rot

Published: 2026-06-26

Open any article from a decade ago and click through the links in it. There is a good chance that some of them lead nowhere. Instead of the page you wanted, you get a 404 error, a parked domain advertising cheap insurance, or a redirect to someone else's site. This phenomenon is called link rot, and it is far more widespread than people tend to assume.

The web was never designed to be permanent. Hosting bills go unpaid, companies shut down, enthusiasts lose interest in their projects, and servers get decommissioned. Each of these ordinary events erases something that may have been genuinely irreplaceable. To understand how quickly this happens, researchers have spent decades measuring the rate at which links decay. Their findings are worth putting side by side.

40 to 100 days

Brewster Kahle, the founder of the Internet Archive, likes to cite numbers from the early days of the web. By his estimate, the average lifespan of a web page is somewhere between 40 and 100 days. The figure sounds almost unbelievable, but it describes the average page, not large, stable sites. A huge mass of online content is created and disappears within a couple of months, and that is the normal way the network works, not a malfunction.

38% over ten years

In 2024 the Pew Research Center published a study titled "When Online Content Disappears". The conclusion was put plainly: 38% of web pages that existed in 2013 were no longer accessible a decade later. Moreover, about a quarter of all pages that existed at some point between 2013 and 2023 had stopped opening by the time they were checked.

To arrive at these figures, the researchers collected a random sample of nearly a million pages and checked whether they still opened. An important detail: link rot grows over time, but it hits even very fresh pages. Among pages just a year old, around 8% were already inaccessible. The problem does not wait a decade to appear; it begins in the first year of a link's life.

Pew also looked at places where the reliability of links is critical. At least one broken link was found in the references sections of 54% of Wikipedia articles, on 23% of news pages, and on 21% of pages on government sites.

25% of deep links at the New York Times

In 2021 Jonathan Zittrain published an article in The Atlantic with a telling title, "The Internet Is Rotting". His team analyzed about two million external links from New York Times articles. The result: 25% of deep links pointing to specific pages no longer worked. And among the oldest links, from material published in 1998, 72% were dead.

This is a particularly telling example. We are talking about one of the most authoritative and well funded publications in the world, one that links not to random forums but to sources its editors considered worth citing. And even here the link fabric is unraveling before our eyes.

66.5% of links over nine years

The SEO company Ahrefs approached the question from its own angle. Its 2024 study claims that at least 66.5% of the links that pointed to websites over the past nine years are now dead. Counting temporary errors and other issues, the overall share of links "lost" for ranking purposes in their sample reached as high as 74.5%.

The reasons here are a little more varied than a page simply vanishing. A link may have been removed during a content update, swapped for a different one, or taken down due to company policy. Sometimes a competitor simply decides to stop linking to you. But for the user the outcome is the same: the link leads nowhere.

65% over a quarter century

Perhaps the most comprehensive study to date was carried out at Old Dominion University. The work, with the fitting title "Some URLs Are Immortal, Most Are Ephemeral", analyzed 27.3 million addresses from the Wayback Machine index, spanning more than two and a half decades. The conclusion: about 65% of the addresses in the 1996 to 2021 sample were found dead when checked in 2023.

A notable share of these addresses did not even resolve in DNS, meaning the corresponding domains are simply no longer registered. The researchers noted a pattern that recurs in almost every study on this topic: most pages die quickly, within the first few years of their existence. But the few that survive that early period can live for a very long time.

Why the numbers vary so much

An attentive reader will notice that the estimates range from 25% to 75% and may fairly ask why. The reason is that different studies measure different things. One takes a random sample of all pages, another only the external links from a specific publication, a third only links of a certain age. In one study a "dead" page is one that returns an HTTP error; in another, DNS failures are counted too. That makes head-to-head comparison difficult.
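To make the effect of the definition concrete, here is a minimal sketch in Python. The outcome labels and the ten-link sample are invented for illustration and do not come from any of the studies above. The only point is that one and the same set of check results yields different "dead" percentages depending on what is counted as dead.

```python
from collections import Counter

# Hypothetical outcomes of checking ten links (illustrative data, not from any study).
OUTCOMES = [
    "ok", "ok", "ok", "ok", "ok",
    "http_404", "http_404",
    "dns_failure", "dns_failure",
    "timeout",
]


def dead_share(outcomes, dead_labels):
    """Return the fraction of outcomes whose label counts as dead."""
    counts = Counter(outcomes)
    dead = sum(counts[label] for label in dead_labels)
    return dead / len(outcomes)


# Definition A: only explicit HTTP errors count as dead.
strict = dead_share(OUTCOMES, {"http_404"})
# Definition B: DNS failures are counted too.
with_dns = dead_share(OUTCOMES, {"http_404", "dns_failure"})
# Definition C: anything that failed to load counts, including temporary errors.
broad = dead_share(OUTCOMES, {"http_404", "dns_failure", "timeout"})

print(f"{strict:.0%} {with_dns:.0%} {broad:.0%}")  # prints "20% 40% 50%"
```

The same ten links are 20%, 40%, or 50% dead depending on the definition, before any difference in sampling is taken into account.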

But on the main point all these studies agree. The web is fragile, and over time more and more resources die. The spread in the percentages does not change the overall direction; it only shows that the scale of the problem depends on the angle from which you look at it.

What to do about it

This is where web archives come to the forefront. An analysis by the Internet Archive team shows that the Wayback Machine rescues a notable share of the dead web. In the Pew sample roughly one link in four would be considered inaccessible, but if you use the archive to reach dead addresses, the share of permanently lost links drops to about one in ten. By their count, around 38% of the pages from 2013 that Pew found dead (the same 38% cited above) can be recovered through the archive.

Several practical habits follow from this, useful to anyone who works with information online.

Link to stable sources. Official sites, long lived projects, and archives are far more likely to outlast a random blog.

Check your links regularly. There are tools that scan a site and find broken links, and it makes sense to repeat that check over time.
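As a rough illustration of what such a tool does for each link, here is a minimal sketch using only the Python standard library. The function name `check_link`, its return labels, and the injectable `opener` parameter are assumptions made for this example, not part of any particular product. It sends a HEAD request and separates "the server answered with an error" from "nothing answered at all".

```python
import urllib.error
import urllib.request


def check_link(url, timeout=10.0, opener=urllib.request.urlopen):
    """Classify one URL as 'ok', 'http_error' or 'unreachable'.

    `opener` is injectable so the function can be exercised without a network.
    Redirects are followed by urlopen, so a working 301 counts as 'ok'.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return "ok", response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 410.
        return "http_error", exc.code
    except OSError:
        # URLError, timeouts and socket errors are all OSError subclasses:
        # DNS failure, refused connection, or no reply in time.
        return "unreachable", None


def find_broken(urls, **kwargs):
    """Return the subset of `urls` that did not come back as 'ok'."""
    return [url for url in urls if check_link(url, **kwargs)[0] != "ok"]
```

Some servers reject HEAD requests or block automated clients, so a real checker would likely retry with GET and re-test failures later before declaring a link dead.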

Use redirects. If you change a URL on your own site, set up a 301 redirect so you do not leave dead links for everyone who pointed to you.
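In practice a redirect is usually a line or two of web server or CMS configuration. Purely to show the mechanism, here is a minimal sketch with Python's built-in `http.server`. The paths in `MOVED` are hypothetical, and the handler is an illustration rather than something to run in production.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from retired paths to their new locations.
MOVED = {
    "/blog/old-post/": "/blog/new-post/",
    "/docs/v1/intro.html": "/docs/intro/",
}


def redirect_target(path):
    """Return the new location for a retired path, or None if it has not moved."""
    return MOVED.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path includes any query string; this sketch matches it verbatim.
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404, "Not Found")
            return
        # 301 tells browsers and crawlers that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()


def serve(port=8000):
    """Run the sketch locally; nothing calls this automatically."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Keeping the old-to-new mapping in one place makes it easy to review which retired addresses are still being honored.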

Archive what matters in advance. If a page disappears, an archived copy may be the only way left to reach it. The principle the Internet Archive promotes is short: if you see something, save something.
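One way to find out whether a page already has an archived copy is the Wayback Machine availability endpoint at `https://archive.org/wayback/available`. The sketch below only builds the query URL and reads the JSON reply; the helper names are made up for this example, and the network request itself is deliberately left out.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL; `timestamp` (YYYYMMDD) asks for the nearest capture."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return the archived copy's URL from the JSON reply, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

If `closest_snapshot` returns `None` for a page you care about, that is a signal to save a copy while the page is still online.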

A web page lives a short life on average, and there is nothing catastrophic in that by itself. The problem arises when something we assumed was permanent disappears: a link in a research paper, a source in a news story, a document someone relied on in court. Link rot is like a library fire that burns very slowly, one broken address at a time. And precisely because it is slow, it is easy to ignore until it is too late.

The use of article materials is allowed only if the link to the source is posted: https://archivarix.com/en/blog/link-rot/
