What is link rot and how does it threaten the web?

If you’re browsing the web and hit an unexpected 404 or redirect error page, you’ve seen link rot in action. Over time, the links that hold the web together are breaking, threatening our shared cultural history. Here’s why it happens.

What is link rot?

Link rot is what happens when links on a website break over time, leaving broken or dead links behind. By “broken link,” we mean a link that no longer points to the target that was intended when the link was created. When you click one of these broken links, you either get a 404 error or land on the wrong page or website.
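The two symptoms described above, an error code or an unexpected destination, can be checked for automatically. Here is a minimal Python sketch, assuming only the standard library. The function names `classify_status` and `check_link` and the labels they return are hypothetical choices for this illustration, not part of any established tool.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough link-health label."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status in (404, 410):  # Not Found / Gone
        return "dead"
    return "error"


def check_link(url: str, timeout: float = 10.0) -> str:
    """Request a URL and report whether the link looks healthy."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            final_url = response.geturl()
            label = classify_status(response.status)
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions by urllib.
        return classify_status(err.code)
    except OSError:
        # DNS failures, refused connections, and timeouts land here.
        return "unreachable"
    # urllib follows redirects silently, so compare where we ended up.
    if label == "ok" and final_url != url:
        return "redirect"
    return label
```

A check like this cannot catch every case. A site that serves the wrong page with a normal 200 response will still look “ok,” and some servers reject HEAD requests, so treat the result as a hint rather than a verdict.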

Link rot is common. A 2021 Harvard study examined the hyperlinks in more than 550,000 New York Times articles published between 1996 and 2019 and found that 25% of links to specific pages were inaccessible, with the rate of rot rising dramatically with the age of the link. (For example, about 6% of links from 2018 were dead, compared to 72% of links from 1998.) Another study found that out of a pool of 360 links collected in 1995, only 1.6% were still working in 2016.

Why does link rot occur?

The web is a loosely organized, decentralized medium with no central control. As a result, content may become unavailable at any time and without notice. Servers come and go, websites shut down, services move to new hosts, software gets updated, publications move to new content management systems (CMSes) without migrating old content, domains expire, and so on.

There is a related problem on the web called “content drift,” where a link still works but the content at the other end has changed since the link was created. That can cause problems because the author of the link intended to point to different information.
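One way to notice this kind of change is to store a fingerprint of the page text when the link is created and compare it later. The Python sketch below is an illustration under that assumption. The helper names `fingerprint` and `has_drifted` are hypothetical, and the whitespace and case normalization is a simplification chosen for the example.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Hash the page text after collapsing whitespace and case."""
    normalized = " ".join(page_text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_drifted(fingerprint_at_citation: str, current_text: str) -> bool:
    """Return True if the page no longer matches the saved fingerprint."""
    return fingerprint(current_text) != fingerprint_at_citation


# Usage: save the fingerprint when you cite, compare when you revisit.
saved = fingerprint("Link rot is common.")
print(has_drifted(saved, "Link  rot is common.\n"))    # False
print(has_drifted(saved, "This domain is for sale."))  # True
```

An exact hash is strict: real pages often change in small ways, such as dates or ads, that do not affect the cited material, so a practical checker would probably compare only the main text or use a similarity threshold.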

What’s so bad about losing old sites?

It is in the nature of the world that things decay and disappear. Keeping information alive is an active process that takes time, energy, and effort. So the main problem with link rot is not necessarily that we need to store all information forever, but rather that electronic information and references appear to be more fragile and vulnerable than the mostly paper-based references used in the past.

Many authors of news articles, academic papers, and even court decisions use web links as a citation mechanism, pointing to the primary sources or context behind the information they present. Wikipedia faces the same problem. As Jonathan Zittrain explained in a 2021 article on link rot for The Atlantic, “Sourcing is the glue that holds humanity’s knowledge together. This allows you to learn more about what is only briefly stated in an article like this, and for others to double-check the facts as I represent them.”

If links break and sources become unavailable, it becomes difficult for readers to judge whether an author represented the original source honestly and accurately. And even beyond links, some websites provide information online that can’t be found anywhere else. The loss of these pages creates gaps in the collective knowledge of humanity and gaps in the fabric of our common culture.

What is the solution to link rot?

Experts consider link rot and content drift to be endemic to the web as it is currently designed. That means they are part of the fundamental nature of the web and will not go away unless we actively work to fix or mitigate them.

One of the most effective solutions to link rot emerged in 1996 with the Internet Archive, which has archived snapshots of billions of web pages over the past 25 years. If you find a broken link, go to the Internet Archive’s Wayback Machine and paste the link into its search bar. If the page was captured, you will be able to browse the archived copies. If the site was removed only recently, it may also be possible to view the original content from a cached copy stored by Google.
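The same lookup can be scripted. The Python sketch below assumes the Internet Archive’s public availability endpoint at `https://archive.org/wayback/available` and the JSON shape it is understood to return, with an `archived_snapshots` object containing a `closest` entry. Check both against the Archive’s current documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: str = "") -> str:
    """Build the lookup URL; timestamp is an optional YYYYMMDD hint."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the archived URL out of a response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(url: str, timestamp: str = "",
                  timeout: float = 10.0) -> Optional[str]:
    """Ask the archive for the capture closest to the given date."""
    query = build_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling `find_snapshot("example.com", "20060101")` should return the address of the capture nearest to that date if one exists, and `None` if the page was never archived.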

Beyond the Internet Archive, a Harvard-led project called Perma.cc captures permanent copies of web pages to support long-term academic and legal citations. The links are maintained by a consortium of libraries, so they should remain available for some time. The goal is to create links that don’t rot: they should last as long as the Perma.cc archive itself is maintained.

Other potential solutions to link rot are still in their infancy, including possible Web 3.0 approaches and distributed data storage through protocols such as IPFS. Ironically, hundreds of years from now, the only surviving sites from this era may be the ones people printed on paper. Be careful out there!
