38% of Webpages That Existed in 2013 Are No Longer Accessible a Decade Later
A new Pew Research Center analysis shows just how fleeting online content actually is:
1. A quarter of all webpages that existed at one point between 2013 and 2023 are no longer accessible, as of October 2023. In most cases, this is because an individual page was deleted or removed on an otherwise functional website.
2. For older content, this trend is even starker. Some 38% of webpages that existed in 2013 are not available today, compared with 8% of pages that existed in 2023.
This "digital decay" occurs in many different online spaces. We examined the links that appear on government and news websites, as well as in the "References" section of Wikipedia pages as of spring 2023. This analysis found that:
1. 23% of news webpages contain at least one broken link, as do 21% of webpages from government sites. News sites with a high level of site traffic and those with less are about equally likely to contain broken links. Local-level government webpages (those belonging to city governments) are especially likely to have broken links.
2. 54% of Wikipedia pages contain at least one link in their "References" section that points to a page that no longer exists.[...]
I warned you ... (Score:4, Funny)
for more space on the server.
Re: I warned you ... (Score:2)
Re: (Score:3)
SSL-requirements ensures 100% webpage death (Score:1, Insightful)
Guess what. If Google and Mozilla didn't decide they would deprecate HTTP and force everyone to use HTTPS with expiring certificates, then a lot of these pages would still be reachable.
Since SSL certificates expire, people who don't have the time or money to renew the certificates, or put up with Let's Encrypt coderot, just stop renewing the certificate, and thus the site becomes invisible.
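For anyone who wants to see how close a site is to that cliff, here is a minimal sketch using only Python's standard library. The function names `fetch_not_after` and `days_until_expiry` are made up for illustration. Note that with the default verification settings the handshake itself fails on a certificate that has already expired, so this only helps before the deadline.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the server certificate's notAfter string.

    Raises ssl.SSLCertVerificationError if the certificate is already
    expired or otherwise invalid, because the default context verifies it.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before expiry; negative means the certificate has expired."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400.0
```

Something like `days_until_expiry(fetch_not_after("example.org"))` could run from a weekly cron job and send mail when the number drops below 30 or so; the host name and the threshold are placeholders.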
Re: (Score:3)
> If Google and Mozilla didn't decide they would deprecate HTTP and force everyone to use HTTPS
What are you talking about? HTTP pages work just fine with Firefox and Chrome. Here is one example if you want to try: http://www.columbia.edu/~fdc/s... [columbia.edu]
Re: (Score:2)
I think that means that Slashdot's servers won't serve http requests even if users make them.
Link rot and expiration of SSL certificates probably aren't the whole story.
Microservices (Score:2)
Re: (Score:3, Funny)
Re: (Score:2, Interesting)
Re: (Score:3)
Trust the science,
Most of the people sarcastically saying "trust the science" are actually critical of what's published in the popular press, usually with no science content at all (and then they comb through the least reliable sources to mine quotes to critique out of context.)
except for every single day we're learning that something they said was a lie.
Yes, in science, the idea is that when we get more information, we revise our opinions. Another point about science is that we post things like "error bars" (that almost never get reported in the popular press).
So I'm sure that GMO stuff that they fought tooth and nail to prevent labeling of is fine.
How things are labeled is not science.
That fake meat with a gazillion ingredients is supposedly fine too.
"S
Re: (Score:3, Insightful)
To advance we need to know where we came from. If people are offended that we can look back into history and see the many, many, many wrongs we did in the name of religion or how science showed previously "known" ideas were utterly wrong, they are no better than the Catholic Church suppressing science because it contradicted "the word of god". That does nothing except to retard our growth and keep us ignorant.
Re: (Score:2)
Re: Unerased history (Score:2)
Rage much? But why care? (Score:2)
Re: Rage much? But why care? (Score:4, Funny)
That's why I read the comments first, you just saved me 10 minutes.
Re:Rage much? But why care? (Score:5, Insightful)
Of course people take stuff down from the net. Well DUH!
I'm curious if you feel that way when you cite something in your PhD and it gets rejected under peer review because that citation is no longer able to be followed.
I'm curious if you feel that way when you need the manual to that expensive device you bought, but that particular document was moved from the website.
I'm curious if you feel that way when you're in a legal battle and that page you knew said one thing that would come to your defence was taken down.
The internet is a wealth of information. Some of it won't be missed; some of it is absolutely fucking invaluable, to the point of people actually going to the Wayback Machine to get help - i.e. actively going out of their way to look up an old page they knew existed.
Everyone who has ever visited the Internet Archive cares. That would also include many Slashdotters given how many times we use the Internet Archive or run stories about their importance here.
Re: (Score:1)
Re: (Score:2)
If you cite something - put it on your own website.
I feel like you don't understand the fundamentals of how citation or copyright works.
The internet doesn't owe you a thing. Websites change - domains change - it is not immutable.
No one said it does or that it is. We're simply talking about the problem. Are you even paying attention to your own conversation?
and this is only on the English-language web (Score:2)
- huge local ISPs, with millions of customers, shut down or got bought, and the web hosting services they were offering bundled with the subscription were simply wiped.
- hugely popular blogs (remember those?), also with visitor counts in the millions, migrated to the new platforms like FB and their old content was simply aband
Provider closed down their hosting (Score:5, Insightful)
I used to have a home page along with some technical advice on how to use a popular networking package. Then I took a new job that no longer involved that package, and my knowledge and my advice became obsolete in a matter of months. Then my ISP announced they were going to cease hosting private web pages in a couple of months.
A couple of months later my home page was gone and I no longer saw any point in reviving it elsewhere.
My ISP was not the only one to close their free hosting down around then and I'm sure a large proportion of those pages were never migrated elsewhere.
Wikipedia's notability policy makes things worse (Score:3)
Re: (Score:2)
Some encyclopedic event happens. Then the online references go offline, then Wikipedia deletes it as not notable.
Can you cite a single instance where this actually happened? When a web link rots, the Wikipedia citation usually gets changed to the archive.org link.
I will suggest that anything that's so not-notable that there don't exist any references in print, and every single reference on the web is on a webpage that's been deleted, should not have been considered notable in the first place.
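For reference, swapping a rotted link for its archive.org copy can be scripted. The sketch below assumes the Internet Archive's public Wayback availability endpoint and the JSON shape it has documented (`archived_snapshots` containing a `closest` entry); the helper names are hypothetical, and this is not a description of how Wikipedia's own bots do it.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the archived URL out of an availability response, if there is one."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup_archived_copy(dead_url: str, timeout: float = 10.0) -> Optional[str]:
    """Ask the archive for the closest snapshot of dead_url (network call)."""
    query = urllib.parse.urlencode({"url": dead_url})
    with urllib.request.urlopen(f"{AVAILABILITY_ENDPOINT}?{query}", timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A `None` result means no snapshot was reported, which is exactly the case the parent is worried about.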
...Also many other wikis have shut down in the last few years as well, as vandalism and spam take over as the original owners get bored.
Which is precisely the reason Wikipedia has a notability policy, and the notability requisite is sources outside of "I'm nota
But was anything of value lost? (Score:2)
In every other area of history and nature, refinement comes from destructive pressures weeding out the old--forests are reborn from fires, cities often improve safety and planning when rebuilding after natural disasters, hypotheses are refined by critique and opposing data...
While I'm not opposed to internet archives per se, I think it's a mistake to think that we can accurately identify the 'important' content on the internet without a similar winnowing process. In essence, it's the possibility of loss th
It's only gonna get worse (Score:5, Insightful)
The only things that exist on the internet are funded one way or another. There's no funding? The information is irrevocably lost.
Wikipedia is no exception, it exists only because of generous donations. Various home pages/websites? It's all the same, people behind them pay for DNS and hosting. No payments? It's dead, Jim.
Fortunately we have Web Archive but it's not a panacea either as some webpages go overboard with JavaScript and saved web pages are completely dysfunctional, and some are not archived because no one bothers to submit them. And then there are various social networks that make archiving them near impossible because they are dynamically generated using JS and the same applies to their comments. And not only that, some cannot be browsed without logging in first (FB, Twitter, Instagram and Reddit).
It's quite depressing really but encouraging at the same time. Websites that actually care about their content (preservation) and its usability make sure they are readable without JS and fancy stuff.
Re: (Score:2)
Pages also only exist at the generosity of the editors. Pages get deleted all the time for various reasons, and while if you know how you can probably view the last edit, for the most part it's gone.
Direct experience (Score:3)
Re:Direct experience (Score:5, Informative)
You probably don't need to boot your ancient Windows machine. If you have PATA (or SATA) to USB cables, you can simply hook up the disk to that and copy the data. Don't do it under Windows: if the filesystem is NTFS, it may well want to modify the rights to match yours on the Windows machine. I've done it a few times under Linux.
If your filesystem is VFAT then it may be safe to extract it from a Windows machine, but YMMV on that front.
Re: (Score:2)
Re: (Score:2)
I recently wrote some code and put it on GitHub, just because it is more likely to stay safe there than on my hard drive in case I ever need it. I remember that it was really hard to write, because there was no example code anywhere to be found for this particular problem. I thought that I might as well be nice to the next guy who might need code like this, so I put it up.
Data gets lost (Score:2)
It happens.
From burning of ancient libraries to my useless last-year website project.
OEM Technical Websites = Dell, Asus, etc. (Score:4, Informative)
Forget about Wikipedia and published information. Look at the number of websites and links from OEM manufacturers of technical products that disappear in just a matter of years. Unless you download all of the manuals, drivers, BIOS files and any other support files while your product is still fresh from purchase, you will most likely not have access to any of those things in just a few short years.
Additionally, none of the OEMs are going to upgrade or update any of their documentation there. It's almost as if designed obsolescence is in full effect for any type of supporting documentation and downloads.
Just recently, I needed to access some CIM / WS-Man files from Dell's schemas website to get the MOF (Managed Object Format) files, which are kind of similar to SNMP MIB files. But all of the files that were hosted on schemas.dell.com are gone, because Dell deleted the entire schemas website. They are still gone even after I've spoken to their level 3 technical support multiple times, asking them to provide the files for their older equipment and chassis that are still being used in production and that we are trying to get data from.
What's worse is that their wiki pages are gone too, along with their tech article websites and this schemas website.
Dell has really gone downhill lately and has lost so much information, and so many employees have left or been fired or have been outsourced or terminated or laid off, that it's impossible to get information older than about 5 years.
I had the same problem with Asus when looking for some old motherboard information. After they redesigned their website a few years ago, a lot of their legacy stuff just vanished and is no longer available, unless you find it on some third-party download website that might be a little shady.
The internet and the web are definitely not long-term resilient. We're lucky that we have archive.org and the Wayback Machine to allow us to look at some things. But even that is not foolproof, because when I went there looking for the old files, none of them were available: they weren't archived in time, or they weren't allowed to be archived, or there were some other technical problems.
Schemas.dell.com
Asus Downloads
1984 memory hole (Score:2, Interesting)
With constantly changing online content, it will be impossible for anyone to fact-check propaganda coming out of governments and other spaces in the future. Anything that can be "looked up" will be the current facts. Were we always at war with Russia, or were we always at war with China? Resistance to change will only last a couple generations before the new ones are fully indoctrinated to the latest thing? Or is this already happening? Should we be questioning the reasons for past major wars or should
So what? (Score:2)
This has always been the nature of the web. I remember checking out a library book back in college in '03 that had a citation page with multiple internet websites. As I was trying to seek original sources for a research paper, I followed through with the citations, but about a third of them were 404's. And the book was only five years old.
Is it really all that bad? I mean, look at the internet as it is today. Do you really think 100% of it is worth keeping a decade from now?
Now get off my lawn and let
Re: (Score:2)
Even old operating systems are breaking (Score:3)
The upcoming Windows 10 end of support in October 2025 will cause extreme hell for tech support teams, and then just 14 years from now you will have all sorts of shit break from the 2038 bug. AI will have destroyed the internet by then anyway.
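For anyone who hasn't run the numbers on the 2038 bug: it assumes timestamps are kept as a signed 32-bit count of seconds since 1970-01-01 UTC. A quick sketch of the arithmetic (the helper names are made up, and `wrap_int32` only simulates the overflow, since Python integers don't overflow on their own):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def last_32bit_moment() -> datetime:
    """Latest instant representable as signed 32-bit seconds since the epoch."""
    return EPOCH + timedelta(seconds=INT32_MAX)


def wrap_int32(seconds: int) -> int:
    """Simulate two's complement wraparound of a signed 32-bit counter."""
    return (seconds + 2**31) % 2**32 - 2**31


def moment_after_overflow() -> datetime:
    """What a wrapped counter reads one second past the limit."""
    return EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))
```

`last_32bit_moment()` comes out as 2038-01-19 03:14:07 UTC, and one tick later a wrapped counter reads 1901-12-13 20:45:52 UTC. Systems that already use a 64-bit time value are not affected by this particular rollover.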
Re: (Score:2)
ZOMBO still exists. (Score:1)
Why am I not surprised? (Score:3)
Lots of tech-related reasons speculated here so far. While many of those seem valid, to me, it's much simpler than any of those. From here [lendingtree.com]:
23.2% of private sector businesses in the U.S. fail within the first year. After five years, 48.0% have faltered. After 10 years, 65.3% of businesses have closed.
Imagine that those companies had web sites, Facebook, etc. pages. Stuff like that, things that become obsolete quickly, is going to explain a lot of that churn. And that's just companies.
At least one comment mentioned Wikipedia. What does it mean to say that one of its pages is no longer there if the entire editing history is retained? It's simply moved.
Don't trust URIs (Score:2)
URIs are a very convenient idea that is flawed from the start. They're worse than journal references to a journal that you don't have a subscription to. In that case SOMEBODY can look it up.
So what? (Score:2)
You can't possibly convince me that even 10% of the internet is worth preserving.
Most of it should be treated like stuff overheard at a bar near closing time. Questionable at best, and if it was never heard again, nothing of value would be lost.
What would be nice is if content maintainers policed their outward links. Wouldn't it be great if sites did that automatically as a batch job and struck through the dead ones?
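A batch job like that is not much code. Here is a rough sketch, assuming pages are plain HTML with double-quoted absolute `href` attributes; `is_dead` and `strike_dead_links` are hypothetical names, and a regex is a shortcut here, not a real HTML parser.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Matches simple anchors with an absolute http(s) href in double quotes.
ANCHOR = re.compile(
    r'<a\s[^>]*?href="(?P<href>https?://[^"]+)"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def is_dead(url: str, timeout: float = 10.0) -> bool:
    """Treat 404/410 responses and connection failures as dead."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code in (404, 410)
    except OSError:  # DNS failure, refused connection, timeout, TLS error
        return True


def strike_dead_links(html: str, checker: Callable[[str], bool] = is_dead) -> str:
    """Wrap every anchor whose target looks dead in <s>...</s>."""
    def replace(match: "re.Match[str]") -> str:
        anchor = match.group(0)
        return f"<s>{anchor}</s>" if checker(match.group("href")) else anchor

    return ANCHOR.sub(replace, html)
```

Run it over the stored page source from cron and write the result back. It would need a guard against wrapping the same link twice on the next run, and some servers reject HEAD requests, so a GET fallback is probably worth adding.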
Re: (Score:3)
Which 10% depends on who you ask though.
Re: (Score:2)
I'll bet 8% of the last 10% wouldn't be controversial. And if we lose some wheat with the chaff so be it.
It's gone (Score:2)
Suck.com is no longer a thing.
Re: (Score:2)
That one takes me back. I used to be a daily reader in the late 90s.
Pointless (Score:1)
The reason the web is dying is as obvious as a dead body in a swimming pool.
There are roughly six companies that control whether a web site is visible. The default setting is "not visible." If any person tries to get attention for their web site they get banned and their credit cards are all blacklisted.
Even if a person is willing to pay to get attention for their web site, they end up paying for an unlimited quantity of fake clicks. When they complain they get banned and all their credit cards get blacklis
Why the Loss of Firefox Scrapbook was a Tragedy (Score:2)
I solved that... (Score:3)
The last time I relocated halfway across the country (that's relocation #5, and THE LAST), I had seen, in my own life, that a stable email address was IMPORTANT (you don't think so? Then why do you keep your phone number?), and I figured I might move to one county, then maybe another, then maybe another. So I gave in, registered my own domain, then started buying hosting (from a *different* provider - that was back when GoDaddy was being obnoxious if you wanted to change hosting providers from them), and my web site is there, at least until I die.
Which turned out to be a Good Thing, since I'm now a published writer.
My blog is still accessible (Score:2)
My old blog, which I started in 2007, is still accessible.
All those Geocities pages are gone (Score:2)
Good riddance.
guilty (Score:2)
I am guilty of this myself. I had some pages hand-coded in PHP by myself (simple stuff, nothing very hardcore). The server was updated, the new PHP deprecated and then removed functions, and the pages don't work any more. I didn't find the time/motivation to redo them; some stuff was moved to a Dokuwiki, some other stuff was abandoned.
For some other pages I used open source PHP code (particularly a light photo gallery). The project was abandoned, PHP changed, and the code stopped working. I had to abandon those pages too.
But, honestly, much
Re: (Score:2)
Am I Still on Slashdot? (Score:2)
I'm 30 comments in and I haven't seen a reference to Goatse yet. I haven't checked, but I'm guessing the original site is no longer the Highly Important and Relevant Information [archive.org] it once was...