The Internet IT

38% of Webpages That Existed in 2013 Are No Longer Accessible a Decade Later

A new Pew Research Center analysis shows just how fleeting online content actually is:
1. A quarter of all webpages that existed at one point between 2013 and 2023 are no longer accessible, as of October 2023. In most cases, this is because an individual page was deleted or removed on an otherwise functional website.
2. For older content, this trend is even starker. Some 38% of webpages that existed in 2013 are not available today, compared with 8% of pages that existed in 2023.

This "digital decay" occurs in many different online spaces. We examined the links that appear on government and news websites, as well as in the "References" section of Wikipedia pages as of spring 2023. This analysis found that:
1. 23% of news webpages contain at least one broken link, as do 21% of webpages from government sites. News sites with a high level of site traffic and those with less are about equally likely to contain broken links. Local-level government webpages (those belonging to city governments) are especially likely to have broken links.
2. 54% of Wikipedia pages contain at least one link in their "References" section that points to a page that no longer exists.[...]

  • by PPH ( 736903 ) on Monday May 20, 2024 @09:57AM (#64485051)

    ... not to ask the BOFH [bjash.com] for more space on the server.

    • OK, it is impressive that this still exists. My Google Picasa, MySpace, GeoCities and so many other sites have gone AWOL over the years, but BOFH still persists.
    • Guess what. If Google and Mozilla didn't decide they would deprecate HTTP and force everyone to use HTTPS with expiring certificates, then a lot of these pages would still be reachable.

      Since SSL certificates expire, people who don't have the time or money to renew them, or to put up with Let's Encrypt coderot, just stop renewing, and thus the site becomes invisible.

      • by dvice ( 6309704 )

        > If Google and Mozilla didn't decide they would deprecate HTTP and force everyone to use HTTPS

        What are you talking about? HTTP pages work just fine with Firefox and Chrome. Here is one example if you want to try: http://www.columbia.edu/~fdc/s... [columbia.edu]

        • I just told lynx to request the HTTP version of this page. Slashdot's servers issued an illegal redirection, after which lynx commented "Start file could not be found or is not text/html or text/plain".

          I think that means that Slashdot's servers won't serve http requests even if users make them.

          Link rot and expiration of SSL certificates probably aren't the whole story.
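The lynx experiment above can be repeated in a few lines. This is only a sketch: it sends one plain-HTTP GET and reports the status and `Location` header without following the redirect. The `describe` helper and its wording are made up for illustration, and nothing here claims to know what Slashdot's servers actually return; lynx's "illegal redirection" may just mean a missing or malformed `Location` header.

```python
import http.client
from typing import Optional, Tuple


def fetch_without_redirects(host: str, path: str = "/",
                            timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one plain-HTTP GET and return (status, Location header).

    http.client never follows redirects on its own, so the values
    returned are exactly what the server sent first.
    """
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("GET", path, headers={"User-Agent": "linkrot-sketch/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status: int, location: Optional[str]) -> str:
    """Hypothetical helper: put a (status, Location) pair into words."""
    if 300 <= status < 400:
        if location is None:
            return "redirect without a Location header"
        if location.lower().startswith("https://"):
            return "redirects to HTTPS"
        return "redirects elsewhere"
    if 200 <= status < 300:
        return "served over plain HTTP"
    return f"error status {status}"
```

Usage would be something like `describe(*fetch_without_redirects("example.com"))`. A site that answers every plain-HTTP request with a redirect to `https://` is unreachable for any client that cannot complete the TLS handshake, even though the server is still up.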

  • Wait until everyone pushing "serverless" (a misnomer, BTW) finds out that all their third-party microservices can disappear pretty quickly too. Web pages might still be there, but non-functional because all of their elements are gone.
    • Re: (Score:3, Funny)

      by Anonymous Coward
      Don't worry, AI will train itself off the archive.org content and become all knowing and all powerful. Wait until it watches all the Terminator movie series.
  • That's 10 minutes of my life I won't get back. It's like the story and the research are trying to be outraged about something but can't quite figure out who they are mad at, or why. Of course people take stuff down from the net. Well DUH!
    • by Shane A Leslie ( 923938 ) on Monday May 20, 2024 @10:15AM (#64485109) Homepage Journal

      That's why I read the comments first, you just saved me 10 minutes.

    • by thegarbz ( 1787294 ) on Monday May 20, 2024 @11:53AM (#64485461)

      Of course people take stuff down from the net. Well DUH!

      I'm curious if you feel that way when you cite something in your PhD and it gets rejected under peer review because that citation is no longer able to be followed.
      I'm curious if you feel that way when you need the manual to that expensive device you bought, but that particular document was moved from the website.
      I'm curious if you feel that way when you're in a legal battle and that page you knew said one thing that would come to your defence was taken down.

      The internet is a wealth of information. Some of it won't be missed; some of it is absolutely fucking invaluable, to the point of people actually going to the Wayback Machine to get help - i.e. actively going out of their way to look up an old page they knew existed.

      Everyone who has ever visited the Internet Archive cares. That would also include many Slashdotters given how many times we use the Internet Archive or run stories about their importance here.

      • It's not your content - nor your links. If you cite something - put it on your own website. The internet doesn't owe you a thing. Websites change - domains change - it is not immutable.
        • If you cite something - put it on your own website.

          I feel like you don't understand the fundamentals of how citation or copyright works.

          The internet doesn't owe you a thing. Websites change - domains change - it is not immutable.

          No one said it does or that it is. We're simply talking about the problem. Are you even paying attention to your own conversation?

  • It is even more depressing in other languages. I cannot produce a formal study on this, so just anecdotal: all my bookmarks from 20 years ago are gone. A few possible explanations:
    - huge local ISPs, with millions of customers, shut down or got bought, and the web hosting services they were offering bundled with the subscription were simply wiped.
    - hugely popular blogs (remember those?), also with visitor counts in the millions, migrated to the new platforms like FB and their old content was simply aband
  • by Vlad_the_Inhaler ( 32958 ) on Monday May 20, 2024 @10:15AM (#64485107)

    I used to have a home page along with some technical advice on how to use a popular networking package. Then I took a new job that no longer involved that package, and my knowledge and my advice became obsolete in a matter of months. Then my ISP announced they were going to cease hosting private web pages in a couple of months.
    A couple of months later my home page was gone and I no longer saw any point in reviving it elsewhere.
    My ISP was not the only one to close their free hosting down around then and I'm sure a large proportion of those pages were never migrated elsewhere.

  • Some encyclopedic event happens. Then the online references go offline, then Wikipedia deletes it as not notable. With copyright extremists taking down content from the Wayback Machine as well, some things just don't exist anymore. Also many other wikis have shut down in the last few years as well, as vandalism and spam take over as the original owners get bored. Many of the wikis I used to contribute to no longer exist. The 95-year copyright law is to blame for a lot of what goes on on the internet. Thanks to one stupid little mouse, a lot of our history got Library of Alexandria'd. Also, terms of service are being applied retroactively, meaning previously acceptable stuff disappears from the Overton window and its hosts get taken down.
    • by XXongo ( 3986865 )

      Some encyclopedic event happens. Then the online references go offline, then Wikipedia deletes it as not notable.

      Can you cite a single instance where this actually happened? When a web link rots, the Wikipedia citation usually gets changed to the archive.org link.

      I will suggest that anything that's so not-notable that there don't exist any references in print, and every single reference on the web is on a webpage that's been deleted, should not have been considered notable in the first place.
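For anyone who hasn't swapped a rotted citation for its archive.org link by hand, the replacement URL is easy to build. A minimal sketch, assuming the usual Wayback Machine URL shape of a 14-digit UTC timestamp followed by the original URL; the function name `wayback_url` is invented here, and building the URL says nothing about whether a snapshot actually exists.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine URL for a page near the given moment.

    `when` must be timezone-aware; it is converted to UTC and rendered
    as YYYYMMDDhhmmss. The archive redirects to the capture closest to
    that timestamp, if it has one.
    """
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"
```

Passing the date the citation was originally accessed is the natural choice, since that is the version of the page the citation was vouching for.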

      ...Also many other wikis have shut down in the last few years as well, as vandalism and spam take over as the original owners get bored.

      Which is precisely the reason Wikipedia has a notability policy, and the notability requisite is sources outside of "I'm nota

  • In every other area of history and nature, refinement comes from destructive pressures weeding out the old--forests are reborn from fires, cities often improve safety and planning when rebuilding after natural disasters, hypotheses are refined by critique and opposing data...

    While I'm not opposed to internet archives per se, I think it's a mistake to think that we can accurately identify the 'important' content on the internet without a similar winnowing process. In essence, it's the possibility of loss th

  • by Artem S. Tashkinov ( 764309 ) on Monday May 20, 2024 @10:25AM (#64485155) Homepage

    The only things that exist on the internet are funded one way or another. There's no funding? The information is irrevocably lost.

    Wikipedia is no exception, it exists only because of generous donations. Various home pages/websites? It's all the same, people behind them pay for DNS and hosting. No payments? It's dead, Jim.

    Fortunately we have Web Archive but it's not a panacea either as some webpages go overboard with JavaScript and saved web pages are completely dysfunctional, and some are not archived because no one bothers to submit them. And then there are various social networks that make archiving them near impossible because they are dynamically generated using JS and the same applies to their comments. And not only that, some cannot be browsed without logging in first (FB, Twitter, Instagram and Reddit).

    It's quite depressing really but encouraging at the same time. Websites that actually care about their content (preservation) and its usability make sure they are readable without JS and fancy stuff.

    • by tlhIngan ( 30335 )

      Wikipedia is no exception, it exists only because of generous donations

      Pages also only exist at the generosity of the editors. Pages get deleted all the time for various reasons, and while if you know how you can probably view the last edit, for the most part it's gone.

  • by dsgrntlxmply ( 610492 ) on Monday May 20, 2024 @10:33AM (#64485185)
    In 2013 I wrote a paper on an extended hobby project that relates to an arcane topic in industrial history and collector interest, put it up on FTP, and posted a reference in a forum relevant to that topic. My original copy of the paper is on an ancient Windows machine that probably will not even boot now. I am not certain whether my ISP FTP access is still active, and FTP was removed from Chrome and Firefox. I searched and was surprised to find my paper in references from an article at Wikipedia France, where someone had scraped a copy and provided an archive link: saved by some invisible hand.
    • by Vlad_the_Inhaler ( 32958 ) on Monday May 20, 2024 @10:48AM (#64485227)

      You probably don't need to boot your ancient Windows machine, if you have PATA (or SATA) to USB cables then you simply hook up the disc to that and copy the data. Don't do it under Windows, if the filesystem is NTFS then it may well want to modify the rights to match yours in the Windows machine, but I've done it a few times under Linux.
      If your filesystem is VFAT then it may be safe to extract it from a Windows machine, but YMMV on that front.

    • by dvice ( 6309704 )

      I recently wrote some code and put it on GitHub, just because it is more likely to stay safe there than on my hard drive, in case I ever need it. I remember that it was really hard to write, because there was no example code anywhere to be found for this particular problem. I thought that I might as well be nice to the next guy who might ever need code like this, so I put it up.

  • It happens.
    From burning of ancient libraries to my useless last-year website project.

  • by JakFrost ( 139885 ) on Monday May 20, 2024 @10:50AM (#64485237)

    Forget about Wikipedia and published information. Look at the number of websites and links for technical products that disappear from OEMs' sites in just a matter of years. Unless you download all of the manuals, drivers, BIOS files and any other support files while your product is still fresh from purchase, you will most likely not have access to any of those things in just a few short years.

    Additionally, none of the OEMs are going to upgrade or update any of their documentation there. It's almost as if designed obsolescence is in full effect for any type of supporting documentation and downloads.

    Just recently, I needed to access some CIM / WS-Man files from Dell's schemas website to get the MOF (Managed Object Format) files, which are kind of similar to SNMP MIB files. But all of the files that were hosted on schemas.dell.com are gone because Dell deleted the entire schemas website. Everything is gone, even after I've spoken to their level 3 technical support multiple times, asking them to provide the files for their older equipment and chassis that are still being used in production and that we are trying to get data from.

    What's worse is that their wiki pages are gone too, along with their tech article websites and this schemas website.

    Dell has really gone downhill lately and has lost so much information, and so many employees have left or been fired, outsourced, terminated or laid off, that it's impossible to get information older than about 5 years.

    I had the same problem with Asus when looking for some old motherboard information after they redesigned their website a few years ago. A lot of their legacy stuff just vanished and is no longer available, unless you find it on some third-party download website that might be a little shady.

    The internet and the web are definitely not long-term resilient. We're lucky that we have archive.org and the Wayback Machine to allow us to look at some things. But even that is not foolproof, because when I went there looking for the old files, none of them were available: they weren't archived in time, or they weren't allowed to be archived, or there were some other technical problems.

    Schemas.dell.com

    Asus Downloads

  • With constantly changing online content, it will be impossible for anyone to fact-check propaganda coming out of governments and other spaces in the future. Anything that can be "looked up" will be the current facts. Were we always at war with Russia, or were we always at war with China? Resistance to change will only last a couple generations before the new ones are fully indoctrinated to the latest thing? Or is this already happening? Should we be questioning the reasons for past major wars or should

  • This has always been the nature of the web. I remember checking out a library book back in college in '03 that had a citation page with multiple internet websites. As I was trying to seek original sources for a research paper, I followed through with the citations, but about a third of them were 404's. And the book was only five years old.

    Is it really all that bad? I mean, look at the internet as it is today. Do you really think 100% of it is worth keeping a decade from now?

    Now get off my lawn and let

    • by XXongo ( 3986865 )
      Yeah, I'm not surprised that 38% of web pages that existed in 2013 are dead. I'm surprised that only 38% of web pages that existed in 2013 are dead.
  • by xack ( 5304745 ) on Monday May 20, 2024 @11:40AM (#64485419)
    Many old operating systems are unusable online not because of a hardware failure, but because of hardcoded security certificates or protocols that have expired. Millions of Android phones will be forced offline on June 6th 2024. It was originally going to happen in September 2021, but was saved due to a security flaw in how Android processes certificates. And Windows XP has tons of outdated certificates, meaning that accessing HTTPS sites is impossible now. You're basically stuck with FrogFind and old insecure servers that never updated to HTTPS, if they haven't been taken down by a CVE that replaces them with malware.

    The upcoming Windows 10 end of support in October 2025 will cause extreme hell for tech support teams, and then just 14 years from now you will have all sorts of shit break from the 2038 bug. AI will have destroyed the internet by then anyway.
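The 2038 date is not arbitrary: it is where a signed 32-bit count of seconds since 1 January 1970 runs out. A small sketch of the arithmetic, with the wraparound simulated in plain Python (the helper names are made up for illustration, and real systems fail in messier ways than a clean wrap):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def as_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value: int) -> int:
    """Simulate two's-complement overflow of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


last_good = as_utc(INT32_MAX)                      # 2038-01-19 03:14:07 UTC
one_tick_later = as_utc(wrap_int32(INT32_MAX + 1)) # 1901-12-13 20:45:52 UTC
```

One second after the last representable moment, a wrapped counter reads as a date in December 1901, which is why certificate checks, timers and file timestamps on unpatched 32-bit systems are expected to misbehave.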
    • Under Windows, you can just do Win+R on the keyboard to open the run dialog, type in mmc and hit enter, then File->Add / Remove Snap-in, scroll down until you see Certificates and double click it (on Windows XP you'll need to click the Add button first), select Computer Account, click next then finish then OK / Close until you reach the previous screen called Console Root, Expand the new Certificates entry on the tree, then expand Trusted Root Certification Authorities, then right click on Certificates a
  • by Flexagon ( 740643 ) on Monday May 20, 2024 @12:14PM (#64485517)

    Lots of tech-related reasons speculated here so far. While many of those seem valid, to me, it's much simpler than any of those. From here [lendingtree.com]:

    23.2% of private sector businesses in the U.S. fail within the first year. After five years, 48.0% have faltered. After 10 years, 65.3% of businesses have closed.

    Imagine that those companies had web sites, Facebook, etc. pages. Stuff like that, things that become obsolete quickly, is going to explain a lot of that churn. And that's just companies.

    At least one comment mentioned Wikipedia. What does it mean to say that one of its pages is no longer there if the entire editing history is retained? It's simply moved.

  • URIs are a very convenient idea that is flawed from the start. They're worse than journal references to a journal that you don't have a subscription to. In that case SOMEBODY can look it up.

  • You can't possibly convince me that even 10% of the internet is worth preserving.

    Most of it should be treated like stuff overheard at a bar near closing time. Questionable at best, and if it was never heard again, nothing of value would be lost.

    What would be nice is if content maintainers policed their outward links. Wouldn't it be great if sites did that automatically as a batch job and struck through the dead ones?
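The batch job wished for above is not much code. Here is a rough sketch, not anyone's production tool: it finds absolute `http(s)` links with a regular expression (a real job would use a proper HTML parser), asks a checker whether each one is dead, and wraps the dead ones in `<s>` tags. The names `is_dead` and `strike_dead_links` are invented for this example, and treating only 404/410 and connection failures as "dead" is an assumption.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Simplification: only matches double-quoted absolute http(s) hrefs.
LINK_RE = re.compile(
    r'<a\s[^>]*href="(?P<url>https?://[^"]+)"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def is_dead(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request suggests the page is gone."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "linkrot-sketch/0.1"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        # Other error codes (403, 500, ...) may be temporary, so keep the link.
        return err.code in (404, 410)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout.
        return True


def strike_dead_links(html: str, check: Callable[[str], bool] = is_dead) -> str:
    """Wrap every anchor whose target fails `check` in <s>...</s>."""
    def repl(match: "re.Match[str]") -> str:
        anchor = match.group(0)
        return f"<s>{anchor}</s>" if check(match.group("url")) else anchor

    return LINK_RE.sub(repl, html)
```

The checker is passed in as a parameter so the rewriting step can be run against a cached list of known-dead URLs instead of hitting the network on every pass. A gentler variant would require several failed checks over a few days before striking anything, since sites go down temporarily all the time.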

  • Suck.com is no longer a thing.

  • The reason the web is dying is as obvious as a dead body in a swimming pool.

    There are roughly six companies that control whether a web site is visible. The default setting is "not visible." If any person tries to get attention for their web site they get banned and their credit cards are all blacklisted.

    Even if a person is willing to pay to get attention for their web site, they end up paying for an unlimited quantity of fake clicks. When they complain they get banned and all their credit cards get blacklis

  • Firefox made decisions that killed the most important browser extension ever made: Scrapbook. It was an important research tool. No, Pocket is no comparison. If the people who make Firefox really cared, they would include Scrapbook in Firefox. If they cared more about users they would give the users options to: use only memory caches to prevent SSD wear, and also an option to let videos run in the background--even if YouTube doesn't like it. No, hacking the setting is not the same as offering people the opt
  • The last time I relocated halfway across the country (that's relocation #5, and THE LAST), I had already seen, in my own life, that a stable email address was IMPORTANT (you don't think so? Then why do you keep your phone number?). Figuring I might move to one county, then maybe another, then maybe another, I gave in, registered my own domain, then started buying hosting (from a *different* provider - that was back when GoDaddy was being obnoxious if you wanted to change hosting providers from them), and my web site is there, at least until I die.

    Which turned out to be a Good Thing, since I'm now a published writer.

  • My old blog, which I started in 2007, is still accessible.

  • I am guilty of this myself. I had some pages hand-coded in PHP by myself (simple stuff, nothing very hardcore). The server was updated, the new PHP deprecated and then removed functions, and the pages don't work any more. I didn't find the time/motivation to redo them; some stuff was moved to a DokuWiki, some other stuff was abandoned.

    For some other pages I used open source PHP code (in particular a light photo gallery). The project was abandoned, PHP changed, and the code stopped working. I had to abandon those pages too.

    But, honestly, much

    • Bingo! I've had a bunch of content on the web in the last 25 years, some public, some behind user-level access control. I took 99% of it off the web because it costs money and time to maintain and host (or pay someone else to host).
  • I'm 30 comments in and I haven't seen a reference to Goatse yet. I haven't checked, but I'm guessing the original site is no longer the Highly Important and Relevant Information [archive.org] it once was...
