Security The Internet

Internet Archive Services Resume as They Promise Stronger, More Secure Return (msn.com) 16

"The Wayback Machine, Archive-It, scanning, and national library crawls have resumed," announced the Internet Archive Thursday, "as well as email, blog, helpdesk, and social media communications. Our team is working around the clock across time zones to bring other services back online."

Founder Brewster Kahle told The Washington Post it's the first time in its almost 30-year history that it's been down more than a few hours. But their article says the Archive is "fighting back." Kahle and his team see the mission of the Internet Archive as a noble one — to build a "library of everything" and ensure records are kept in an online environment where websites change and disappear by the day. "We're all dreamers," said Chris Freeland, the Internet Archive's director of library services. "We believe in the mission of the Internet Archive, and we believe in the promise of the internet." But the site has, at times, courted controversy. The Internet Archive faces lawsuits from book publishers and music labels brought in 2020 and 2023 for digitizing copyrighted books and music, which the organization has argued should be permissible for noncommercial, archival purposes. Kahle said the hundreds of millions of dollars in penalties from the lawsuits could sink the Internet Archive.

Those lawsuits are ongoing. Now, the Internet Archive has also had to turn its attention to fending off cyberattacks. In May, the Internet Archive was hit with a distributed denial-of-service (DDoS) attack, a fairly common type of internet warfare that involves flooding a target site with fake traffic. The archive experienced intermittent outages as a result. Kahle said it was the first time the site had been targeted in its history... [After another attack October 9th], Kahle and his team have spent the week since racing to identify and fix the vulnerabilities that left the Internet Archive open to attack. The organization has "industry standard" security systems, Kahle said, but he added that, until this year, the group had largely stayed out of the crosshairs of cybercriminals. Kahle said he'd opted not to prioritize additional investments in cybersecurity out of the Internet Archive's limited budget of around $20 million to $30 million a year...

[N]o one has reliably claimed the defacement and data breach that forced the Internet Archive to sequester itself, said [cybersecurity researcher] Scott Helme. He added that the hackers' decision to alert the Internet Archive of their intrusion and send the stolen data to Have I Been Pwned, the monitoring service, could imply they didn't have further intentions with it.... Helme said the episode demonstrates the vulnerability of nonprofit services like the Internet Archive — and of the larger ecosystem of information online that depends on them. "Perhaps they'll find some more funding now that all of these headlines have happened," Helme said. "And people suddenly realize how bad it would be if they were gone."

"Our priority is ensuring the Internet Archive comes online stronger and more secure," the archive said in Thursday's statement. And they noted other recent-past instances of other libraries also being attacked online: As a library community, we are seeing other cyber attacks — for instance the British Library, Seattle Public Library, Toronto Public Library, and now Calgary Public Library. We hope these attacks are not indicative of a trend."

For the latest updates, please check this blog and our official social media accounts: X/Twitter, Bluesky and Mastodon.

Thank you for your patience and ongoing support.

This discussion has been archived. No new comments can be posted.

  • "limited budget of around $20 million to $30 million a year."

    Is 20-30 million dollars "a limited budget" for running a site like IA!?!?!? They probably need significant storage? Or do they - text doesn't take that much space? But they only need to offer "bearable" performance. Is it really that expensive to run a site under such conditions?

    • Re:Limited budget? (Score:5, Interesting)

      by AmiMoJo ( 196126 ) on Saturday October 19, 2024 @11:16AM (#64877445) Homepage Journal

      Last time I checked they had over a petabyte of data to archive.

      I think the real issue they have is that they won't accept outside help or leverage the open source community. As such their whole site is built out of string and duct tape and some early 2000s era ideas. It's extremely slow outside of the US, large uploads often fail, and very basic stuff that could really help them like BitTorrent support is badly broken.

      They really don't help themselves, and for all the talk of being better than ever the Wayback Machine is still dog slow.

      • Re:Limited budget? (Score:5, Interesting)

        by backslashdot ( 95548 ) on Saturday October 19, 2024 @12:24PM (#64877621)

        A petabyte array of hard disks was about $75k the last time I checked (a couple of years ago). A lot of people around the world can afford that. They should let people purchase duplicates of all their data (that they are allowed to distribute). That would improve the chances that not all of the data is lost in a cyber attack or lawsuit. It may also help the world's data survive a limited-scale nuclear war (which seems more and more inevitable).

        • by AmiMoJo ( 196126 )

          They should open source their infrastructure and make it distributed so anyone can contribute mirrors. All the technology exists, but the operators just don't want to accept any help or outside input. I don't know what their problem is, and I've personally offered to help, but they just aren't interested.

          See my post above about the email from a hacker who is inside their Zendesk ticket system right now. The whole thing is a complete shit-show and they don't seem to have the first clue as to how to fix it.

          It migh

      • by tlhIngan ( 30335 )

        A petabyte seems small, given what they hold.

        Maybe the wayback machine is a petabyte in size, but I suspect the main bulk of their archive is actually way more than that. Remember the Internet Archive is literally a central store for a lot of things that no longer exist, including old documentation that's been digitized and uploaded and many more things.

        That's probably why the wayback machine is the only bit running - the main archives are still probably being restored from backups and other things.

        I think

    • Re:Limited budget? (Score:4, Interesting)

      by kmoser ( 1469707 ) on Saturday October 19, 2024 @05:41PM (#64878103)
      They store way more than just text. Images can take a ton of room. The bigger question is why doesn't Google take up the gauntlet? They have deep pockets, not to mention a literal vested interest in historic changes to web pages. I'd bet they're already storing such info, just not making it publicly available.

      It's ironic that Google recently made a change that provides links to the Wayback Machine [archive.org] for users who want to see earlier versions of a page, when Google themselves are much better positioned to serve those archived pages.
      • by dskoll ( 99328 )

        I, for one, would not trust Google.

      • The bigger question is why doesn't Google take up the gauntlet?

        Copyright law. Google would have to defend against a bunch of big dollar copyright claims because they have the kind of deep pockets that could make such a lawsuit profitable.

        • by kmoser ( 1469707 )
          It's well known that Google archives web pages. That's the entire premise of their search engine.
  • by EvilMonkeySlayer ( 826044 ) on Saturday October 19, 2024 @11:12AM (#64877433) Journal

    Unfortunately it's only part of the internet archive and not all of it. So, if you were after some old scans of obscure videogame magazines and more useful information you're out of luck I'm afraid.

    Hopefully the idiots who did this get a new arsehole ripped in them. Their excuse for the DDOS etc was one of the most stupid excuses I have ever heard.

    • Re:Only partially (Score:4, Informative)

      by caseih ( 160668 ) on Saturday October 19, 2024 @02:31PM (#64877839)

      They believe most of the data is still intact. Lots of old software in their archive (Apple II, MS-DOS, etc) that was pretty neat and could run in an emulator in the browser. And lots of old TV shows that folks had archived. Would be a tremendous loss if it's gone. I really hope it's not.

      I agree with your assessment of who did this. About the same stupidity as those who vandalize works of art in museums.

    • by AmiMoJo ( 196126 )

      Even more unfortunately, hackers are still inside their systems and they have not taken even the first steps towards kicking them out.

      I got the following email this morning:

      It's dispiriting to see that even after being made aware of the breach 2 weeks ago, IA has still not done the due diligence of rotating many of the API keys that were exposed in their gitlab secrets.

      As demonstrated by this message, this includes a Zendesk token with perms to access 800K+ support tickets sent to info@archive.org since 2018.

      Whether you were trying to ask a general question, or requesting the removal of your site from the Wayback Machine—your data is now in the hands of some random guy. If not me, it'd be someone else.

      Here's hoping that they'll get their shit together now.

      It's amateur hour over there.

  • How much the availability, safety and variety of torrenting is greatly lacking. Archive, however well meaning, is a prime example of organizations centralizing information for the sake of becoming institutionalized to get the grants, build dependents and also be a silent partner of the copyright police through analytics. Now that my habit has been broken, I've added the site to my blocklist to see how long I can hold out.
    • How the fuck does the Archive "centralize" anything? It's literally a web scraper that accepts third party manual submissions. The stuff they get today is able to be obtained elsewhere by definition. The stuff "only" they have today is that way simply because no-one else bothered to preserve it publicly. It's not like they created the work and decided to hoard it to themselves. They literally offer it to anyone who visits their site from the comfort of butt nakedness on their own personal toilet where eve
  • Thank you Archive!
