Internet Archive Users Start Receiving Email From 'Some Random Guy' Criticizing Unpatched Hole (bleepingcomputer.com)
A post shared Saturday on social media acknowledges those admins and developers at the Internet Archive working "literally round the clock... They have taken no days off this past week. They are taking none this weekend... they are working with all of their energy and considerable talent."
It describes people "working so incredibly hard... putting their all in," with a top priority of "getting the site back secure and safe".
But there are new and continuing problems, reports The Verge's weekend editor: Early this morning, I received an email from "The Internet Archive Team," replying to a message I'd sent on October 9th. Except its author doesn't seem to have been the digital archivists' support team; it was apparently written by the hackers who breached the site earlier this month and who evidently maintain some level of access to its systems.
I'm not alone. Users on the Internet Archive subreddit are reporting getting the replies, as well. Here is the message I received:
It's dispiriting to see that even after being made aware of the breach 2 weeks ago, IA has still not done the due diligence of rotating many of the API keys that were exposed in their gitlab secrets.
As demonstrated by this message, this includes a Zendesk token with perms to access 800K+ support tickets sent to info@archive.org since 2018.
Whether you were trying to ask a general question, or requesting the removal of your site from the Wayback Machine — your data is now in the hands of some random guy. If not me, it'd be someone else.
The site BleepingComputer believes they know the larger context, starting with the fact that they've also "received numerous messages from people who received replies to their old Internet Archive removal requests... The email headers in these emails also pass all DKIM, DMARC, and SPF authentication checks, proving they were sent by an authorized Zendesk server."
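For anyone who wants to look at the headers on one of these replies themselves, here is a minimal sketch of pulling the SPF, DKIM, and DMARC verdicts out of an `Authentication-Results` header with Python's standard `email` module. The function name `auth_results` and the sample message are made up for illustration, and real headers can carry several results per method, which this sketch ignores. Note what a pass does and does not show: it indicates the mail went out through a server authorized for the sending domain, which is the point BleepingComputer makes. It does not show that the support team wrote it.

```python
import re
from email import message_from_string

# Hypothetical message used only for illustration; not a real Internet Archive email.
SAMPLE = (
    "Authentication-Results: mx.example.org;\n"
    " dkim=pass header.d=example.com;\n"
    " spf=pass smtp.mailfrom=example.com;\n"
    " dmarc=pass header.from=example.com\n"
    "From: support@example.com\n"
    "Subject: Re: your request\n"
    "\n"
    "body\n"
)

# Matches "spf=pass", "dkim=fail", "dmarc=none" and similar verdicts.
VERDICT = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)


def auth_results(raw_message: str) -> dict:
    """Return the first verdict seen for each of spf, dkim, and dmarc."""
    msg = message_from_string(raw_message)
    results = {}
    for header in msg.get_all("Authentication-Results", []):
        for method, verdict in VERDICT.findall(header):
            # Keep the first verdict per method; later duplicates are ignored.
            results.setdefault(method.lower(), verdict.lower())
    return results
```

Calling `auth_results(SAMPLE)` on the made-up message above should report a pass for all three methods.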
BleepingComputer also writes that they'd "repeatedly tried to warn the Internet Archive that their source code was stolen through a GitLab authentication token that was exposed online for almost two years."
And that "the threat actor behind the actual data breach, who contacted BleepingComputer through an intermediary to claim credit for the attack," has been frustrated by misreporting. (Specifically, they insist there were two separate attacks last week: a DDoS attack and a separate data breach for a 6.4-gigabyte database which includes email addresses for the site's 33 million users.) The threat actor told BleepingComputer that the initial breach of Internet Archive started with them finding an exposed GitLab configuration file on one of the organization's development servers, services-hls.dev.archive.org. BleepingComputer was able to confirm that this token has been exposed since at least December 2022, with it rotating multiple times since then. The threat actor says this GitLab configuration file contained an authentication token allowing them to download the Internet Archive source code. The hacker says that this source code contained additional credentials and authentication tokens, including the credentials to Internet Archive's database management system. This allowed the threat actor to download the organization's user database and further source code, and to modify the site.
The threat actor claimed to have stolen 7TB of data from the Internet Archive but would not share any samples as proof. However, now we know that the stolen data also included the API access tokens for Internet Archive's Zendesk support system. BleepingComputer attempted to contact the Internet Archive numerous times, as recently as on Friday, offering to share what we knew about how the breach occurred and why it was done, but we never received a response.
"The Internet Archive was not breached for political or monetary reasons," they conclude, "but simply because the threat actor could...
"While no one has publicly claimed this breach, BleepingComputer was told it was done while the threat actor was in a group chat with others, with many receiving some of the stolen data. This database is now likely being traded amongst other people in the data breach community, and we will likely see it leaked for free in the future on hacking forums like Breached."
Good thing we have Anna's Archive and Libgen (Score:5, Interesting)
Much as people dump on Anna's Archive and Libgen for being "illegal", they provide a secondary source for a huge volume of valuable data which is, obviously, not entirely safe with a single originator.
What a jackass (Score:5, Insightful)
Or LinkedIn. Except it's archive.org so there isn't even any anti-corporate angle, it's just being a dumbass and shitting in the punch bowl.
Re: (Score:3)
I'm glad he did it. I got one of those emails from Internet Archive's ticket system. They hadn't even rotated their keys, which is pretty much the first step in any breach. That kind of incompetence is worrying, for all sorts of reasons.
I love what IA are doing, but they really need to get some people with a clue to help them out, and ideally open source their infrastructure so that people can help them improve it. Like Wikipedia and many other orgs do.
Random idea (Score:2)
Maybe these public repo hosting services should automatically scan every checkin for strings that look like auth tokens or other secrets, and ask the users if they really want to continue before performing a commit.
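A rough sketch of what such a check could look like, assuming a handful of regular expressions is good enough for a first pass. The pattern set and the `scan_text` helper are hypothetical and purely illustrative; real scanners ship far larger rule sets and usually add entropy checks. The `glpat-` and `AKIA` prefixes are the documented formats for GitLab personal access tokens and AWS access key IDs.

```python
import re

# Illustrative patterns only; this is nowhere near a complete rule set.
PATTERNS = {
    "gitlab_pat": re.compile(r"\bglpat-[0-9A-Za-z_\-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Catches things like token = "..." or API_KEY: '...' with a long quoted value.
    "generic_assignment": re.compile(
        r"(?i)(?:api[_-]?key|token|secret|password)\w*\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Wired into a pre-commit hook, something like this would be fed the output of `git diff --cached` and would ask for confirmation, or refuse the commit, whenever `scan_text` returns anything.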
A DDOS attack (Score:4, Insightful)
is not a data breach. Just sayin'.
Nigerian prince or extended car warranty? (Score:2)
Re: (Score:3)
Who wouldn't want an extended car warranty backed by royalty?
I'm not blaming the victim (Score:4, Insightful)
... and running your own git server doesn't magically solve every possible security problem - you can still get breached, and you can still do dumb things like leave a private key in the repo. But... this drives home to me, again, that I prefer running our own on-premises git server for our work repositories.
It's not hard to run a git server. And, if you really want a gui for some reason, you can still have one.
Why was this a reminder to you about self hosting? (Score:3)
They published a file they shouldn't have, right? Then that file was used to download all their code, which contained yet more credentials.
Running their own git server would have done nothing to fix this (that I know of... would love to learn though).
Re: (Score:2)
If their own git server wasn't exposed to the public internet, the attacker wouldn't have been able to do the second part (download all their code), unless the inadvertently published file contained VPN credentials or something equally stupid.
Re: (Score:2)
I put this in at my workplace. Not for security, but to manage the absolute hash they'd made of source code control.
However when considering options - for work and for my own source code control - an external git server or third party service doesn't even begin to figure.
It's literally as simple as "offer a Windows share" and maybe "install TortoiseGit" on Windows, and even easier on Linux. I don't understand why people need Github, Gitlab, etc.
Are there build tools to scan for credentials? (Score:2)
I've never focused on security improvements, but a tool that scans for new credentials being hard-coded into a code base sounds useful. Then, as with most test failures, it should be hard to raise the permitted number of findings. And a goal of zero saved credentials sounds smart.
Or would this just move those credentials somewhere else? Forcing them to hard code something in a different way that doesn't really remove the issue.
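The "permitted number" idea can be sketched as a ratchet in a build step: the build fails if the scanner finds more hard-coded credentials than the recorded baseline, and nudges the baseline down whenever the count drops. This assumes some scanner already produces a count of findings; `check_ratchet` and its messages are hypothetical names for illustration, not part of any existing tool.

```python
def check_ratchet(found: int, permitted: int) -> tuple:
    """Compare the current finding count against the permitted baseline.

    Returns (ok, message). The baseline should only ever move toward zero.
    """
    if found > permitted:
        return False, f"{found} credentials found, only {permitted} permitted"
    if found < permitted:
        # Progress was made: ask for the baseline to be lowered so it sticks.
        return True, f"baseline can be lowered from {permitted} to {found}"
    return True, "no change"
```

Whether this removes the problem or just relocates the credentials depends on where they go next; the ratchet only tracks what is left in the code base.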
Re: (Score:1)
There are, in fact. The two tools to look into are:
* Trufflehog: https://github.com/trufflesecu... [github.com]
* BFG Repo Cleaner: https://rtyley.github.io/bfg-r... [github.io]
Trufflehog handles the scanning for credentials. BFG handles cleaning up the repo once a discovery is made (for instance, the history cleanup required when you find out a credential or key was checked in two years ago).
Do something (Score:2)
In the OSS world, it's easy to find all these people who think that they are special because they know something or have good ideas.
Well, unfortunately, those are easy to come by. The only thing that truly makes a difference is to offer work on fixing the problems or on adding new stuff. Ideas are cheap, code is not.
Instead of being an a$$hole about it, the person should offer help in fixing the problem. That would show skill and that they do something worthwhile instead of just shouting at clouds.