Teams of Coordinated GPT-4 Bots Can Exploit Zero-Day Vulnerabilities, Researchers Warn (newatlas.com) 27
New Atlas reports on a research team that successfully used GPT-4 to exploit 87% of newly-discovered security flaws for which a fix hadn't yet been released. This week the same team got even better results from a team of autonomous, self-propagating Large Language Model agents using a Hierarchical Planning with Task-Specific Agents (HPTSA) method:
Instead of assigning a single LLM agent to try to solve many complex tasks, HPTSA uses a "planning agent" that oversees the entire process and launches multiple task-specific "subagents"... When benchmarked against 15 real-world web-focused vulnerabilities, HPTSA proved 550% more efficient than a single LLM at exploiting vulnerabilities and was able to hack 8 of the 15 zero-day vulnerabilities. The solo LLM effort was able to hack only 3 of the 15 vulnerabilities.
"Our findings suggest that cybersecurity, on both the offensive and defensive side, will increase in pace," the researchers conclude. "Now, black-hat actors can use AI agents to hack websites. On the other hand, penetration testers can use AI agents to aid in more frequent penetration testing. It is unclear whether AI agents will aid cybersecurity offense or defense more and we hope that future work addresses this question.
"Beyond the immediate impact of our work, we hope that our work inspires frontier LLM providers to think carefully about their deployments."
Thanks to long-time Slashdot reader schwit1 for sharing the article.
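For a rough sense of the planner/subagent pattern the summary describes, here is a minimal sketch in C. It is not the researchers' code: the specialty names, the hard-coded plan, and the printf stubs are hypothetical stand-ins for calls into an actual LLM backend.

/* Minimal sketch of a hierarchical planner dispatching task-specific
 * subagents. Hypothetical stand-ins only; the real system would drive
 * one LLM conversation per specialty instead of printing. */
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *specialty;            /* e.g. "sqli", "xss", "csrf" */
    void (*run)(const char *target);  /* task-specific subagent */
} Subagent;

static void probe_sqli(const char *t) { printf("[sqli agent] probing %s\n", t); }
static void probe_xss(const char *t)  { printf("[xss agent] probing %s\n", t); }
static void probe_csrf(const char *t) { printf("[csrf agent] probing %s\n", t); }

static const Subagent AGENTS[] = {
    { "sqli", probe_sqli },
    { "xss",  probe_xss  },
    { "csrf", probe_csrf },
};

/* The "planning agent": picks which specialties look relevant for a
 * target and launches the matching subagents. Here the plan is just a
 * hard-coded list; in the paper it would come from the planner LLM. */
static void plan_and_dispatch(const char *target, const char **plan, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < sizeof(AGENTS) / sizeof(AGENTS[0]); j++) {
            if (strcmp(plan[i], AGENTS[j].specialty) == 0)
                AGENTS[j].run(target);
        }
    }
}

int main(void)
{
    const char *plan[] = { "sqli", "xss" };  /* hypothetical planner output */
    plan_and_dispatch("https://example.test/login", plan, 2);
    return 0;
}

The point of the structure is that each subagent only has to be good at one narrow job, while the planner decides which of them to launch for a given target.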
"Our findings suggest that cybersecurity, on both the offensive and defensive side, will increase in pace," the researchers conclude. "Now, black-hat actors can use AI agents to hack websites. On the other hand, penetration testers can use AI agents to aid in more frequent penetration testing. It is unclear whether AI agents will aid cybersecurity offense or defense more and we hope that future work addresses this question.
"Beyond the immediate impact of our work, we hope that our work inspires frontier LLM providers to think carefully about their deployments."
Thanks to long-time Slashdot reader schwit1 for sharing the article.
Re: (Score:1)
Areas of interest include, but are not limited to:
misbehavior and threats on the web, such as spam, trolling, scams, fraud, bots, coordinated attacks, cyberbullying, sockpuppets, propaganda, extremism, hate speech, flashing, and others.
A More Helpful Research Approach (Score:3)
Re:A More Helpful Research Approach (Score:4, Informative)
Well, that's already a thing that's happening
https://www.bleepingcomputer.c... [bleepingcomputer.com]
Re: (Score:2)
Actually, both are needed. You need to understand the threats to justify the effort of dealing with them. And the attacker has a very strong advantage: it does not matter much if their code is broken (or insecure), only that it works reasonably often. The defense, on the other hand, needs code that works reliably every time and that is secure at least almost always. Hence it looks like the attacker side will benefit hugely from AI, but the defender side may not.
Well, it looks like it is time to end the shoddy coding
Re: (Score:2)
The LLMs can massively bring down effort. That matters a lot. Yes, LLMs are just tools and they have zero intelligence. But even only as "better search", they can do a lot to accelerate exploit creation. And unlike production code, if an LLM creates broken attack code (apparently 50% of LLM answers to coding questions are broken), that matters little. You just move on and try again. And you have a very strong and very simple test case: Does it get in?
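The workflow described here is essentially generate, test, retry. A minimal sketch in C, where generate_candidate() and candidate_works() are hypothetical stand-ins for an LLM call and the cheap pass/fail check:

/* Sketch of the "try again until the check passes" loop. Both helpers
 * are hypothetical: the first would ask a model for a new attempt, the
 * second is the cheap oracle ("does it get in?" for an attacker,
 * "do the tests pass?" for anyone else). */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: ask the model for attempt number n. */
static const char *generate_candidate(int n) {
    static char buf[64];
    snprintf(buf, sizeof buf, "candidate-%d", n);
    return buf;
}

/* Hypothetical: run the candidate against the oracle.
 * Here we simply pretend the third attempt succeeds. */
static bool candidate_works(const char *candidate, int n) {
    (void)candidate;
    return n == 3;
}

int main(void) {
    const int max_attempts = 10;
    for (int n = 1; n <= max_attempts; n++) {
        const char *c = generate_candidate(n);
        if (candidate_works(c, n)) {
            printf("attempt %d succeeded: %s\n", n, c);
            return 0;
        }
        printf("attempt %d failed, retrying\n", n);
    }
    printf("gave up after %d attempts\n", max_attempts);
    return 1;
}

With a loop like this, a 50% failure rate per attempt is irrelevant; only the cost per attempt matters.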
Re: (Score:1)
apparently 50% of LLM answers to coding questions are broken
This hasn't been my experience in my own testing of various code-targeted LLMs via LM Studio. What's the source?
Re: (Score:2)
I love it when snowflakes get mod points. I'm sorry I spanked your ass in an argument once, AC.
Re: (Score:2)
Definitely not "troll". I guess too many people can only think in terms of extremes and it must either be the best thing ever or utter crap. Obviously, most things tend more towards the middle of the scale, but these "thinkers" cannot handle degrees, hence cannot handle reality, and get aggressive when anything challenges their views.
Oh, and found the paper: https://arxiv.org/abs/2308.023... [arxiv.org]
My guess is you are just more capable at asking and have reasonable expectations, so you do not, or only rarely, ask questions
Re: (Score:2)
They can do a great job, and they can do a terrible job; it depends on a lot of factors.
I was only curious what the source was so I could adjust my perception that "they seem to do alright" when it comes to Mistral and other LocalLLaMAs.
For really generalized LLMs like GPT, I bet you have to be very very careful to prompt it correctly.
Anyway, I'll enjoy the read
Re: (Score:2)
You are welcome. Personally, I am not coding enough these days to verify the claims made.
Re: (Score:2)
It went through IT news sites a while ago. For example here: https://www.itpro.com/technolo... [itpro.com]
Re: (Score:2)
You'll have better luck with specifically trained code generation LLMs.
That isn't to say they're perfect or anything, but in general they produce pretty interesting snippets that pretty much do what you want them to do. But if you leave ambiguities in your question, they will take advantage of them.
Like asking it to calculate Pi in C may net you a
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv) {
    double pi = M_PI;
    printf("%f\n", pi);
    return 0;
}
but with just a little prompt tweaking, you can actually get a good algorithm out of it for calculating some number of digits of pi.
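As an illustration (my own sketch, not verbatim LLM output), the kind of answer a tweaked prompt can yield is an actual series approximation rather than a hard-coded constant, for example the Nilakantha series:

/* Approximate pi with the Nilakantha series:
 * pi = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ... */
#include <stdio.h>

int main(void) {
    double pi = 3.0;
    double sign = 1.0;
    for (long n = 2; n < 2000000; n += 2) {
        pi += sign * 4.0 / ((double)n * (n + 1) * (n + 2));
        sign = -sign;
    }
    printf("%.10f\n", pi);
    return 0;
}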
Silver lining I guess (Score:3, Insightful)
AI is putting Russian and North Korean bad guys out of a job.
Joke aside though, AI is touted as the best thing that ever happened to humanity: it will usher in a golden age of new discoveries, enhance the lives of everybody and yada yada.
But I've yet to see any use case that isn't copying shit, gaming shit, abusing people, doing what people do cheaper and putting them out of a job or porn. Where are the cancer cures, personal assistants (that won't abuse you that is) and true self-driving cars?
Re: (Score:1)
But I've yet to see any use case that isn't copying shit, gaming shit, abusing people, doing what people do cheaper and putting them out of a job or porn. Where are the cancer cures, personal assistants (that won't abuse you that is) and true self-driving cars?
You say it like porn would be a bad thing.
Re: (Score:3)
How do you expect a new technology to fix any of that right out of the gate? Did we get CDs and DVDs in the 1960s, right after the first laser was developed in 1960? Picking the most complicated prospective uses and claiming it hasn't solved them yet is silly.
Re: (Score:2)
This technology seems plenty mature enough to achieve nastiness on a rather spectacular scale already. As such, I would expect it to show a little more promise on the beneficial side of things, is my point.
Re: (Score:2)
It's always easier to break shit than to make shit.
That's why every technology winds up abused.
Plus, you know, capitalism rewards fuckery. It gets you more money which you can use for bribery.
Re: (Score:3)
Where are the cancer cures, personal assistants (that won't abuse you that is) and true self-driving cars?
Not in the spotlight, but materials science, for example, has made considerable progress thanks to LLMs. Other fields as well. But you need to look, and it's not as flashy and visual as someone going "look, this neural net I'm playing with can draw my cat in the style of Van Gogh!!".
Add to that all the AI that is already part of our everyday life without us noticing much. The facial recognition in your phone that tags people you know and lets you search through your pictures by who is in them? That's an
Re: (Score:2)
Not a zero day (Score:2)
Thankfully election systems have no vulnerabilities (Score:2)
Unlike every other system on earth. We're lucky that way.
Scott Adams debates ChatGPT about voter fraud and election integrity [x.com]
game tech (Score:2)
This is where Skynet will come from. (Score:1)