Researchers Create AI Worms That Can Spread From One System to Another (arstechnica.com) 46
Long-time Slashdot reader Greymane shared this article from Wired:
[I]n a demonstration of the risks of connected, autonomous AI ecosystems, a group of researchers has created one of what they claim are the first generative AI worms — which can spread from one system to another, potentially stealing data or deploying malware in the process. "It basically means that now you have the ability to conduct or to perform a new kind of cyberattack that hasn't been seen before," says Ben Nassi, a Cornell Tech researcher behind the research. Nassi, along with fellow researchers Stav Cohen and Ron Bitton, created the worm, dubbed Morris II, as a nod to the original Morris computer worm that caused chaos across the Internet in 1988. In a research paper and website shared exclusively with WIRED, the researchers show how the AI worm can attack a generative AI email assistant to steal data from emails and send spam messages — breaking some security protections in ChatGPT and Gemini in the process...in test environments [and not against a publicly available email assistant]...
To create the generative AI worm, the researchers turned to a so-called "adversarial self-replicating prompt." This is a prompt that triggers the generative AI model to output, in its response, another prompt, the researchers say. In short, the AI system is told to produce a set of further instructions in its replies... To show how the worm can work, the researchers created an email system that could send and receive messages using generative AI, plugging into ChatGPT, Gemini, and open source LLM, LLaVA. They then found two ways to exploit the system — by using a text-based self-replicating prompt and by embedding a self-replicating prompt within an image file.
In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which "poisons" the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it "jailbreaks the GenAI service" and ultimately steals data from the emails, Nassi says. "The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client," Nassi says. In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. "By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent," Nassi says.
In a video demonstrating the research, the email system can be seen forwarding a message multiple times. The researchers also say they could extract data from emails. "It can be names, it can be telephone numbers, credit card numbers, SSN, anything that is considered confidential," Nassi says.
The researchers reported their findings to Google and OpenAI, according to the article, with OpenAI confirming "They appear to have found a way to exploit prompt-injection type vulnerabilities by relying on user input that hasn't been checked or filtered." OpenAI says they're now working to make their systems "more resilient."
Google declined to comment on the research.
To create the generative AI worm, the researchers turned to a so-called "adversarial self-replicating prompt." This is a prompt that triggers the generative AI model to output, in its response, another prompt, the researchers say. In short, the AI system is told to produce a set of further instructions in its replies... To show how the worm can work, the researchers created an email system that could send and receive messages using generative AI, plugging into ChatGPT, Gemini, and open source LLM, LLaVA. They then found two ways to exploit the system — by using a text-based self-replicating prompt and by embedding a self-replicating prompt within an image file.
In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which "poisons" the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it "jailbreaks the GenAI service" and ultimately steals data from the emails, Nassi says. "The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client," Nassi says. In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. "By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent," Nassi says.
In a video demonstrating the research, the email system can be seen forwarding a message multiple times. The researchers also say they could extract data from emails. "It can be names, it can be telephone numbers, credit card numbers, SSN, anything that is considered confidential," Nassi says.
The researchers reported their findings to Google and OpenAI, according to the article, with OpenAI confirming "They appear to have found a way to exploit prompt-injection type vulnerabilities by relying on user input that hasn't been checked or filtered." OpenAI says they're now working to make their systems "more resilient."
Google declined to comment on the research.
So wait. (Score:3)
Wouldn't one AI system be able to catch the other one trying to infect it with a virus or worm?
Re: (Score:3)
The current crop of AI systems can't reliably catch attempts to make child porn images. Why would you think they would be any better at blocking the prompt injection attacks described in the summary?
Re: (Score:1)
Re:So wait. (Score:5, Insightful)
Wouldn't one AI system be able to catch the other one trying to infect it with a virus or worm?
We'll find out, won't we?
My guess is no, the first few times this is tried, it'll succeed like gangbusters. Then code will be written to prevent this, and so on and so on. Just like the virus/malware arms race.
Anyway, my take-away is that shit is gonna get fucked up in ways we didn't foresee.
Re: (Score:3)
Re: (Score:2)
Doesn't work so well with STDs. Just saying.
Re: (Score:3)
Why would it be able to do that? Doing so would require understanding of what malware is and why you do not want to catch it. "AI" does not have understanding.
Re: So wait. (Score:2)
I wish them luck (Score:2)
Protecting against attack in a system humans don't fully understand is by definition impossible.
Re: (Score:3)
Actually, no. Unless you mean _reliably_ protecting against it. But then, we cannot reliably protect systems against very conventional malware either.
The Web is Awesome...! (Score:1)
Cool research (Score:4, Insightful)
That did not take long. Shows how utterly foolish using current AI to automate things would be.
Re: (Score:3)
Shows how utterly foolish using current AI to automate things would be.
Perhaps we should ask ChatGPT how to make automation processes better. It can reach out to other LLMs and get their answers and incorporate them into its own.
Oh wait.
Re: Cool research (Score:2)
Re: (Score:2)
Well, generally no at this time, but there are always a few morons and greedy assholes that cannot wait. Hence you have a point.
Re: Cool research (Score:3)
It's just a prompt injection attack against the dumbest possible method of automation. Their contrived email assistant appends incoming mail to a prompt and straight executes it. On each subsequent email their example retrieves the conversation history and simply appends to the prompt... duh... prompt injection. Combined with the application acting on output from a generative AI with no controls in place. Why? The self replicating part is pure clickbait.
You can't possibly automate anything with generative A
Re: (Score:2)
This is how proper security research works: First you do and publish a simple attack that shows the idea has merit. Then more researchers become interested and attacks get refined. This can go to extreme levels of sophistication and has done so in the past. (For a somewhat current example, look at "return oriented programming", where you inject code without actually injecting code.) Unless the countermeasures fall flat (always a possibility), we will see that here as well. If the countermeasures fall flat t
Welcome to the future (Score:2)
Re: (Score:2)
I think you are understating things rather strongly.
First year compsci project (Score:3)
One of the requirements our professional gave us was to accept user generated input without breaking or doing something stupid.
All input had to be filtered for safety no matter how psychotic the input. Anyone's program that did stupid shit or crashed with malicious or unexpected input was automatically dropped a full letter grade. No excuses.
Apparently the OpenAI and Google devs didn't learn this lesson.
Re: (Score:2)
Professor *
Re: First year compsci project (Score:3)
Re: (Score:2)
> This is a prompt that triggers the generative AI model to output, in its response, another prompt, the researchers say. In short, the AI system is told to produce a set of further instructions in its replies
The malicious prompts are meta queries directed to the AI itself instructing it to behave in an alternate way from its base programming.
Disallow the entire class of meta queries. Do not allow alternative prompts and other self references.
Only allow queries about data in the LKM's trained data set.
Re: First year compsci project (Score:2)
Re: (Score:2)
So it can interpret questions for meaning but not interpret questions for meaning. Ok.
Tell us how it does work. I assume you're a senior dev on some LLM project or an AI based PhD.
Re: First year compsci project (Score:3)
Re: (Score:2)
So, you really don't know how it works. That's ok.
Re: First year compsci project (Score:2)
Re: (Score:2)
So back to where we were originally. Meta queries need to be filtered.
We've already seen many instances where they are able to do things like eliminate white people from photos or provide positive views of one political view and negative of others and so on. They can filter out any targeted class of questions for elimination or reinterpretation.
But they haven't because they have no real world experience dealing with real users. This is all theoretical to their engineering leaders. I'll bet none has suc
Re: First year compsci project (Score:2)
Re: (Score:2)
Several of them are quite intentional. I don't need to assume. But intentional or not is irrelevant. The fact is anyone can experience that reality themselves by just querying the public facing LLMs.
They absolute can and do filter queries by category. This is well understood public knowledge. It is black box to you, not to the man behind the curtain. These systems are not magical.
I'm done here. You started something on a topic you clearly don't understand at either the technology level nor the publi
Re: (Score:2)
But that filtering can never be enough; it can never be a perfect solution. As we've seen with current LLMs, because the model is a black box of weights it's impossible to perfectly fix the problem, and people jail-break the LLMs all the time.
Re: (Score:2)
Life has no guarantees, nothing is perfect. But they don't even try.
They could do a -lot- better than zero effort.
For comparison: we have firewalls, anti-virus software, we train staff not to open suspicious emails, we have password complexity requirements, we do all sorts of shit yet still get hacked every day. Because those measures fail vs determined attackers should we not bother at all? That seems to be what you're saying about better AI filtering.
Re: (Score:2)
We can't even stop the LLMs from constantly LYING ("hallucinating"). We can shove some (very imperfect) guardrails on it, but don't expect that to suddenly make them controllable/safe or something. They are what they are: both useful, and not to be trusted. The news sites that publish bad AI-authored articles without running them by the very critical eye of a knowledgeable editor have given us some recent abject lessons on that.
Re: (Score:2)
Not sure why you are saying that, he explained it quite factually. Everything he said is correct.
The LLMs are not applying "logic", they are token-predicting. It *looks* like they are applying logic, but they aren't.
I'm not a Phd, but I did program a working toy version of an LLM using byte-pair tokens, and I've worked with AI for over 30 years, including working with Neural Networks (both with libraries and dev systems, and from scratch implementations in C++). He was correct in saying "that's not how this
Re: (Score:2)
Because he literally hand waved it away as "it's a neural network and it does stuff".
I continued to explain my position several more times in detail with examples later in that thread.
Re: First year compsci project (Score:2)
They could've called it "P-1" (Score:2)
Thomas Ryan should get more credit than Robert Morris.
Re: (Score:2)
Or from just before "The Adolescence of P-1", there was John Brunner's "The Shockwave Rider".
Both of them nailed the idea of migratory programs roaming a network and adding data to themselves.
Ryan's P-1 got the AI part as right as he could have in 1977.
Brunner's "tapeworms" were just really clever coding - the smarts were in the protagonist programmer, which I guess is why the ultimate tapeworm didn't have a name.
embedding a self-replicating prompt within an imag (Score:2)
>embedding a self-replicating prompt within an image file
Really? I haven't heard of this possibility since Microsoft's .WMV format, Windows Meta-Virus. How is it that pictures are executable?
Re: embedding a self-replicating prompt within an (Score:2)
Perfect timing (Score:2)
Bias Analysis and sensitive leak mining (Score:2)
Re: (Score:2)
Forget AI recursive scripts. Revealing 'Social/Woke' filtering and accurately eroding it at the edges is possible.
Easy peasy, they just have to filter out words and phrases like Negger, Nogger, Nugger, Sam Bow...