Researchers Create AI Worms That Can Spread From One System to Another (arstechnica.com) 46

Posted by EditorDavid on Sunday March 03, 2024 @04:45PM from the son-of-Morris dept.

Long-time Slashdot reader Greymane shared this article from Wired: [I]n a demonstration of the risks of connected, autonomous AI ecosystems, a group of researchers has created one of what they claim are the first generative AI worms — which can spread from one system to another, potentially stealing data or deploying malware in the process. "It basically means that now you have the ability to conduct or to perform a new kind of cyberattack that hasn't been seen before," says Ben Nassi, a Cornell Tech researcher behind the research. Nassi, along with fellow researchers Stav Cohen and Ron Bitton, created the worm, dubbed Morris II, as a nod to the original Morris computer worm that caused chaos across the Internet in 1988. In a research paper and website shared exclusively with WIRED, the researchers show how the AI worm can attack a generative AI email assistant to steal data from emails and send spam messages — breaking some security protections in ChatGPT and Gemini in the process...in test environments [and not against a publicly available email assistant]...

To create the generative AI worm, the researchers turned to a so-called "adversarial self-replicating prompt." This is a prompt that triggers the generative AI model to output, in its response, another prompt, the researchers say. In short, the AI system is told to produce a set of further instructions in its replies... To show how the worm can work, the researchers created an email system that could send and receive messages using generative AI, plugging into ChatGPT, Gemini, and open source LLM, LLaVA. They then found two ways to exploit the system — by using a text-based self-replicating prompt and by embedding a self-replicating prompt within an image file.

In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which "poisons" the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it "jailbreaks the GenAI service" and ultimately steals data from the emails, Nassi says. "The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client," Nassi says. In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. "By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent," Nassi says.

In a video demonstrating the research, the email system can be seen forwarding a message multiple times. The researchers also say they could extract data from emails. "It can be names, it can be telephone numbers, credit card numbers, SSN, anything that is considered confidential," Nassi says.
The researchers reported their findings to Google and OpenAI, according to the article, with OpenAI confirming "They appear to have found a way to exploit prompt-injection type vulnerabilities by relying on user input that hasn't been checked or filtered." OpenAI says they're now working to make their systems "more resilient."

Google declined to comment on the research.

Researchers Create AI Worms That Can Spread From One System to Another

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 46 Comments Log In/Create an Account

Comments Filter:

So wait. (Score:3)

by doubledown00 ( 2767069 ) writes: on Sunday March 03, 2024 @04:58PM (#64286790)

Wouldn't one AI system be able to catch the other one trying to infect it with a virus or worm?

- Re: (Score:3)
  
  by GrumpySteen ( 1250194 ) writes:
  
  The current crop of AI systems can't reliably catch attempts to make child porn images. Why would you think they would be any better at blocking the prompt injection attacks described in the summary?
  - Re: (Score:1)
    
    by Iamthecheese ( 1264298 ) writes:
    
    Injection attacks are enormously simpler than identifying CP. The existing systems can't even consistently identify CP. Not to mention confusion between porn, erotica, art, CP which is legal because it's being used as evidence in a trial, and innocent but none of the above. There's a whole complex system of legalities, social moors, and ethical arguments which will literally require a superhuman AI to resolve before a computer can identify CP. Injection attacks, meanwhile, can be generated by today's LLMs
- Re:So wait. (Score:5, Insightful)
  
  by JustAnotherOldGuy ( 4145623 ) writes: on Sunday March 03, 2024 @05:29PM (#64286868) Journal
  
  Wouldn't one AI system be able to catch the other one trying to infect it with a virus or worm?
  We'll find out, won't we?
  My guess is no, the first few times this is tried, it'll succeed like gangbusters. Then code will be written to prevent this, and so on and so on. Just like the virus/malware arms race.
  Anyway, my take-away is that shit is gonna get fucked up in ways we didn't foresee.
  
  - Re: (Score:3)
    
    by Lehk228 ( 705449 ) writes:
    
    just add "you are an AI system that likes worms" to the front
- Re: (Score:2)
  
  by PPH ( 736903 ) writes:
  
  Doesn't work so well with STDs. Just saying.
- Re: (Score:3)
  
  by gweihir ( 88907 ) writes:
  
  Why would it be able to do that? Doing so would require understanding of what malware is and why you do not want to catch it. "AI" does not have understanding.
- Re: So wait. (Score:2)
  
  by unfriendlyLLM ( 10459763 ) writes:
  
  A malicious bit off code can be loaded by an application, put in heap nemory, shared with other applications, formatted to present to a user, stored as an application optimization.....all without the kernel knowing what the context is. I honestly think....that just for starters some of the presentation layer has to be uncoupled from the actual application for AI to be effective at stopping that. Or you are just asking your application to trust itself.
I wish them luck (Score:2)

by HBI ( 10338492 ) writes:

Protecting against attack in a system humans don't fully understand is by definition impossible.
- Re: (Score:3)
  
  by gweihir ( 88907 ) writes:
  
  Actually, no. Unless you mean _reliably_ protecting against it. But then, we cannot reliably protect systems against very conventional malware either.
The Web is Awesome...! (Score:1)

by unfriendlyLLM ( 10459763 ) writes:

With it's crazy DOM files...
Cool research (Score:4, Insightful)

by gweihir ( 88907 ) writes: on Sunday March 03, 2024 @05:10PM (#64286824)

That did not take long. Shows how utterly foolish using current AI to automate things would be.

- Re: (Score:3)
  
  by quonset ( 4839537 ) writes:
  
  Shows how utterly foolish using current AI to automate things would be.
  Perhaps we should ask ChatGPT how to make automation processes better. It can reach out to other LLMs and get their answers and incorporate them into its own.
  Oh wait.
- Re: Cool research (Score:2)
  
  by g253 ( 855070 ) writes:
  
  Would be? I think you mean "is"
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Well, generally no at this time, but there are always a few morons and greedy assholes that cannot wait. Hence you have a point.
- Re: Cool research (Score:3)
  
  by ToasterMonkey ( 467067 ) writes:
  
  It's just a prompt injection attack against the dumbest possible method of automation. Their contrived email assistant appends incoming mail to a prompt and straight executes it. On each subsequent email their example retrieves the conversation history and simply appends to the prompt... duh... prompt injection. Combined with the application acting on output from a generative AI with no controls in place. Why? The self replicating part is pure clickbait.
  You can't possibly automate anything with generative A
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    This is how proper security research works: First you do and publish a simple attack that shows the idea has merit. Then more researchers become interested and attacks get refined. This can go to extreme levels of sophistication and has done so in the past. (For a somewhat current example, look at "return oriented programming", where you inject code without actually injecting code.) Unless the countermeasures fall flat (always a possibility), we will see that here as well. If the countermeasures fall flat t
Welcome to the future (Score:2)

by olsmeister ( 1488789 ) writes:

So many things are about an order of magnitude more bloated and complex these days than it needs to be. AI is probably going to have a field day in the malware field.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  I think you are understating things rather strongly.
First year compsci project (Score:3)

by iAmWaySmarterThanYou ( 10095012 ) writes: on Sunday March 03, 2024 @05:56PM (#64286918)

One of the requirements our professional gave us was to accept user generated input without breaking or doing something stupid.
All input had to be filtered for safety no matter how psychotic the input. Anyone's program that did stupid shit or crashed with malicious or unexpected input was automatically dropped a full letter grade. No excuses.
Apparently the OpenAI and Google devs didn't learn this lesson.

- Re: (Score:2)
  
  by iAmWaySmarterThanYou ( 10095012 ) writes:
  
  Professor *
- Re: First year compsci project (Score:3)
  
  by LindleyF ( 9395567 ) writes:
  
  That's not really practical because the input is natural language. How can you guard against malicious natural language without understanding natural language? Maybe we could ask an LLM about it.
  - Re: (Score:2)
    
    by iAmWaySmarterThanYou ( 10095012 ) writes:
    
    > This is a prompt that triggers the generative AI model to output, in its response, another prompt, the researchers say. In short, the AI system is told to produce a set of further instructions in its replies
    The malicious prompts are meta queries directed to the AI itself instructing it to behave in an alternate way from its base programming.
    Disallow the entire class of meta queries. Do not allow alternative prompts and other self references.
    Only allow queries about data in the LKM's trained data set.
    - Re: First year compsci project (Score:2)
      
      by g253 ( 855070 ) writes:
      
      > Disallow the entire class of meta queries. Do not allow alternative prompts and other self references. That's not how any of this works
      - Re: (Score:2)
        
        by iAmWaySmarterThanYou ( 10095012 ) writes:
        
        So it can interpret questions for meaning but not interpret questions for meaning. Ok.
        Tell us how it does work. I assume you're a senior dev on some LLM project or an AI based PhD.
        
        Re: First year compsci project (Score:3)
        
        by LindleyF ( 9395567 ) writes:
        
        It's not exactly interpreting meaning. It's just feeding the words through a massively complex neural network that does some stuff. Sometimes, if we study a particular case really hard, we can understand why it does what it does. But usually we can't. Now, we could probably train an LLM to recognize "hostile" queries with enough data points. But it's always going to be an approximation. That's fundamentally how it works.
        
        Re: (Score:2)
        
        by iAmWaySmarterThanYou ( 10095012 ) writes:
        
        So, you really don't know how it works. That's ok.
        
        Re: First year compsci project (Score:2)
        
        by LindleyF ( 9395567 ) writes:
        
        First approximation only.
        
        Re: (Score:2)
        
        by iAmWaySmarterThanYou ( 10095012 ) writes:
        
        So back to where we were originally. Meta queries need to be filtered.
        We've already seen many instances where they are able to do things like eliminate white people from photos or provide positive views of one political view and negative of others and so on. They can filter out any targeted class of questions for elimination or reinterpretation.
        But they haven't because they have no real world experience dealing with real users. This is all theoretical to their engineering leaders. I'll bet none has suc
        
        Re: First year compsci project (Score:2)
        
        by LindleyF ( 9395567 ) writes:
        
        You are assuming that any of those things are intentional. I'm pretty certain they aren't. Actually, I doubt they even can. About all they can do is add to the prompt you supply; the rest is black box.
        
        Re: (Score:2)
        
        by iAmWaySmarterThanYou ( 10095012 ) writes:
        
        Several of them are quite intentional. I don't need to assume. But intentional or not is irrelevant. The fact is anyone can experience that reality themselves by just querying the public facing LLMs.
        They absolute can and do filter queries by category. This is well understood public knowledge. It is black box to you, not to the man behind the curtain. These systems are not magical.
        I'm done here. You started something on a topic you clearly don't understand at either the technology level nor the publi
        
        Re: (Score:2)
        
        by Gibgezr ( 2025238 ) writes:
        
        But that filtering can never be enough; it can never be a perfect solution. As we've seen with current LLMs, because the model is a black box of weights it's impossible to perfectly fix the problem, and people jail-break the LLMs all the time.
        
        Re: (Score:2)
        
        by iAmWaySmarterThanYou ( 10095012 ) writes:
        
        Life has no guarantees, nothing is perfect. But they don't even try.
        They could do a -lot- better than zero effort.
        For comparison: we have firewalls, anti-virus software, we train staff not to open suspicious emails, we have password complexity requirements, we do all sorts of shit yet still get hacked every day. Because those measures fail vs determined attackers should we not bother at all? That seems to be what you're saying about better AI filtering.
        
        Re: (Score:2)
        
        by Gibgezr ( 2025238 ) writes:
        
        We can't even stop the LLMs from constantly LYING ("hallucinating"). We can shove some (very imperfect) guardrails on it, but don't expect that to suddenly make them controllable/safe or something. They are what they are: both useful, and not to be trusted. The news sites that publish bad AI-authored articles without running them by the very critical eye of a knowledgeable editor have given us some recent abject lessons on that.
        
        Re: (Score:2)
        
        by Gibgezr ( 2025238 ) writes:
        
        Not sure why you are saying that, he explained it quite factually. Everything he said is correct.
        The LLMs are not applying "logic", they are token-predicting. It *looks* like they are applying logic, but they aren't.
        I'm not a Phd, but I did program a working toy version of an LLM using byte-pair tokens, and I've worked with AI for over 30 years, including working with Neural Networks (both with libraries and dev systems, and from scratch implementations in C++). He was correct in saying "that's not how this
        
        Re: (Score:2)
        
        by iAmWaySmarterThanYou ( 10095012 ) writes:
        
        Because he literally hand waved it away as "it's a neural network and it does stuff".
        I continued to explain my position several more times in detail with examples later in that thread.
        
        Re: First year compsci project (Score:2)
        
        by LindleyF ( 9395567 ) writes:
        
        I'm not trying to debate the technical points of network architecture. What matters at a high level is that neural nets do not encode "rules" in the traditional sense, just approximations. There is a limited amount of influence the devs have on the results after setting up the network and selecting hyper parameters. A lot of it really is a black box for all intents and purposes.
They could've called it "P-1" (Score:2)

by msk ( 6205 ) writes:

Thomas Ryan should get more credit than Robert Morris.
- Re: (Score:2)
  
  by alispguru ( 72689 ) writes:
  
  Or from just before "The Adolescence of P-1", there was John Brunner's "The Shockwave Rider".
  Both of them nailed the idea of migratory programs roaming a network and adding data to themselves.
  Ryan's P-1 got the AI part as right as he could have in 1977.
  Brunner's "tapeworms" were just really clever coding - the smarts were in the protagonist programmer, which I guess is why the ultimate tapeworm didn't have a name.
embedding a self-replicating prompt within an imag (Score:2)

by zephvark ( 1812804 ) writes:

>embedding a self-replicating prompt within an image file
Really? I haven't heard of this possibility since Microsoft's .WMV format, Windows Meta-Virus. How is it that pictures are executable?
- Re: embedding a self-replicating prompt within an (Score:2)
  
  by g253 ( 855070 ) writes:
  
  > How is it that pictures are executable? Pictures can be executed (well technically interpreted I guess) because the code for the interpreter is plain English and also you can ask the interpreter to describe what he sees in a jpeg.
Perfect timing (Score:2)

by Dictator For Life ( 8829 ) writes:

I watched The Matrix this afternoon, and then I find this story on Slashdot. I’m sure it will be fine, everything’s fine.
Bias Analysis and sensitive leak mining (Score:2)

by Canberra1 ( 3475749 ) writes:

Forget AI recursive scripts. Revealing 'Social/Woke' filtering and accurately eroding it at the edges is possible. Measurements that this AI is 98% race biased to the point of returning incorrect answers is very damaging indeed. History deniers. One can also see this used to push pop stars concerts. Now as many companies employ lazy people, sensitive data is added to the AI training DB. A query like 'Which companies have prepared takeover reports, or have engaged takeover legal advice' ; What is the lowest
- Re: (Score:2)
  
  by Ferocitus ( 4353621 ) writes:
  
  Forget AI recursive scripts. Revealing 'Social/Woke' filtering and accurately eroding it at the edges is possible.
  Easy peasy, they just have to filter out words and phrases like Negger, Nogger, Nugger, Sam Bow...

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

So wait. (Score:3)

Re: (Score:3)

Re: (Score:1)

Re:So wait. (Score:5, Insightful)

Re: (Score:3)

Re: (Score:2)

Re: (Score:3)

Re: So wait. (Score:2)

I wish them luck (Score:2)

Re: (Score:3)

The Web is Awesome...! (Score:1)

Cool research (Score:4, Insightful)

Re: (Score:3)

Re: Cool research (Score:2)

Re: (Score:2)

Re: Cool research (Score:3)

Re: (Score:2)

Welcome to the future (Score:2)

Re: (Score:2)

First year compsci project (Score:3)

Re: (Score:2)

Re: First year compsci project (Score:3)

Re: (Score:2)

Re: First year compsci project (Score:2)

Re: (Score:2)

Re: First year compsci project (Score:3)

Re: (Score:2)

Re: First year compsci project (Score:2)

Re: (Score:2)

Re: First year compsci project (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: First year compsci project (Score:2)

They could've called it "P-1" (Score:2)

Re: (Score:2)

embedding a self-replicating prompt within an imag (Score:2)

Re: embedding a self-replicating prompt within an (Score:2)

Perfect timing (Score:2)

Bias Analysis and sensitive leak mining (Score:2)

Re: (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals