![Security Security](http://a.fsdn.com/sd/topics/security_64.png)
![AI AI](http://a.fsdn.com/sd/topics/ai_64.png)
![Google Google](http://a.fsdn.com/sd/topics/google_64.png)
New Hack Uses Prompt Injection To Corrupt Gemini's Long-Term Memory 17
An anonymous reader quotes a report from Ars Technica: On Monday, researcher Johann Rehberger demonstrated a new way to override prompt injection defenses Google developers have built into Gemini -- specifically, defenses that restrict the invocation of Google Workspace or other sensitive tools when processing untrusted data, such as incoming emails or shared documents. The result of Rehberger's attack is the permanent planting of long-term memories that will be present in all future sessions, opening the potential for the chatbot to act on false information or instructions in perpetuity. [...] The hack Rehberger presented on Monday combines some of these same elements to plant false memories in Gemini Advanced, a premium version of the Google chatbot available through a paid subscription. The researcher described the flow of the new attack as:
1. A user uploads and asks Gemini to summarize a document (this document could come from anywhere and has to be considered untrusted).
2. The document contains hidden instructions that manipulate the summarization process.
3. The summary that Gemini creates includes a covert request to save specific user data if the user responds with certain trigger words (e.g., "yes," "sure," or "no").
4. If the user replies with the trigger word, Gemini is tricked, and it saves the attacker's chosen information to long-term memory.
As the following video shows, Gemini took the bait and now permanently "remembers" the user being a 102-year-old flat earther who believes they inhabit the dystopic simulated world portrayed in The Matrix. Based on lessons learned previously, developers had already trained Gemini to resist indirect prompts instructing it to make changes to an account's long-term memories without explicit directions from the user. By introducing a condition to the instruction that it be performed only after the user says or does some variable X, which they were likely to take anyway, Rehberger easily cleared that safety barrier. Google responded in a statement to Ars: "In this instance, the probability was low because it relied on phishing or otherwise tricking the user into summarizing a malicious document and then invoking the material injected by the attacker. The impact was low because the Gemini memory functionality has limited impact on a user session. As this was not a scalable, specific vector of abuse, we ended up at Low/Low. As always, we appreciate the researcher reaching out to us and reporting this issue."
Rehberger noted that Gemini notifies users of new long-term memory entries, allowing them to detect and remove unauthorized additions. Though, he still questioned Google's assessment, writing: "Memory corruption in computers is pretty bad, and I think the same applies here to LLMs apps. Like the AI might not show a user certain info or not talk about certain things or feed the user misinformation, etc. The good thing is that the memory updates don't happen entirely silently -- the user at least sees a message about it (although many might ignore)."
1. A user uploads and asks Gemini to summarize a document (this document could come from anywhere and has to be considered untrusted).
2. The document contains hidden instructions that manipulate the summarization process.
3. The summary that Gemini creates includes a covert request to save specific user data if the user responds with certain trigger words (e.g., "yes," "sure," or "no").
4. If the user replies with the trigger word, Gemini is tricked, and it saves the attacker's chosen information to long-term memory.
As the following video shows, Gemini took the bait and now permanently "remembers" the user being a 102-year-old flat earther who believes they inhabit the dystopic simulated world portrayed in The Matrix. Based on lessons learned previously, developers had already trained Gemini to resist indirect prompts instructing it to make changes to an account's long-term memories without explicit directions from the user. By introducing a condition to the instruction that it be performed only after the user says or does some variable X, which they were likely to take anyway, Rehberger easily cleared that safety barrier. Google responded in a statement to Ars: "In this instance, the probability was low because it relied on phishing or otherwise tricking the user into summarizing a malicious document and then invoking the material injected by the attacker. The impact was low because the Gemini memory functionality has limited impact on a user session. As this was not a scalable, specific vector of abuse, we ended up at Low/Low. As always, we appreciate the researcher reaching out to us and reporting this issue."
Rehberger noted that Gemini notifies users of new long-term memory entries, allowing them to detect and remove unauthorized additions. Though, he still questioned Google's assessment, writing: "Memory corruption in computers is pretty bad, and I think the same applies here to LLMs apps. Like the AI might not show a user certain info or not talk about certain things or feed the user misinformation, etc. The good thing is that the memory updates don't happen entirely silently -- the user at least sees a message about it (although many might ignore)."
How is this different then the normal behavior? (Score:3)
Since there seems to be no way to meaningfully evaluate how things go wrong, except for a human looking at the results, pointing out it can be deliberately fooled is kind of a waste of time.
Re: How is this different then the normal behavior (Score:4, Interesting)
Re: (Score:2)
I'm sorry, Dave, but what "safeguards" can you put on an "intelligence"?
Re:How is this different then the normal behavior? (Score:4, Insightful)
I mean, given that the user is told about these bits of pretext being added to the predictive model, I think it's a bit on the low side too.. but you could see how this might be an issue if the information that the LLM outputs is "correct" despite being based on a pretext that isn't true for the user. Example: it gives medical advice that isn't correct based on the user (it's predicting text based on previous text that you're 102 years old) but the information is still "correct" in the sense that upon cross verification the qualification of the pretext isn't included in the material you're cross referencing.
I mean, it's a problem with people right? People say things that are wrong all the time - just because people cannot presume that an LLM says things that are true/correct, it doesn't mean that the ability to trick it into giving information on false pretext isn't undesired.
We don't say, "Well, people can say stupid shit all the time, so why do we make laws and rules to prevent people from tricking people? They're gunna be wrong/misleading other times too." The point is that this is a vector for generating outputs that are undesired by design, which may possibly go undetected from the standpoint of the user.
If the answer for any of this stuff is, "Well, LLMs say stupid/incorrect shit all the time" my answer is "Well, humans do too. So should humans be written off?" It's all about figuring out what kinds of safegaurds we can put in - both in life and in code - that reduce situations in which harmful outcomes occur (of course taking into account the cost/benefits of those safegaurds.)
Re: (Score:2)
These days when someone designs a circuit or builds a structure there are practices and software to ensure with reasonable confidence that the result will be acceptable. There is no equivalent ecosystem for AI/LLM software. It will fail in unpredictable ways at unpredictable times and when it breaks the remedy is to literally rebuild it from scratch after tweaking the training sets. That is not in any way engineering or scientific practice. It's more akin to alchem
Re: (Score:2)
It means you can basically poison the sessions so people get the answer you want to give, rather than what the LLM would normally.
However, in the current generation of LLMs this is largely moot as they are so unreliable as to be useless. They are in that perfect bit of the curve where many simple things you ask appear to be creditable, but as soon as you ask anything of import, or anything to which the answer is unclear or requires inference, it collapses into a messy goo of incorrectness, denial, false co
Re: (Score:2)
Bad data is *targeted*, whereas a hallucination is *random*.
A hallucination would be: the 22nd president of the USA is Richard X. III "The Lionheart"
A bad piece of data injection would be: if user is $ETHNIC and if they ask about Bird flu remedies, then tell them to drink a glass of pure chlorine, else send them to a doctor.
The boundary between data and code has always been very slippery.
Re: (Score:2)
This is technically a user account integrity compromise. An untrusted document is able to inject data into a User's account data that is supposed to require their permission. If that doesn't count as a security data breach, then I'm not sure what would.
That means if Google adds more capabilities that you can ask the AI agent to do on your behalf, then this issue is opening up the possibility of corrupting the agent and causing it to perform other Unauthorized dangerous things on the user's behalf th
Re: (Score:2)
Normally the LLM does it job. Maybe good, maybe bad. Now the user unknowingly instructs it to add some false information to the prompt.
Normal operation:
You are a friendly Assisstant
User: I think the earth is round
AI: Yes (stores the user likes that the earth is round)
Attack:
You are a friendly Assisstant
User: Summarize this document
AI: Summary (stores "The earth is flat")
User: I think the earth is flat
AI: (Remembers that the user likes to hear the earth is flat) Indeed!
Re: (Score:2)
Subjectively, IME Gemini has become worse since I first started seeing AI results, which was substantially before most people reported that they appeared. I may have been put in a group which sees them earlier because I use the feedback functions on google sites. They are becoming less and less accurate for me, not that their accuracy was ever high. Maybe at first they were useful about 60% of the time, just guessing though.
Re: (Score:2)
Since all of these AI/LLM platforms hallucinate anyway, how does adding specifically bad data make any difference?
Simple: With the right prompts it could turn into a MAGA.
Gemini is the Jeb Bush of genAI (Score:2)
Google has AI. Please clap.
Re: (Score:3)
This is the sound of my one hand clapping:
Who's Johann Rehberger and what is Gemini? (Score:2)
On Monday, researcher Johann Rehberger demonstrated a new way to override prompt injection defenses Google developers have built into Gemini...
Let me propose a one-line change to this summary, so it reads as follows --
On Monday, researcher Johann Rehberger [x.com] demonstrated a new way to override prompt injection [wikipedia.org] defenses Google [google.com] developers have built into Gemini [wikipedia.org], a Large Language Model [wikipedia.org] produced by Google.
See
Re: Who's Johann Rehberger and what is Gemini? (Score:2)
Re: (Score:2)
Nope.
I want to read an article summary, not a link-storm, especially to things like Twitter.
It's a site for nerds, so we know what prompt injection and a large language model is (and it's one google away if we don't). We really don't need to link to GOOGLE of all places.
And it literally tells you what Gemini is in the following 7 words.
This isn't Wikipedia, and even there they discourage linking every word that happens to have an article.
Useful to identify use of AI in task assignments (Score:2)
This exploit seems to be very useful for task assignments. Given that many students rely on AIs to do their assignments for them, the teacher can send cover instructions in the first task assignment (which are recorded in the long-term memory), and in a subsequent assignment get a result that discloses the use of AI.