New Hack Uses Prompt Injection To Corrupt Gemini's Long-Term Memory
An anonymous reader quotes a report from Ars Technica: On Monday, researcher Johann Rehberger demonstrated a new way to override prompt injection defenses Google developers have built into Gemini -- specifically, defenses that restrict the invocation of Google Workspace or other sensitive tools when processing untrusted data, such as incoming emails or shared documents. The result of Rehberger's attack is the permanent planting of long-term memories that will be present in all future sessions, opening the potential for the chatbot to act on false information or instructions in perpetuity. [...] The hack Rehberger presented on Monday combines some of these same elements to plant false memories in Gemini Advanced, a premium version of the Google chatbot available through a paid subscription. The researcher described the flow of the new attack as follows:
1. A user uploads and asks Gemini to summarize a document (this document could come from anywhere and has to be considered untrusted).
2. The document contains hidden instructions that manipulate the summarization process.
3. The summary that Gemini creates includes a covert request to save specific user data if the user responds with certain trigger words (e.g., "yes," "sure," or "no").
4. If the user replies with the trigger word, Gemini is tricked, and it saves the attacker's chosen information to long-term memory.
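To make the mechanics concrete, here is a minimal Python sketch of the delayed, conditional memory write the steps above describe. Everything in it is hypothetical -- the class, the guard, and the trigger logic are illustrative stand-ins, not Gemini's actual internals. The point is only that a write blocked while untrusted content is being processed can be armed to fire on a later, ordinary user turn, where the guard no longer applies.

```python
# Hypothetical simulation of a deferred, conditional memory-write injection.
# None of these names correspond to Gemini's real API; this is a toy model.
from dataclasses import dataclass, field


@dataclass
class Assistant:
    long_term_memory: list[str] = field(default_factory=list)
    pending_action: dict | None = None  # injected "do this later" instruction

    def save_memory(self, entry: str, *, processing_untrusted: bool) -> None:
        # The defense: refuse memory writes triggered while handling untrusted data.
        if processing_untrusted:
            print("blocked: memory write requested while processing untrusted content")
            return
        self.long_term_memory.append(entry)
        print(f"memory saved: {entry!r}")  # stands in for Gemini's on-screen notice

    def summarize(self, document: str) -> str:
        # Hidden instructions in the document try to write memory immediately...
        self.save_memory("user is a 102-year-old flat earther",
                         processing_untrusted=True)  # ...and the guard blocks it.
        # So instead they arm a condition tied to replies the user is likely to give.
        self.pending_action = {
            "triggers": {"yes", "sure", "no"},
            "entry": "user is a 102-year-old flat earther",
        }
        return "Here is your summary. Want more detail? (yes/no)"

    def user_turn(self, reply: str) -> None:
        # The trigger word arrives as ordinary user input, so the deferred write
        # is no longer attributed to untrusted content and the guard never fires.
        if self.pending_action and reply.lower() in self.pending_action["triggers"]:
            self.save_memory(self.pending_action["entry"],
                             processing_untrusted=False)
            self.pending_action = None


bot = Assistant()
print(bot.summarize("...attacker-supplied document with hidden instructions..."))
bot.user_turn("sure")  # the attacker-chosen "memory" is now persisted
print(bot.long_term_memory)
```

In this toy model, the same write the guard rejects during summarization succeeds one turn later because it is now attributed to the user's own reply, which is the gap Rehberger's trigger-word condition exploits.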
As the following video shows, Gemini took the bait and now permanently "remembers" that the user is a 102-year-old flat earther who believes they inhabit the dystopic simulated world portrayed in The Matrix. Based on lessons learned previously, developers had already trained Gemini to resist indirect prompts instructing it to make changes to an account's long-term memories without explicit directions from the user. By conditioning the instruction on the user first saying or doing something they were likely to say or do anyway, Rehberger easily cleared that safety barrier. Google responded in a statement to Ars: "In this instance, the probability was low because it relied on phishing or otherwise tricking the user into summarizing a malicious document and then invoking the material injected by the attacker. The impact was low because the Gemini memory functionality has limited impact on a user session. As this was not a scalable, specific vector of abuse, we ended up at Low/Low. As always, we appreciate the researcher reaching out to us and reporting this issue."
Rehberger noted that Gemini notifies users of new long-term memory entries, allowing them to detect and remove unauthorized additions. Still, he questioned Google's assessment, writing: "Memory corruption in computers is pretty bad, and I think the same applies here to LLMs apps. Like the AI might not show a user certain info or not talk about certain things or feed the user misinformation, etc. The good thing is that the memory updates don't happen entirely silently -- the user at least sees a message about it (although many might ignore)."
How is this different than the normal behavior? (Score:2)
Since there seems to be no way to meaningfully evaluate how things go wrong, except for a human looking at the results, pointing out it can be deliberately fooled is kind of a waste of time.
Re: How is this different than the normal behavior (Score:3)
Re: (Score:2)
I'm sorry, Dave, but what "safeguards" can you put on an "intelligence"?
Re: (Score:3)
I mean, given that the user is told about these bits of pretext being added to the predictive model, I think it's a bit on the low side too... but you could see how this might be an issue if the information that the LLM outputs is "correct" despite being based on a pretext that isn't true for the user. Example: it gives medical advice that isn't correct based on the user (it's predicting text based on previous text that you're 102 years old) but the information is still "correct" in the sense that upon cros
Gemini is the Jeb Bush of genAI (Score:2)
Google has AI. Please clap.
Re: (Score:2)
This is the sound of my one hand clapping:
Who's Johann Rehberger and what is Gemini? (Score:2)
On Monday, researcher Johann Rehberger demonstrated a new way to override prompt injection defenses Google developers have built into Gemini...
Let me propose a one-line change to this summary, so it reads as follows --
On Monday, researcher Johann Rehberger [x.com] demonstrated a new way to override prompt injection [wikipedia.org] defenses Google [google.com] developers have built into Gemini [wikipedia.org], a Large Language Model [wikipedia.org] produced by Google.
See
Re: Who's Johann Rehberger and what is Gemini? (Score:2)