Bruce Schneier Reminds LLM Engineers About the Risks of Prompt Injection Vulnerabilities (schneier.com) 40
Security professional Bruce Schneier argues that large language models have the same vulnerability as phones in the 1970s exploited by John Draper.
"Data and control used the same channel," Schneier writes in Communications of the ACM. "That is, the commands that told the phone switch what to do were sent along the same path as voices." Other forms of prompt injection involve the LLM receiving malicious instructions in its training data. Another example hides secret commands in Web pages. Any LLM application that processes emails or Web pages is vulnerable. Attackers can embed malicious commands in images and videos, so any system that processes those is vulnerable. Any LLM application that interacts with untrusted users — think of a chatbot embedded in a website — will be vulnerable to attack. It's hard to think of an LLM application that isn't vulnerable in some way.
Individual attacks are easy to prevent once discovered and publicized, but there are an infinite number of them and no way to block them as a class. The real problem here is the same one that plagued the pre-SS7 phone network: the commingling of data and commands. As long as the data — whether it be training data, text prompts, or other input into the LLM — is mixed up with the commands that tell the LLM what to do, the system will be vulnerable. But unlike the phone system, we can't separate an LLM's data from its commands. One of the enormously powerful features of an LLM is that the data affects the code. We want the system to modify its operation when it gets new training data. We want it to change the way it works based on the commands we give it. The fact that LLMs self-modify based on their input data is a feature, not a bug. And it's the very thing that enables prompt injection.
Like the old phone system, defenses are likely to be piecemeal. We're getting better at creating LLMs that are resistant to these attacks. We're building systems that clean up inputs, both by recognizing known prompt-injection attacks and training other LLMs to try to recognize what those attacks look like. (Although now you have to secure that other LLM from prompt-injection attacks.) In some cases, we can use access-control mechanisms and other Internet security systems to limit who can access the LLM and what the LLM can do. This will limit how much we can trust them. Can you ever trust an LLM email assistant if it can be tricked into doing something it shouldn't do? Can you ever trust a generative-AI traffic-detection video system if someone can hold up a carefully worded sign and convince it to not notice a particular license plate — and then forget that it ever saw the sign...?
Someday, some AI researcher will figure out how to separate the data and control paths. Until then, though, we're going to have to think carefully about using LLMs in potentially adversarial situations...like, say, on the Internet.
Schneier urges engineers to balance the risks of generative AI with the powers it brings. "Using them for everything is easier than taking the time to figure out what sort of specialized AI is optimized for the task.
"But generative AI comes with a lot of security baggage — in the form of prompt-injection attacks and other security risks. We need to take a more nuanced view of AI systems, their uses, their own particular risks, and their costs vs. benefits."
"Data and control used the same channel," Schneier writes in Communications of the ACM. "That is, the commands that told the phone switch what to do were sent along the same path as voices." Other forms of prompt injection involve the LLM receiving malicious instructions in its training data. Another example hides secret commands in Web pages. Any LLM application that processes emails or Web pages is vulnerable. Attackers can embed malicious commands in images and videos, so any system that processes those is vulnerable. Any LLM application that interacts with untrusted users — think of a chatbot embedded in a website — will be vulnerable to attack. It's hard to think of an LLM application that isn't vulnerable in some way.
Individual attacks are easy to prevent once discovered and publicized, but there are an infinite number of them and no way to block them as a class. The real problem here is the same one that plagued the pre-SS7 phone network: the commingling of data and commands. As long as the data — whether it be training data, text prompts, or other input into the LLM — is mixed up with the commands that tell the LLM what to do, the system will be vulnerable. But unlike the phone system, we can't separate an LLM's data from its commands. One of the enormously powerful features of an LLM is that the data affects the code. We want the system to modify its operation when it gets new training data. We want it to change the way it works based on the commands we give it. The fact that LLMs self-modify based on their input data is a feature, not a bug. And it's the very thing that enables prompt injection.
Like the old phone system, defenses are likely to be piecemeal. We're getting better at creating LLMs that are resistant to these attacks. We're building systems that clean up inputs, both by recognizing known prompt-injection attacks and training other LLMs to try to recognize what those attacks look like. (Although now you have to secure that other LLM from prompt-injection attacks.) In some cases, we can use access-control mechanisms and other Internet security systems to limit who can access the LLM and what the LLM can do. This will limit how much we can trust them. Can you ever trust an LLM email assistant if it can be tricked into doing something it shouldn't do? Can you ever trust a generative-AI traffic-detection video system if someone can hold up a carefully worded sign and convince it to not notice a particular license plate — and then forget that it ever saw the sign...?
Someday, some AI researcher will figure out how to separate the data and control paths. Until then, though, we're going to have to think carefully about using LLMs in potentially adversarial situations...like, say, on the Internet.
Schneier urges engineers to balance the risks of generative AI with the powers it brings. "Using them for everything is easier than taking the time to figure out what sort of specialized AI is optimized for the task.
"But generative AI comes with a lot of security baggage — in the form of prompt-injection attacks and other security risks. We need to take a more nuanced view of AI systems, their uses, their own particular risks, and their costs vs. benefits."
"can't separate an LLM's data from its commands" (Score:2, Insightful)
"we can't separate an LLM's data from its commands"
Seriously? Who thought having a single stream for commands and data was a good idea never mind in 2024 but ever FFS?
If I type stuff into a word processor I don't expect to suddenly go off and try and delete my filesystem if I enter "rm -rf /" similar so why would an LLM look for commands in its runtime input data? Am I missing something here?
Re:"can't separate an LLM's data from its commands (Score:4, Insightful)
Seriously? Who thought having a single stream for commands and data was a good idea never mind in 2024 but ever FFS?
errr ... bruce schneier stating the obvious for some reason? half of the point of llm's is parsing commands, good luck designing around that.
the real problem here is that beyond all the fanfare and excitement we don't truly understand how they work, and it seems we won't for a good while. spoiler: bruce schneier doesn't either.
Re: (Score:2)
But he has grokked the weakness of placing a LLM public facing. There's certainly plenty of evidence to back his claim. And not just cunning security bypasses. Also stupidity of output, that some will blindly follow.
And to flip that - If LLMs are so easily messed up then how could they be relied upon when not public facing. Then we don't get to see the source of the mess unless the private entity works it out and then fesses up.
Re: (Score:1)
"we can't separate an LLM's data from its commands"
Seriously? Who thought having a single stream for commands and data was a good idea
The same people who thought making the Presentation Layer a de facto extension of the Application Layer was smart engineering.... So now most platform vulnerabilities are heap/presentation layer/prompt attacks now.....
Re:"can't separate an LLM's data from its commands (Score:4, Insightful)
"we can't separate an LLM's data from its commands"
Seriously? Who thought having a single stream for commands and data was a good idea never mind in 2024 but ever FFS?
For LLMs it's all there is. Humans are no different. We don't have dedicated command streams either, we have to discern context and disambiguate and decide what to do from our understanding of the environment. The massive success of phishing attacks owning millions is a testament to the imperfect nature of flawed systems meet flawed human operators.
If I type stuff into a word processor I don't expect to suddenly go off and try and delete my filesystem if I enter "rm -rf /" similar so why would an LLM look for commands in its runtime input data? Am I missing something here?
LLMs are kind of like people in a way that they are not very good at discerning complicated or confusing context from a wall of text and often times they are far worse.
If you had a bunch of quotes of people quoting other people quoting things a person would quickly become confused and lose track of who is quoting who. LLMs are at least as bad.
With computers we have fancy UIs and can always see application windows and most of us can discern context between text on an arbitrary website and our boss calling us up to give us more tasks. Imagine if instead of all that you only had a green screen with a stream of text. While reading the website you lost track of where you are because the site contains data that looks like the website ended a while ago and your boss telling you what to do started. A person could easily be confused by that, their attention may have slipped not noticing the special control signal that is supposed to alert them to context changes.
LLMs are no different and often much worse. The best you can do is a solution to make context far more obvious to the LLM with the equivalent of that fancy UI that always makes it unambiguously and persistently clear which window is the website and which is the boss on the telephone. But still if injection is an issue people are probably misusing LLMs for something they shouldn't be relying on them for in the first place.
Re: "can't separate an LLM's data from its command (Score:1)
LLMs are just software, not magic. There can be one channel for user input and another for system commands.
Re: (Score:2)
LLMs are just software, not magic. There can be one channel for user input and another for system commands.
Well hey look, problem solved! All the current engineers couldn't do it but Viol8 apparently has it licked! Can't wait to see who hires you to implement it.
Re: (Score:2)
Re: (Score:2)
Yeah, that must be one of the reasons why vim is the leading word processor worldwide.
Re: (Score:2)
Re: (Score:1)
Obviously this was meant to be sarcasm but there's a real cause to call their competence into question over this. To be honest I've been looking for any proof in the news that these jackasses even invented this themselves rather than stealing or buying it in binary-blob-only format under the table from Russians or something like that, and there's none.
Re: (Score:1)
Re: (Score:2)
Oh? And how do you deal with interaction created state? Because for an LLM, that is "system commands", and it is needed to make them a bit more useful. A purely static LLM is so utterly dumb, it cannot do anything. In particular, it is impossible to customize for a regular user.
Re: (Score:2)
LLMs are kind of like people in a way that they are not very good at discerning complicated or confusing context from a wall of text and often times they are far worse.
A friend of mine said the same thing... If LLMs are a reflection of how the human brain works, then they're susceptible to the same attacks that we see nowadays in society: allow it to learn the wrong things, and it will spill out the wrong answers. Train it on conspiracy theories and it will spew out conspiracy theories.
The attack surface is humongous, and it will only get worse as training happens iteratively on fake news as the years progress. Next year there will be a lot more content created by LLMs, a
Re: (Score:2)
Yes, you are missing something, human conversation (which "LLM" are supposed to emulate) does precisely that, it mixes control with data and leaves it to the free agents on both sides of the conversation to decide for themselves how to implement the incoming stream.
So,
a) it is inevitable due to the nature of the thing LLM are emulating :) and
b) how harmful it gets depends on the "intelligence" of the agent
You already see how it is going to work out, don't you?
Re: (Score:1)
This is not really about "censorship", it's much bigger than that. This is about things like putting secret instructions in what will be used for training, or, like in summary, "hold up a carefully worded sign and convince it to not notice a particular license plate — and then forget that it ever saw the sign".
Re: (Score:1)
It's about data security. If you value your data, make sure it doesn't end up in anyone's LLM, because it's liable to leak out.
To the grandfather poster, who wanted to "expand his mind" by talking to a chatbot, then was stunned when the computer lectured him - Whatever you're trying to do, you're doing it ass-backwards.
Re: (Score:2)
Actually, it is a lot worse. As soon as any automation or exploitable human-driven activity is tied to an LLM that is publicly available and does not do exceptionally careful per-user isolation, there is a very big, very fat bundle of attack vectors.
Re: (Score:1)
Re: (Score:2)
I have bad news for you - you're surrounded by idiots who'd do this any day of the week, any hour of the day. If only they knew how :)
Re: (Score:2)
Little Bobby Tables (who is all grown up now) is going to hold the world to ransom, and we won't even realise it.
Re: (Score:2)
Yep. I wonder whether it will take a few years to happen or whether the stupid will make it happen much sooner.
Disable learning when deployed (Score:2)
Re: (Score:2)
Yeah, there is no reason why your phone keyboard should learn a local vocabulary, indeed.
Re: (Score:2)
There is no reason why an AI must stay in learning mode when deployed.
Actually, it doesn't. That part of the TFS is clearly wrong. The reason why it seems that "LLM modifies itself" is that it actually uses all the prompted data in formatting the reply. And when using it in "chat mode", one needs to prompt it with all of the relevant chat history with EVERY REQUEST - otherwise it will reply as it would from the "blank" (pre-trained) state.
Sounds like people (Score:2)
The same data-channels - my 5 (or more?) senses - tell me if my food smells like Grandma's famous apple pie (stored as information) or is putrid (arguably an "instinctive command" to avoid eating the item, at least according to some theories).
Sometimes that causes problems, like missing out on some supposedly-good-tasting but horrible-smelling cheeses that I haven't had the courage to try.
By the way, my example is open to quibbling, but the general principle stands: There are times when people can be foole
Re: (Score:2)
Indeed, it does. And then look at how easy most people are to scam or to manipulate. Incidentally, the input generally does not need to be that well-crafted for most people. Just look at politics and observe what Big Lies people are ready to eat up. Effective propaganda is always designed and targeted at those with the weakest minds. Written down and implemented a while back by a guy called Hitler. That approach still works nicely today. No, I have not read "Mein Kampf". But I think maybe I should. The com
Someday ... (Score:2)
Nice! Very, very nice! (Score:2)
Regular software vulnerabilities are kid stuff compare to the attack surface present here. We do not even really understand the models. For conventional attacks, regular program semantics is used or a small number of mostly understood side-channels. These may get used or combined in surprising ways, but at least we understand the base mechanisms pretty well. With LLMs? We have no clue what is possible.
I like the analogy to classical telefone signalling. It is not perfect, but the best one I have seen so far
Re: (Score:2)
With LLMs? We have no clue what is possible.
That's simply not true. The limits are very well understood, contrary to what the marketers would have you believe.
As for the attack surface, that should be clear as crystal for any given implementation. These things aren't magic. I don't know who Schneier thinks he's talking to, but if you're trying to impose strictures by fine-tuning an LLM, you don't need a warning, you need to be fired for gross incompetence.
Re: (Score:2)
Hahaha, no. The _limits_ of what intentional things an LLM can do are reasonably well understood. The limits of what an LLM can be tricked into doing are somewhat understood on the theoretical side, but on the practical ones they depend very strongly on the training data and on implementation details, like temporary state or persistent state depending on user and the interactions with it. Hence the concrete possibilities are _not_ understood at all for any of the LLMs currently deployed.
You are thinking lik
Re: (Score:2)
You're waving your hands an awful lot... As for the security issues, you've dramatically over-complicated the problem.
Your mistake in reasoning, as far as security is concerned, is that you're assuming that the LLM can do more than would otherwise be possible given access to the same interfaces. Sticking an LLM between the user and those interfaces doesn't magically increase the attack surface. If anything, it narrows it as the range of possible inputs is unlikely to completely overlap with the range of
)ld sayings.... (Score:2)
Schneier has apparently never heard of the VERY old saw, "Don't try to teach your grandmother to suck eggs." Yeah, now it would be great grandmother or great great grandmother who last "sucked eggs"; but, the idea stands, "Don't try to teach an expert what he or she already surely knows."
{^_^}