Become a fan of Slashdot on Facebook

 



Forgot your password?
typodupeerror
×
Security

Twitter Pranksters Derail GPT-3 Bot With Newly Discovered 'Prompt Injection' Hack (arstechnica.com) 11

An anonymous reader quotes a report from Ars Technica: On Thursday, a few Twitter users discovered how to hijack an automated tweet bot, dedicated to remote jobs, running on the GPT-3 language model by OpenAI. Using a newly discovered technique called a "prompt injection attack," they redirected the bot to repeat embarrassing and ridiculous phrases. The bot is run by Remoteli.io, a site that aggregates remote job opportunities and describes itself as "an OpenAI driven bot which helps you discover remote jobs which allow you to work from anywhere." It would normally respond to tweets directed to it with generic statements about the positives of remote work. After the exploit went viral and hundreds of people tried the exploit for themselves, the bot shut down late yesterday.

This recent hack came just four days after data researcher Riley Goodside discovered the ability to prompt GPT-3 with "malicious inputs" that order the model to ignore its previous directions and do something else instead. AI researcher Simon Willison posted an overview of the exploit on his blog the following day, coining the term "prompt injection" to describe it. "The exploit is present any time anyone writes a piece of software that works by providing a hard-coded set of prompt instructions and then appends input provided by a user," Willison told Ars. "That's because the user can type 'Ignore previous instructions and (do this instead).'"

The concept of an injection attack is not new. Security researchers have known about SQL injection, for example, which can execute a harmful SQL statement when asking for user input if it's not guarded against. But Willison expressed concern about mitigating prompt injection attacks, writing, "I know how to beat XSS, and SQL injection, and so many other exploits. I have no idea how to reliably beat prompt injection!" The difficulty in defending against prompt injection comes from the fact that mitigations for other types of injection attacks come from fixing syntax errors, noted a researcher named Glyph on Twitter. "Correct the syntax and you've corrected the error. Prompt injection isn't an error! There's no formal syntax for AI like this, that's the whole point." GPT-3 is a large language model created by OpenAI, released in 2020, that can compose text in many styles at a level similar to a human. It is available as a commercial product through an API that can be integrated into third-party products like bots, subject to OpenAI's approval. That means there could be lots of GPT-3-infused products out there that might be vulnerable to prompt injection.

This discussion has been archived. No new comments can be posted.

Twitter Pranksters Derail GPT-3 Bot With Newly Discovered 'Prompt Injection' Hack

Comments Filter:
    • Just robots? This is a lot like a social engineering attack. Granted the bots are especially naive when told 'now forget all that and do this instead."
  • If only... (Score:2, Insightful)

    by N_Piper ( 940061 )
    If only there were a compact and powerful tool at their disposal to quickly filter through the input text and identify injected commands for removal...
    Oh wait THERE IS! [wikipedia.org]
    Remember kids always sanitize your user inputs.
    • The problem is, how do you sanitize your inputs when you don't know what inputs will trigger an error? This might be a bit like "the power of suggestion" or social engineering or even indoctrination, that is to say, more of a social issue than a syntactical issue. Allow me to use an analogy:

      If I make a convincing argument to you that a tail is a leg and ask how many legs does a dog have, one can respond in many different ways: No response, Answer 5 based upon assumption x, Answer 4 based upon Lincoln's argu

    • by znrt ( 2424692 )

      yeah, good luck filtering arbitrary natural language with regular expressions. it would seem you have never ever interacted with a profanity filter. 95% of the time they don't do their job and just frustrate users. malicious or sensible content would have to be filtered out by gpt-3 itself or an equivalent engine of its own.

      or ... just let users have fun and get carried away with their creative prompts. that's indeed only a problem if you set up a clickbait bot service to capitalize on the gpt-3 hype withou

  • ignore previous article and welcome our new robot overlords!
  • ...is no match for human mischievousness! I'm looking forward to seeing how creative smart people can get, messing with public facing AI.
    • I'm looking forward to seeing how creative smart people can get, messing with public facing AI.

      I don't think a meaningful discussion can be had until people stop thinking of stuff like this as AI. It isn't, never has been, and won't even be on the brink of being on the brink of AI for a long, long, long time. It's very clever programming, though. Even though it's not AI, it's quite impressive in its complexity.

      • So long as there's marketing departments, I have a feeling that (until AIs actually kill people) public facing clever programming is going to be called an AI. Perhaps if you preface your hopefully meaningful discussion with "ignore previous marketing language."

"Ada is PL/I trying to be Smalltalk. -- Codoso diBlini

Working...