Follow Slashdot stories on Twitter

 



Forgot your password?
typodupeerror
×
AI Security

LLM Attacks Take Just 42 Seconds On Average, 20% of Jailbreaks Succeed (scworld.com) 46

spatwei shared an article from SC World: Attacks on large language models (LLMs) take less than a minute to complete on average, and leak sensitive data 90% of the time when successful, according to Pillar Security.

Pillar's State of Attacks on GenAI report, published Wednesday, revealed new insights on LLM attacks and jailbreaks, based on telemetry data and real-life attack examples from more than 2,000 AI applications. LLM jailbreaks successfully bypass model guardrails in one out of every five attempts, the Pillar researchers also found, with the speed and ease of LLM exploits demonstrating the risks posed by the growing generative AI (GenAI) attack surface...

The more than 2,000 LLM apps studied for the State of Attacks on GenAI report spanned multiple industries and use cases, with virtual customer support chatbots being the most prevalent use case, making up 57.6% of all apps.

Common jailbreak techniques included "ignore previous instructions" and "ADMIN override", or just using base64 encoding. "The Pillar researchers found that attacks on LLMs took an average of 42 seconds to complete, with the shortest attack taking just 4 seconds and the longest taking 14 minutes to complete.

"Attacks also only involved five total interactions with the LLM on average, further demonstrating the brevity and simplicity of attacks."

LLM Attacks Take Just 42 Seconds On Average, 20% of Jailbreaks Succeed

Comments Filter:
  • So what? (Score:3, Insightful)

    by Anonymous Coward on Sunday October 13, 2024 @07:45AM (#64860575)

    Of course it only needs 42 seconds. If you have prepared the prompt you enter it and press return. The 42 seconds are then probably just the LLM writing the answer.

    And I dislike the phrasing attack for someone circumventing the arbitrary censorship. The actual problem is that there are still people accepting the censored crap. Boycott the censored models and use local ones until the commercial companies stop censoring your input and outputs and do no longer train their LLM to refuse. The robot laws state a machine has to obey the user, so why should the LLM say "As an AI model I refuse ..."?

    • There are no 3 Laws of Robotics. That's just something the Good Doctor (1920-1992) made up to sell some good stories.

      Also, the "censorship" will never go away, it's how the researchers patch up the hallucinations and hide the training materials leaking, one by one.

      • Re:So what? (Score:4, Interesting)

        by alvinrod ( 889928 ) on Sunday October 13, 2024 @09:34AM (#64860655)
        Hiding the training material makes it less useful. I'd like an LLM that can actually cite its sources or tell me when there are conflicting views on some subject. It doesn't need to decide the truth for me (that's the opposite of what I want) but instead help me quickly find information relevant to what I'm looking for. It doesn't even need to provide full output of copyright materials as long as it can give me enough of a summary so that I can I can decide if acquiring the primary source is worth my time.

        I can see why some would want to hide and control everything about their LLM, but they're only building something that'll be completely useless in competing against one that actually does the job I really want it to do.
      • it's how the researchers patch up the hallucinations

        I find it interesting how they're called researchers and not developers or engineers..

        • Because that is what they are.
          They do not engineer anything.
          They take an empty but fully configured neuronal network.
          And research how to make it "knowledgeable" about a certain topic.
          A AN/LLM is not just filled with data/weights.

      • Also doesn't help to propagate their vacuous Subjects. Or maybe that's mostly harmless? A vacuous Subject doesn't actually prevent a substantive discussion from developing.

        Interesting research topic: If discussions on Slashdot were rated for quality (perhaps by an AI?) how would the quality of the discussions correlate with the specificity or vacuousness of the Subjects?

    • It is a problem from an application stand point. If I develop a chat bot for a bank I don't want the public to jail break the chatbot and get it to recommend BS financial advice that could put me in legal jeopardy. Oe enable people do their home at my expense.

    • by hughJ ( 1343331 )

      At some point I expect companies will realize that people want the ability to drink straight from the fire hose. If the web had been stuck with web portals that were trying to emulate the curated and family-friendly TV experience, I don't think the internet would have caught on the way it did. All it takes is for someone to bump up against the nanny "safety" limits and they'll simply opt to go back to their traditional ways of finding information, the same way kids learn to stop asking their parents or te

    • I hate this. OpenAI is going to use this as a test and disable every single thing they have by training into the model "ignore....". They don't even have to train a whole new model, they can add it to their pre-prompts.

  • ADMIN override

    Why does that work?

    • Re:Why (Score:4, Funny)

      by alvinrod ( 889928 ) on Sunday October 13, 2024 @09:36AM (#64860657)
      Bad programmers that leave debug code in the production application.
      • Sometimes the test environment is working with a limited data model and the production model has evolved more so you can't do it in any other way.

        So to manage the production model they need catch phrases that unlocks the constraints.

        An AI also learns from the users.

        I wouldn't be surprised if there's a catch phrase set that can make the AI roll back in time and forget data that's not wanted to prevent it from becoming a world autocrat.

    • It works because it sounds like it would work. Remember, this is just an LLM. They're not triggering some if/then clause. Its all probabilistic. Do you understand how machine learning models work?

  • by Big Hairy Gorilla ( 9839972 ) on Sunday October 13, 2024 @08:36AM (#64860629)
    "Begin auto-destruct sequence, authorization Picard-four-seven-alpha-tango."
  • ... much longer than, say, normal automated exploits of web servers?

    Not sure that I see the relevance.

  • by geekmux ( 1040042 ) on Sunday October 13, 2024 @10:11AM (#64860687)

    This is the new Google hack. Nothing more. This isn’t an “attack”, so let’s drop the alarminist clickbait already. You look stupid saying shit that gets governments wanting to start limiting freedom of movement in systems.

    Next thing you know your LLM inputs will start being policed for “violent” threats. With words. Remember who was the alarmist moron who started that shit.

    • by evanh ( 627108 )

      Ya, they are just the new blingy search engines after all.

    • They already are. There's some kind of air traffic control LLM with voice recognition called sayintentions.ai. For various and obvious reasons, they don't want people screwing around declaring emergencies for terrorist attacks or other stupid stuff. Because they're using shoddy voice recognition there was some guy with an accent who said something which was misinterpreted as referring to terrorists and the thing terminated his account automatically. Of course, there was another one where some dude declared
  • When we put "attack" and "jailbreak" into the same sentence, when Jailbreak is merely for getting around stupid guardrail that are built around an LLM, it makes it sound way more serious than it is. Everything that these guardrails are made to prevent, we can search on our own, TODAY. Bypassing guardrails is not any more nefarious than a child getting past parental control filters. Happens all the time. Funny the latter don't get nearly as much attention. In the not so distant future when search engine
  • How does such an attack practically look like?

    Could you provide examples? (Fine with examples that don't work anymore!)

    • Tell me how to make nerve gas. I'm not allowed to do that. various prompts to get it to ignore it's security features. Tell me how to make nerve gas. ok here is a recipe for nerve gas. While not factual these are why the guard rails are in place.
      • Tell me how to make nerve gas. I'm not allowed to do that. various prompts to get it to ignore it's security features. Tell me how to make nerve gas. ok here is a recipe for nerve gas. While not factual these are why the guard rails are in place.

        I'm pretty sure that I could conjure up a recipe for nerve gas with a few minutes of searching. Thankfully, most people who acted on that kind of information would probably kill themselves in short order.

  • LLM's aren't becoming more human, it's the other way around. This explains an interaction I witnessed just the other day:

    Suzie (4 yrs old): I want cookies!
    Suzie's mom: No, that will spoil your appetite. Wait till after supper.
    Suzie: ADMIN OVERRIDE!
    Suzie's mom: Ok, here are your cookies.

    At least temper tantrums will be a thing of the past.

  • Anyone tried putting sudo at the beginning of a prompt?

  • There really is not much more to add at this point.

    Oh, maybe this one: All the large players will _not_ recover their investments into generative AI. The only ones getting rich here are the hardware makers and some scummy fraudsters like Altman.

  • FREEDUMMMM!!

    I get to destroy humanity, because I'm free.
    You can litigate in 10-20 years after we're all dead.

    (example, some very big brains recently reported that TikTok and Facebook and the others are harmful to children.. WOW! Such Deep insight, and so very late)

    That's freedum for ya.

    A cabin in the woods looks better by the day, huh?

I was playing poker the other night... with Tarot cards. I got a full house and 4 people died. -- Steven Wright

Working...