Is AI-Driven 0-Day Detection Here? (zeropath.com)

Posted by EditorDavid on Saturday November 02, 2024 @05:52PM from the rise-of-the-machines dept.

"AI-driven 0-day detection is here," argues a new blog post from ZeroPath, makers of a GitHub app that "detects, verifies, and issues pull requests for security vulnerabilities in your code."

They write that AI-assisted security research "has been quietly advancing" since early 2023, when researchers at the DARPA and ARPA-H's Artificial Intelligence Cyber Challenge demonstrated the first practical applications of LLM-powered vulnerability detection, with new advances continuing. "Since July 2024, ZeroPath's tool has uncovered critical zero-day vulnerabilities (including remote code execution, authentication bypasses, and insecure direct object references) in popular AI platforms and open-source projects." And they ultimately identified security flaws in projects owned by Netflix, Salesforce, and Hulu by "taking a novel approach combining deep program analysis with adversarial AI agents for validation. Our methodology has uncovered numerous critical vulnerabilities in production systems, including several that traditional Static Application Security Testing tools were ill-equipped to find..."

TL;DR: most of these bugs are simple and could have been found with a code review from a security researcher or, in some cases, scanners. The historical issue, however, with automating the discovery of these bugs is that traditional SAST tools rely on pattern matching and predefined rules, and miss complex vulnerabilities that do not fit known patterns (i.e. business logic problems, broken authentication flaws, or non-traditional sinks such as from dependencies). They also generate a high rate of false positives.

The beauty of LLMs is that they can reduce ambiguity in most of the situations that caused scanners to be either unusable or produce few findings when mass-scanning open source repositories... To do this well, you need to combine deep program analysis with adversarial agents that test the plausibility of vulnerabilities at each step. The solution ends up mirroring the traditional phases of a pentest: recon, analysis, exploitation (and remediation, which is not mentioned in this post)...

AI-driven vulnerability detection is moving fast... What's intriguing is that many of these vulnerabilities are pretty straightforward — they could've been spotted with a solid code review or standard scanning tools. But conventional methods often miss them because they don't fit neatly into known patterns. That's where AI comes in, helping us catch issues that might slip through the cracks.
"Many vulnerabilities remain undisclosed due to ongoing remediation efforts or pending responsible disclosure processes," according to the blog post, which includes a pie chart showing the biggest categories of vulnerabilities found:

53%: Authorization flaws, including broken access control in API endpoints and unauthorized Redis access and configuration exposure. ("Impact: Unauthorized access, data leakage, and resource manipulation across tenant boundaries.")

26%: File operation issues, including directory traversal in configuration loading and unsafe file handling in upload features. ("Impact: Unauthorized file access, sensitive data exposure, and potential system compromise.")

16%: Code execution vulnerabilities, including command injection in file processing and unsanitized input in system commands. ("Impact: Remote code execution, system command execution, and potential full system compromise.")
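To make the last two categories concrete, here is a minimal Python sketch of the usual defenses against directory traversal in configuration loading and command injection in system commands. It is not taken from the ZeroPath post or from any affected project; `CONFIG_ROOT`, `resolve_config_path`, `build_file_command`, and `describe_file` are hypothetical names, and the use of the `file` utility is just an assumed example of a system command.

```python
import subprocess
from pathlib import Path

# Hypothetical base directory that configuration files must live under.
CONFIG_ROOT = Path("/srv/app/config")


def resolve_config_path(user_supplied: str, root: Path = CONFIG_ROOT) -> Path:
    """Resolve a user-supplied name, refusing anything that escapes root."""
    base = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_supplied).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"path escapes config root: {user_supplied!r}")
    return candidate


def build_file_command(filename: str) -> list[str]:
    """Build an argument vector instead of a shell string.

    The filename is a single argv entry, so shell metacharacters such
    as ';' are never interpreted. The '--' ends option parsing, so a
    name starting with '-' is treated as an operand.
    """
    return ["file", "--", filename]


def describe_file(filename: str) -> str:
    """Run the command without a shell (no shell=True)."""
    result = subprocess.run(
        build_file_command(filename),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The vulnerable counterparts would be opening `root / user_supplied` without the containment check, and passing an interpolated string to `subprocess.run(..., shell=True)`. Both are the kind of pattern the summary says a solid code review could spot.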

The company's CIO/cofounder was "former Red Team at Tesla," according to the startup's profile at YCombinator, and earned over $100,000 as a bug-bounty hunter. (And another co-founder is a former Google security engineer.)

Thanks to Slashdot reader Mirnotoriety for sharing the article.

Run it on a very large code base + human checks (Score:5, Interesting)

by will4 ( 7250692 ) writes: on Saturday November 02, 2024 @05:57PM (#64915115)

Would like to see this prove itself out by running it on the many hundreds of infrastructure, plumbing, and common programs in use, with human review of all findings and a false positive and severity rating report generated.
And then break it down by language: which has the most AI-found, human-verified critical bugs and exploits.
Would like to see a most-secure-language ranking estimate based on a 10,000,000 line sample of production code and not the 'promises by language design' security, since security also includes the actual code written by a large number of developers.

- Re: (Score:2)
  
  by arglebargle_xiv ( 2212710 ) writes:
  
  Came here to say the same thing. It looks like they ran it on a bunch of "AI" bloatware where security is job #18,279, so not surprising they found a pile of bugs. I bet two beers' worth of analysis would produce the same level of result. What happens when you run it on something like the Linux kernel?
  - Re: (Score:2)
    
    by will4 ( 7250692 ) writes:
    
    From experience using everything from 1980s era lint on C, to equivalents on C++, C#, JavaScript, SonarQube, Checkmarx, and some AI ones:
    - They all are OK at best at finding minor peephole bugs: variable used before assigned a value, not all code paths return a value, variable set to a value and then set to another value without the first value being used, ...
    - They have a high false positive rate, 90% or more, such as method X calls API Y which may throw an exception and the exception is never handled. Yes,
    - Re: (Score:2)
      
      by arglebargle_xiv ( 2212710 ) writes:
      
      That's the difference between a mature, user-oriented tool and a coding project created for the benefit of the project members. For example Coverity devoted a significant amount of their development effort towards eliminating false positives while Fortify just showed everything and tried to apply a weighting to each one to give an indication of how likely it was to be a FP, sorting it out is the user's problem. Same with gcc vs. clang code analysis, clang gives you mostly true positives after clearing the
No. (Score:5, Informative)

by gweihir ( 88907 ) writes: on Saturday November 02, 2024 @06:00PM (#64915125)

This is just the usual over-promising the AI proponents like to do to push their crap. Sure, there may be some demos, but the fact of the matter is that this will not work reliably. And that is before attackers adjust to it. In addition, the "vulnerabilities discovered" metric is completely bogus and worthless. It does sound good to the clueless though. What you actually need is a "vulnerabilities not discovered" metric and then you need to add weights to each one to express severity, similar to, for example, the OWASP Top 10.
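One way to read the metric proposed above is as a severity-weighted miss rate against a set of vulnerabilities already known to exist (for example, a seeded benchmark). The sketch below assumes that reading; the weights in `SEVERITY_WEIGHT` and the names `Vulnerability` and `weighted_miss_rate` are hypothetical and are not taken from OWASP or from the comment.

```python
from dataclasses import dataclass

# Hypothetical severity weights, chosen only for illustration.
SEVERITY_WEIGHT = {"low": 1.0, "medium": 3.0, "high": 6.0, "critical": 10.0}


@dataclass(frozen=True)
class Vulnerability:
    ident: str      # label for a known vulnerability in the benchmark
    severity: str   # one of the keys in SEVERITY_WEIGHT


def weighted_miss_rate(known: list[Vulnerability], found_ids: set[str]) -> float:
    """Fraction of total severity weight the tool failed to discover.

    0.0 means everything known was found; 1.0 means nothing was.
    """
    total = sum(SEVERITY_WEIGHT[v.severity] for v in known)
    if total == 0:
        return 0.0
    missed = sum(
        SEVERITY_WEIGHT[v.severity] for v in known if v.ident not in found_ids
    )
    return missed / total
```

Under these assumed weights, a tool that finds many low-severity issues but misses one critical one scores worse than a raw "vulnerabilities discovered" count would suggest.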

- Re: (Score:2)
  
  by alvinrod ( 889928 ) writes:
  
  I think it does illustrate how overhyped and oversold AI is that the companies hawking it are going to such bullshit extremes to try to earn any kind of actual money they can. There are probably a few different use cases or even actual business uses for current AI, but it's still not quite "there" yet. For everyone who thinks we are there, recall how impressive the ELIZA program was at the time of release. The new crop of AI is mostly suited to helping people who can't draw or use Photoshop make shit posts
  - Re: (Score:3)
    
    by martin-boundary ( 547041 ) writes:
    
    I've said it before and I'll say it again: the use case for conversational AI is advertising.
    As the development and infrastructure costs keep rising and as the public discovers that the systems cannot do anything other than bullshit ex ante, then the systems will migrate to use cases where bullshit is actually desired. And that's advertising.
    The two most obvious applications are injecting advertising in LLM conversations / natural language search engines, and the second one is rendering product placement
    - Re: (Score:2)
      
      by gweihir ( 88907 ) writes:
      
      That makes a lot of sense to me. Essentially, LLMs can do "better crap" reasonably well and "better search" not that well. The only real business case for "better crap" is indeed advertising, propaganda and political manipulation. You know, all those scenarios where pretty dumb average and below average people get manipulated for their money or vote.
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    That nicely sums it up. What AI cannot do at this time (and may never be able to do) is free you from thinking. And vulnerability discovery is very much a job that requires thinking. For the dumb part, there are already good code security scanners and, unlike LLMs, these are reliable.
- - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    That you get down-voted for that comment just shows how deep in delusion many people are.
- - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    Nice hallucination you have there. You should get that looked at professionally...
- Re: (Score:2)
  
  by bloodhawk ( 813939 ) writes:
  
  Not just vulnerabilities not discovered: you need vulnerabilities not discovered that would not have been discovered by conventional means if devs were doing their jobs properly. Still, a lot of vulnerabilities are shit that should have been caught if devs were using the tools we already have to scan the code during PRs and merges.
  - Re: No. (Score:3)
    
    by Plugh ( 27537 ) writes:
    
    This. Surely you just add a good dedicated AI checker on your code. One that has been trained on the language(s) you use. Run proposed code through that just like you would run Fortify.
- Re: No. (Score:2)
  
  by Plugh ( 27537 ) writes:
  
  All that does not mean it is not worthwhile having this extra check on your code, preferably before you submit the PR.
  You already at least run static analysis tools... right...?
  - Re: (Score:2)
    
    by gweihir ( 88907 ) writes:
    
    You already at least run static analysis tools... right...?
    Obviously, the idea here is that you do not need those anymore. Or the skill to use them...
If open source, okay. If proprietary, though... (Score:2)

by 93 Escort Wagon ( 326346 ) writes:

Setting aside concerns about the AI's actual ability and usefulness: You should vet - extremely rigorously - all aspects of what the AI tool is doing with your code as part of its analysis. Otherwise you might find your snazzy cutting edge algorithm shows up in a competitor's offering. They might even beat you to market with your own code!
- Re: If open source, okay. If proprietary, though.. (Score:2)
  
  by Plugh ( 27537 ) writes:
  
  You also want to be sure that AI is not adding subtle exploits into your codebase. Maybe a nice subtle race condition, hard for a human to detect until it has happened to customers enough times that you cannot ignore it anymore and you have been debugging for 12 hours straight.
AI-driven waffle is here .. (Score:2)

by Mirnotoriety ( 10462951 ) writes:

Quantum IoT AR Blockchain AI in the Cloud /s
But what if... (Score:2)

by dskoll ( 99328 ) writes:

What if there's a 0-day in the AI??
