
OpenAI Launches Aardvark To Detect and Patch Hidden Bugs In Code (infoworld.com)

OpenAI has introduced Aardvark, a GPT-5-powered autonomous agent that scans, reasons about, and patches code like a human security researcher. "By embedding itself directly into the development pipeline, Aardvark aims to turn security from a post-development concern into a continuous safeguard that evolves with the software itself," reports InfoWorld. From the report: What makes Aardvark unique, OpenAI noted, is its combination of reasoning, automation, and verification. Rather than simply highlighting potential vulnerabilities, the agent promises multi-stage analysis -- starting by mapping an entire repository and building a contextual threat model around it. From there, it continuously monitors new commits, checking whether each change introduces risk or violates existing security patterns.

Additionally, upon identifying a potential issue, Aardvark attempts to validate the exploitability of the finding in a sandboxed environment before flagging it. This validation step could prove transformative. Traditional static analysis tools often overwhelm developers with false alarms -- issues that may look risky but aren't truly exploitable. "The biggest advantage is that it will reduce false positives significantly," noted Jain. "It's helpful in open source codes and as part of the development pipeline."

Once a vulnerability is confirmed, Aardvark integrates with Codex to propose a patch, then re-analyzes the fix to ensure it doesn't introduce new problems. OpenAI claims that in benchmark tests, the system identified 92 percent of known and synthetically introduced vulnerabilities across test repositories, a promising indication that AI may soon shoulder part of the burden of modern code auditing.
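
Taken at face value, the report describes a four-stage loop applied to every new commit: check the change against the repository's threat model, try to reproduce any suspected issue in a sandbox, have the patching model propose a fix, and re-analyze that fix. The sketch below is purely illustrative of that control flow; none of the names correspond to a published Aardvark or Codex API, and every stage is a stubbed-out stand-in.

    /* Hypothetical sketch of the scan -> validate -> patch -> re-check loop
     * described above. Every function is a made-up stand-in; OpenAI has not
     * published an API of this shape. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { const char *id; } Commit;
    typedef struct { const char *description; } Finding;

    /* Stage 1: does the commit violate the repo's threat model? (stub) */
    static bool introduces_risk(const Commit *c, Finding *f) {
        (void)c;
        f->description = "possible command injection in build script";
        return true;
    }

    /* Stage 2: only flag the finding if it reproduces in a sandbox. (stub) */
    static bool exploit_reproduces_in_sandbox(const Finding *f) {
        (void)f;
        return true;
    }

    /* Stage 3: ask the patching model for a candidate fix. (stub) */
    static const char *propose_patch(const Finding *f) {
        (void)f;
        return "quote the user-supplied argument before passing it to the shell";
    }

    /* Stage 4: re-analyze the fix so it doesn't introduce new problems. (stub) */
    static bool patch_is_clean(const char *patch) {
        (void)patch;
        return true;
    }

    int main(void) {
        Commit c = { "abc1234" };
        Finding f = { 0 };

        if (!introduces_risk(&c, &f))
            return 0;                        /* nothing suspicious in this change */
        if (!exploit_reproduces_in_sandbox(&f))
            return 0;                        /* likely a false positive: stay quiet */

        const char *patch = propose_patch(&f);
        if (patch_is_clean(patch))
            printf("commit %s: %s\n  proposed fix: %s\n", c.id, f.description, patch);
        else
            printf("commit %s: %s\n  no safe fix found, flagged for a human\n",
                   c.id, f.description);
        return 0;
    }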


Comments Filter:
  • by TurboStar ( 712836 ) on Friday October 31, 2025 @10:53PM (#65765626)

    My project just got a bunch of pull requests from AI. All of them were shit and a waste of my time. I mainly use AI to inject debugging instrumentation, so I know how to use it; it's just shit at systems programming.

    • No, the shark jump was the "vibe coding" from all the code monkeys, which is what put all the bugs that ChatGPT will now find in the code in the first place.

    • by locater16 ( 2326718 ) on Friday October 31, 2025 @11:11PM (#65765646)
      the shark has been jumped, the fridge nuked, and the horse will continue to be beaten until such time as the money fountain runs out and not until!
    • by arglebargle_xiv ( 2212710 ) on Saturday November 01, 2025 @03:19AM (#65765788)
      It jumped the shark a long time ago, all that's left are overhyped claims. For example:

      scans, reasons about, and patches code like a human security researcher

      is actually:

      scans, mis-identifies, and cuts-and-pastes Stackoverflow snippets like a human newly-hired intern

    • by HiThere ( 15173 )

      You're assuming all AIs are equal. This is false.
      We have no real idea of how accurate this Aardvark thing is going to be. It's hard to see how it could understand intended use, but there are lots of common and hard to notice bugs that it might do a very good job of detecting. "Use after free" should be automatically detectable, for example.

      I'm not so sure about automated correction, but automated suggestion of a correction sounds quite plausible.
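
      A minimal, artificial example of the use-after-free case mentioned above (a sketch only, not taken from any real project): the buffer is freed on one path and still read afterwards, which is exactly the sort of mechanical pattern a scanner can plausibly flag.

        /* Deliberately buggy sketch: use-after-free via an early free on one path. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int main(void)
        {
            char *name = malloc(32);
            if (!name)
                return 1;
            strcpy(name, "aardvark");

            int freed_early = strlen(name) < 16;
            if (freed_early)
                free(name);               /* freed on this path ...                  */

            printf("hello, %s\n", name);  /* ... but still read here: use after free */

            if (!freed_early)
                free(name);
            return 0;
        }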

      • by martin-boundary ( 547041 ) on Saturday November 01, 2025 @09:29AM (#65766064)

        You're assuming AI version number matters. It does not.

        Programming is an exercise in turning imprecise human language into a precise domain specific language. Once the problem has been translated, it can be solved within the paradigms of the chosen language, but the bugs cannot be detected without knowing the original intent which is not encoded.

        An AI with current technology cannot take the DSL solution and judge what the original intent of the solution was. That requires knowing what the problem was, which isn't fully specified, merely partially translated by programmers into the DSL that the AI might see. The translation is done by humans with expertise that the AI doesn't have, because the AI training data is always limited to actual DSL solution snippets from many unrelated problems, whereas the human programmer is given the real world problem to work through.

        TL;DR. If you can't solve the problem by piecing together existing code from GitHub, then no AI can solve the problem. Worthwhile software is not simply pieced together from a thousand GitHub snippets.

        • by gweihir ( 88907 )

          TL;DR. If you can't solve the problem by piecing together existing code from GitHub, then no AI can solve the problem. Worthwhile software is not simply pieced together from a thousand GitHub snippets

          That nicely sums it up. Add that "AI" will apparently "piece together the snippets" in insecure ways quite often and any "correction" will make things a lot worse. Too many people are still deep in the hype and simply do not understand that LLMs cannot do insight or reasoning, despite the lies of the ones profiteering off them.

    • If this actually does verify that there's an exploitable vulnerability, and also verifies that the fix resolves it, then perhaps it won't be so bad.

      I don't have much experience of this, but I was once working on a patch where I knew I had a security vulnerability (command injection via quotes in strings I needed to pass to a script). Before I worked on that (it wasn't obvious how to do it securely), I wanted to test whether the idea worked at all.

      I tried chatgpt for ideas on how to resolve the vulnerabili
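
      For reference, the class of bug being described (quote-based command injection) looks roughly like the sketch below; the code and the script name process.sh are made up for illustration, not taken from the patch in question. Splicing user input into a quoted shell string lets a stray quote break out, while handing the argument directly to the script avoids any shell re-parsing.

        /* Illustrative only: process.sh and both helpers are hypothetical;
         * this shows the general pattern, not the poster's actual code. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <unistd.h>

        /* Vulnerable: a single quote in user_arg ends the '...' and the rest
         * of the input is executed as shell code. */
        static void run_script_unsafe(const char *user_arg)
        {
            char cmd[512];
            snprintf(cmd, sizeof cmd, "./process.sh '%s'", user_arg);
            system(cmd);                       /* the shell re-parses the whole string */
        }

        /* Safer: hand the argument straight to the script; no shell, no re-quoting. */
        static void run_script_safer(const char *user_arg)
        {
            pid_t pid = fork();
            if (pid == 0) {
                char *argv[] = { "./process.sh", (char *)user_arg, NULL };
                execvp(argv[0], argv);
                _exit(127);                    /* exec failed */
            } else if (pid > 0) {
                waitpid(pid, NULL, 0);
            }
        }

        int main(void)
        {
            const char *hostile = "x'; echo injected; '";
            run_script_safer(hostile);         /* argument arrives intact, quotes and all */
            (void)run_script_unsafe;           /* shown for contrast only */
            return 0;
        }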

  • But I guess the scam continues to work if they just promise enough.

  • Someone poisoned the AI to inject vulnerabilities instead of fix them.

    • by gweihir ( 88907 )

      Why next week? Criminal and criminally-minded organizations (whether TLAs or crime networks) probably already have done that and will use them in a few years.

  • by Gravis Zero ( 934156 ) on Saturday November 01, 2025 @12:59AM (#65765732)

    For this to be effective rather than disastrous, it needs to be able to generate a proof that can be validated to confirm that a flaw exists. That is not to say that it will fix it correctly, only that the flaw is real.

    Anything short of this is undoubtedly going to generate "fixes" that are just AI slop. Anyone needing convincing need only look at the mess AI has made for the curl project.

    • by gweihir ( 88907 )

      Indeed. But proof generation is outside of what machines can do, except for very simple things. Many (most?) of the items here are not going to be very simple things. Oh, and proof generation requires formal specifications of what to demonstrate, and that is entirely outside of what AI can do, because it requires a reasonable and pretty complete world model. We do not have those in forms machines can use.

      • Creating a mental model of how code executes is far more complex when working with source code. It would be far easier to examine a compiled binary, because that avoids issues with languages entirely. That said, even reverse-engineering tools lack the capability to entirely map out all possible execution paths for typical closed-system applications (ones that don't load external executable data). However, I know it's possible, so it seems like it's only a matter of time before it gets made and then AI might have a chanc

  • Even if this were actually useful and did not mess up people's code, it comes from a company with no credibility when it comes to not doing exactly that.

    And BTW, there is already an AI project named Aardvark, used for predicting the weather.
    You'd think they could have done a simple google search before announcing it...

  • by dfghjk ( 711126 ) on Saturday November 01, 2025 @06:48AM (#65765930)

    " then re-analyzes the fix to ensure it doesn't introduce new problems."

    A cascading catastrophuck.

    • by gweihir ( 88907 )

      Indeed. Just one selective blindness (and it will have many) would then be enough to endanger basically all code that ever goes through this.

  • As holder of the domain aardvark.co.nz I can already feel the benefit.

  • ...Methinks that the early results may have been exaggerated.
    I can imagine future AI being developed to do this accurately, but current AI probably needs work, a lot of work.
    In a hype-driven world where billions are at stake, treat everything as exaggeration until confirmed by multiple sources.
    Skepticism aside, this is a good direction to be pursuing, much better than generating mountains of pop-culture slop.

  • We use AI to do code reviews on merge requests. IMHO that's the only thing it's good for. Giving it free rein to make changes is a disaster waiting to happen. However, it is quite useful at reviewing code and suggesting improvements.
