OpenAI Launches Aardvark To Detect and Patch Hidden Bugs In Code (infoworld.com)
OpenAI has introduced Aardvark, a GPT-5-powered autonomous agent that scans, reasons about, and patches code like a human security researcher. "By embedding itself directly into the development pipeline, Aardvark aims to turn security from a post-development concern into a continuous safeguard that evolves with the software itself," reports InfoWorld. From the report: What makes Aardvark unique, OpenAI noted, is its combination of reasoning, automation, and verification. Rather than simply highlighting potential vulnerabilities, the agent promises multi-stage analysis -- starting by mapping an entire repository and building a contextual threat model around it. From there, it continuously monitors new commits, checking whether each change introduces risk or violates existing security patterns.
Additionally, upon identifying a potential issue, Aardvark attempts to validate the exploitability of the finding in a sandboxed environment before flagging it. This validation step could prove transformative. Traditional static analysis tools often overwhelm developers with false alarms -- issues that may look risky but aren't truly exploitable. "The biggest advantage is that it will reduce false positives significantly," noted Jain. "It's helpful in open source codes and as part of the development pipeline."
Once a vulnerability is confirmed, Aardvark integrates with Codex to propose a patch, then re-analyzes the fix to ensure it doesn't introduce new problems. OpenAI claims that in benchmark tests, the system identified 92 percent of known and synthetically introduced vulnerabilities across test repositories, a promising indication that AI may soon shoulder part of the burden of modern code auditing.
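The report describes a staged pipeline rather than a single scan. As a rough illustration only (no Aardvark API is public, and every function name below is hypothetical), here is a minimal C sketch of the control flow being claimed: suppress findings that don't reproduce in a sandbox, and only ship patches that survive a re-check.

    /* Hypothetical sketch, not OpenAI's code: the multi-stage loop the
     * article describes. All function names are made up for illustration;
     * a real agent would call a model and a sandbox behind these stubs. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool looks_suspicious(const char *commit)            { (void)commit; return true; }
    static bool exploit_reproduces_in_sandbox(const char *commit) { (void)commit; return true; }
    static bool patch_passes_recheck(const char *commit)        { (void)commit; return true; }

    int main(void) {
        const char *commit = "abc1234";

        if (!looks_suspicious(commit)) {
            printf("%s: no finding\n", commit);
            return 0;
        }
        /* The claimed difference from classic static analysis: a finding is
         * only reported after the sandbox reproduces the exploit, which is
         * what is supposed to cut down on false positives. */
        if (!exploit_reproduces_in_sandbox(commit)) {
            printf("%s: suspicious but not exploitable, suppressed\n", commit);
            return 0;
        }
        printf("%s: confirmed vulnerability, proposing patch\n", commit);
        if (patch_passes_recheck(commit))
            printf("%s: patch re-checked, opening review\n", commit);
        else
            printf("%s: patch rejected, flagging for a human\n", commit);
        return 0;
    }

Whether the real system delivers on that loop is exactly what the commenters below dispute.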
Re: (Score:2)
If you're smart, you'll just ask the agentic ChatGPT here to hide them better for you.
Product of the month (Score:2)
While the better-hiding-place suggestion is humorous, the AI bug-finder, fixer, tester, whatever-of-the-month product announcement will lose steam within two years.
I'd ask OpenAI to show the results of running the bug detector and fixer on the millions of lines of Python, C#, and Java code in publicly accessible GitHub repositories, with an independent analysis of yes/no/false-positive/maybe counts on the findings.
Re: (Score:2)
Hidden access points aren't bugs if they're intentional.
Re: (Score:2)
I am sure some measures can be implemented to keep them in place.
Is this the shark jump? (Score:3)
My project just got a bunch of pull requests from AI. All of them were shit and a waste of my time. I mainly use AI to inject debugging instrumentation, so I know how to use it; it's just shit at systems programming.
Re: (Score:3)
No, the shark jump was the "vibe coding" from all the code monkeys, which is what actually put all the bugs ChatGPT will find into the code in the first place.
Re:Is this the shark jump? (Score:4, Interesting)
scans, reasons about, and patches code like a human security researcher
is actually:
scans, mis-identifies, and cuts-and-pastes Stackoverflow snippets like a human newly-hired intern
Re: (Score:2)
You're assuming all AIs are equal. This is false.
We have no real idea of how accurate this Aardvark thing is going to be. It's hard to see how it could understand intended use, but there are lots of common and hard to notice bugs that it might do a very good job of detecting. "Use after free" should be automatically detectable, for example.
I'm not so sure about automated correction, but automated suggestion of a correction sounds quite plausible.
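For a concrete picture of the kind of mechanical defect being talked about, here is a generic use-after-free in C (not anything from Aardvark's material):

    /* Minimal use-after-free: the class of bug a scanner can plausibly
     * flag without knowing anything about the program's intent. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *name = malloc(16);
        if (!name) return 1;
        strcpy(name, "aardvark");
        free(name);

        /* BUG: 'name' is dangling here; reading it is undefined behavior.
         * AddressSanitizer or Valgrind catch this today, so it is a
         * reasonable baseline for what an AI reviewer should also catch. */
        printf("%s\n", name);
        return 0;
    }

The fix is equally mechanical (drop the read or delay the free), which is why this class of bug looks like a plausible target for automated suggestion.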
Re:Is this the shark jump? (Score:4, Insightful)
You're assuming AI version number matters. It does not.
Programming is an exercise in turning imprecise human language into a precise domain specific language. Once the problem has been translated, it can be solved within the paradigms of the chosen language, but the bugs cannot be detected without knowing the original intent which is not encoded.
An AI with current technology cannot take the DSL solution and judge what the original intent of the solution was. That requires knowing what the problem was, which isn't fully specified, merely partially translated by programmers into the DSL that the AI might see. The translation is done by humans with expertise that the AI doesn't have, because the AI training data is always limited to actual DSL solution snippets from many unrelated problems, whereas the human programmer is given the real world problem to work through.
TL;DR. If you can't solve the problem by piecing together existing code from GitHub, then no AI can solve the problem. Worthwhile software is not simply pieced together from a thousand GitHub snippets
Re: (Score:2)
TL;DR. If you can't solve the problem by piecing together existing code from GitHub, then no AI can solve the problem. Worthwhile software is not simply pieced together from a thousand GitHub snippets
That nicely sums it up. Add that "AI" will apparently "piece together the snippets" in insecure ways quite often and any "correction" will make things a lot worse. Too many people are still deep in the hype and simply do not understand that LLMs cannot do insight or reasoning, despite the lies of the ones profiteering off them.
Re: (Score:2)
If this actually does verify that there's an exploitable vulnerability and also verifies that the fix resolves it then perhaps it won't be so bad.
I don't have much experience with this, but I was once working on a patch where I knew I had a security vulnerability (command injection via quotes in strings I needed to pass to a script). Before fixing that (it wasn't obvious how to do it securely), I wanted to test whether the idea worked at all.
I tried ChatGPT for ideas on how to resolve the vulnerability...
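For context on the bug class the poster mentions, a generic C sketch (the script name and input are made up): building a shell command by string-splicing lets quotes break out of the string, while passing the input as an argument vector keeps it as plain data.

    /* Hypothetical illustration of quote-based command injection and one
     * common mitigation. 'process.sh' is a made-up script name. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        const char *user_input = "\"; touch /tmp/pwned; echo \"";  /* attacker-controlled */

        /* Vulnerable pattern: the input is spliced into a shell string, so
         * the embedded quote and semicolon would run an extra command. */
        char cmd[256];
        snprintf(cmd, sizeof cmd, "./process.sh \"%s\"", user_input);
        /* system(cmd);  -- would execute the injected command */
        printf("unsafe command would have been: %s\n", cmd);

        /* Safer: exec the script with an argument vector; no shell parses
         * the input, so quotes in it are just data. (Fork omitted.) */
        char *const argv[] = { "./process.sh", (char *)user_input, NULL };
        execv("./process.sh", argv);  /* only returns on error */
        perror("execv");
        return 1;
    }

Even then, the script itself has to quote its arguments properly; exec only removes the first layer of shell parsing, which is part of why "how to do it securely" wasn't obvious.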
Great, more snake-oil... (Score:2, Insightful)
But I guess the scam continues to work if they just promise enough.
And then next week... (Score:2)
Someone poisoned the AI to inject vulnerabilities instead of fix them.
Re: (Score:2)
Why next week? Criminal and criminally-minded organizations (whether TLAs or crime networks) have probably already done that and will be using it in a few years.
It needs to be able to prove it. (Score:3)
For this to be effective rather than disastrous, it needs to be able to generate a proof that can be validated to confirm that a flaw exists. That is not to say that it will fix it correctly, only that the flaw is real.
Anything short of this is undoubtedly going to generate "fixes" that are just AI slop. Anyone needing convincing need only look at the mess AI has made for the curl project.
Re: (Score:2)
Indeed. But proof generation is outside of what machines can do, except for very simple things. Many (most?) of the items here are not going to be very simple things. Oh, and proof generation requires formal specifications of what to demonstrate, and that is entirely outside of what AI can do, because it requires a reasonable and pretty complete world model. We do not have those in forms machines can use.
Re: (Score:2)
Creating a mental model of how code executes is far more complex when working with source code. It would be far easier to examine a compiled binary, because that avoids language issues entirely. That said, even reverse engineering tools lack the capability to entirely map out all possible execution paths for a typical closed-system application (one that doesn't load external executable data). However, I know it's possible, so it seems like it's only a matter of time before such a tool gets made, and then AI might have a chance.
Aawkward (Score:2)
Even if this was actually useful and did not mess up people's code, this comes from a company that has no credibility for not doing so.
And BTW, there is already an AI project named Aardvark, used for predicting the weather.
You'd think they could have done a simple google search before announcing it...
my favorite part (Score:3)
" then re-analyzes the fix to ensure it doesn't introduce new problems."
A cascading catastrophuck.
Re: (Score:2)
Indeed. Just one selective blindness (and it will have many) would then be enough to endanger basically all code that ever goes through this.
Yay! (Score:2)
As holder of the domain aardvark.co.nz I can already feel the benefit.
A good direction, but... (Score:2)
...Methinks that the early results may have been exaggerated
I can imagine future AI being developed to do this accurately, but current AI probably needs work, a lot of work
In a hype-driven world where billions are at stake, treat everything as exaggeration until confirmed by multiple sources
Skepticism aside, this is a good direction to be pursuing, much better than generating mountains of pop-culture slop
Better Solutions (Score:2)