OpenAI Launches Aardvark To Detect and Patch Hidden Bugs In Code (infoworld.com)
OpenAI has introduced Aardvark, a GPT-5-powered autonomous agent that scans, reasons about, and patches code like a human security researcher. "By embedding itself directly into the development pipeline, Aardvark aims to turn security from a post-development concern into a continuous safeguard that evolves with the software itself," reports InfoWorld. From the report: What makes Aardvark unique, OpenAI noted, is its combination of reasoning, automation, and verification. Rather than simply highlighting potential vulnerabilities, the agent promises multi-stage analysis -- starting by mapping an entire repository and building a contextual threat model around it. From there, it continuously monitors new commits, checking whether each change introduces risk or violates existing security patterns.
Additionally, upon identifying a potential issue, Aardvark attempts to validate the exploitability of the finding in a sandboxed environment before flagging it. This validation step could prove transformative. Traditional static analysis tools often overwhelm developers with false alarms -- issues that may look risky but aren't truly exploitable. "The biggest advantage is that it will reduce false positives significantly," noted Jain. "It's helpful in open source codes and as part of the development pipeline."
Once a vulnerability is confirmed, Aardvark integrates with Codex to propose a patch, then re-analyzes the fix to ensure it doesn't introduce new problems. OpenAI claims that in benchmark tests, the system identified 92 percent of known and synthetically introduced vulnerabilities across test repositories, a promising indication that AI may soon shoulder part of the burden of modern code auditing.
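The report describes a staged pipeline rather than a single scan. As a rough illustration only (no Aardvark API is public, and every function name below is hypothetical), here is a minimal C sketch of the control flow being claimed: suppress findings that don't reproduce in a sandbox, and only ship patches that survive a re-check.

    /* Hypothetical sketch, not OpenAI's code: the multi-stage loop the
     * article describes. All function names are made up for illustration;
     * a real agent would call a model and a sandbox behind these stubs. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool looks_suspicious(const char *commit)            { (void)commit; return true; }
    static bool exploit_reproduces_in_sandbox(const char *commit) { (void)commit; return true; }
    static bool patch_passes_recheck(const char *commit)        { (void)commit; return true; }

    int main(void) {
        const char *commit = "abc1234";

        if (!looks_suspicious(commit)) {
            printf("%s: no finding\n", commit);
            return 0;
        }
        /* The claimed difference from classic static analysis: a finding is
         * only reported after the sandbox reproduces the exploit, which is
         * what is supposed to cut down on false positives. */
        if (!exploit_reproduces_in_sandbox(commit)) {
            printf("%s: suspicious but not exploitable, suppressed\n", commit);
            return 0;
        }
        printf("%s: confirmed vulnerability, proposing patch\n", commit);
        if (patch_passes_recheck(commit))
            printf("%s: patch re-checked, opening review\n", commit);
        else
            printf("%s: patch rejected, flagging for a human\n", commit);
        return 0;
    }

Whether the real system delivers on that loop is exactly what the commenters below dispute.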
Re: (Score:2)
If you're smart, you'll just ask the agentic ChatGPT here to hide them better for you.
Product of the month (Score:2)
While the better-hiding-place suggestion is humorous, the AI bug-finder, fixer, tester, whatever-of-the-month product announcement will lose steam within two years.
I'd ask OpenAI to show the results of running the bug detector and fixer on the millions of lines of Python, C#, and Java code in publicly accessible GitHub repositories, with an independent analysis of yes/no/false-positive/maybe counts on the findings.
Re: (Score:2)
Hidden access points aren't bugs if they're intentional.
Re: (Score:2)
I am sure some measures can be implemented to keep them in place.
Is this the shark jump? (Score:3)
My project just got a bunch of pull requests from AI. All of them were shit and a waste of my time. I mainly use AI to inject debugging instrumentation, so I know how to use it; it's just shit at systems programming.
Re: (Score:3)
No, the shark jump was the "vibe coding" from all the code monkeys, which is what actually put all the bugs ChatGPT will find into the code in the first place.
Re:Is this the shark jump? (Score:4, Interesting)
scans, reasons about, and patches code like a human security researcher
is actually:
scans, mis-identifies, and cuts-and-pastes Stackoverflow snippets like a human newly-hired intern
Re: (Score:2)
You're assuming all AIs are equal. This is false.
We have no real idea of how accurate this Aardvark thing is going to be. It's hard to see how it could understand intended use, but there are lots of common and hard to notice bugs that it might do a very good job of detecting. "Use after free" should be automatically detectable, for example.
I'm not so sure about automated correction, but automated suggestion of a correction sounds quite plausible.
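For a concrete picture of the kind of mechanical defect being talked about, here is a generic use-after-free in C (not anything from Aardvark's material):

    /* Minimal use-after-free: the class of bug a scanner can plausibly
     * flag without knowing anything about the program's intent. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *name = malloc(16);
        if (!name) return 1;
        strcpy(name, "aardvark");
        free(name);

        /* BUG: 'name' is dangling here; reading it is undefined behavior.
         * AddressSanitizer or Valgrind catch this today, so it is a
         * reasonable baseline for what an AI reviewer should also catch. */
        printf("%s\n", name);
        return 0;
    }

The fix is equally mechanical (drop the read or delay the free), which is why this class of bug looks like a plausible target for automated suggestion.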
Re:Is this the shark jump? (Score:4, Insightful)
You're assuming AI version number matters. It does not.
Programming is an exercise in turning imprecise human language into a precise domain specific language. Once the problem has been translated, it can be solved within the paradigms of the chosen language, but the bugs cannot be detected without knowing the original intent which is not encoded.
An AI with current technology cannot take the DSL solution and judge what the original intent of the solution was. That requires knowing what the problem was, which isn't fully specified, merely partially translated by programmers into the DSL that the AI might see. The translation is done by humans with expertise that the AI doesn't have, because the AI training data is always limited to actual DSL solution snippets from many unrelated problems, whereas the human programmer is given the real world problem to work through.
TL;DR. If you can't solve the problem by piecing together existing code from GitHub, then no AI can solve the problem. Worthwhile software is not simply pieced together from a thousand GitHub snippets
Re: (Score:2)
TL;DR. If you can't solve the problem by piecing together existing code from GitHub, then no AI can solve the problem. Worthwhile software is not simply pieced together from a thousand GitHub snippets
That nicely sums it up. Add that "AI" will apparently "piece together the snippets" in insecure ways quite often and any "correction" will make things a lot worse. Too many people are still deep in the hype and simply do not understand that LLMs cannot do insight or reasoning, despite the lies of the ones profiteering off them.
Re: (Score:2)
If this actually does verify that there's an exploitable vulnerability and also verifies that the fix resolves it then perhaps it won't be so bad.
I don't have much experience with this, but I was once working on a patch where I knew I had a security vulnerability (command injection via quotes in strings I needed to pass to a script). Before fixing that (it wasn't obvious how to do it securely), I wanted to test whether the idea worked at all.
I tried ChatGPT for ideas on how to resolve the vulnerability...
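For context on the bug class the poster mentions, a generic C sketch (the script name and input are made up): building a shell command by string-splicing lets quotes break out of the string, while passing the input as an argument vector keeps it as plain data.

    /* Hypothetical illustration of quote-based command injection and one
     * common mitigation. 'process.sh' is a made-up script name. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        const char *user_input = "\"; touch /tmp/pwned; echo \"";  /* attacker-controlled */

        /* Vulnerable pattern: the input is spliced into a shell string, so
         * the embedded quote and semicolon would run an extra command. */
        char cmd[256];
        snprintf(cmd, sizeof cmd, "./process.sh \"%s\"", user_input);
        /* system(cmd);  -- would execute the injected command */
        printf("unsafe command would have been: %s\n", cmd);

        /* Safer: exec the script with an argument vector; no shell parses
         * the input, so quotes in it are just data. (Fork omitted.) */
        char *const argv[] = { "./process.sh", (char *)user_input, NULL };
        execv("./process.sh", argv);  /* only returns on error */
        perror("execv");
        return 1;
    }

Even then, the script itself has to quote its arguments properly; exec only removes the first layer of shell parsing, which is part of why "how to do it securely" wasn't obvious.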
Great, more snake-oil... (Score:2, Insightful)
But I guess the scam continues to work if they just promise enough.
And then next week... (Score:2)
Someone poisoned the AI to inject vulnerabilities instead of fix them.
Re: (Score:2)
Why next week? Criminal and criminally-minded organizations (whether TLAs or crime networks) have probably already done that and will be using it in a few years.
It needs to be able to prove it. (Score:3)
For this to be effective rather than disastrous, it needs to be able to generate a proof that can be validated to confirm that a flaw exists. That is not to say that it will fix it correctly, only that the flaw is real.
Anything short of this is undoubtedly going to generate "fixes" that are just AI slop. Anyone needing convincing need only look at the mess AI has made for the curl project.
Re: (Score:2)
Indeed. But proof generation is outside of what machines can do, except for very simple things. Many (most?) of the items here are not going to be very simple things. Oh, and proof generation requires formal specifications of what to demonstrate, and that is entirely outside of what AI can do, because it requires a reasonable and pretty complete world model. We do not have those in forms machines can use.
Re: (Score:2)
Creating a mental model of how code executes is far more complex when working with source code. It would be far easier to examine a compiled binary, because that avoids language issues entirely. That said, even reverse engineering tools lack the capability to entirely map out all possible execution paths for a typical closed-system application (one that doesn't load external executable data). However, I know it's possible, so it seems like it's only a matter of time before such a tool gets made, and then AI might have a chance.
Aawkward (Score:2)
Even if this was actually useful and did not mess up people's code, this comes from a company that has no credibility for not doing so.
And BTW, there is already an AI project named Aardvark, used for predicting the weather.
You'd think they could have done a simple google search before announcing it...
my favorite part (Score:3)
" then re-analyzes the fix to ensure it doesn't introduce new problems."
A cascading catastrophuck.
Re: (Score:2)
Indeed. Just one selective blindness (and it will have many) would then be enough to endanger basically all code that ever goes through this.
Yay! (Score:2)
As holder of the domain aardvark.co.nz I can already feel the benefit.
A good direction, but... (Score:2)
...Methinks that the early results may have been exaggerated
I can imagine future AI being developed to do this accurately, but current AI probably needs work, a lot of work
In a hype-driven world where billions are at stake, treat everything as exaggeration until confirmed by multiple sources
Skepticism aside, this is a good direction to be pursuing, much better than generating mountains of pop-culture slop
Better Solutions (Score:2)