AI-Assisted Bug Reports Are Seriously Annoying For Developers (theregister.com) 29
Generative AI models like Google Bard and GitHub Copilot are increasingly being used in various industries, but users often overlook their limitations, leading to serious errors and inefficiencies. Daniel Stenberg of curl and libcurl highlights a specific problem with AI-generated security reports: when reports are made to look better and to appear to have a point, it takes longer to research and eventually discard them. "Every security report has to have a human spend time to look at it and assess what it means," adds Stenberg. "The better the crap, the longer time and the more energy we have to spend on the report until we close it." The Register reports: The curl project offers a bug bounty to security researchers who find and report legitimate vulnerabilities. According to Stenberg, the program has paid out over $70,000 in rewards to date. Of 415 vulnerability reports received, 64 have been confirmed as security flaws and 77 have been deemed informative -- bugs without obvious security implications. So about 66 percent of the reports have been invalid. The issue for Stenberg is that these reports still need to be investigated, and that takes developer time. And while those submitting bug reports have begun using AI tools to accelerate the process of finding supposed bugs and writing up reports, those reviewing bug reports still rely on human review. The result of this asymmetry is more plausible-sounding reports, because chatbot models can produce detailed, readable text without regard to accuracy.
As Stenberg puts it, AI produces better crap. "A crap report does not help the project at all. It instead takes away developer time and energy from something productive. Partly because security work is considered one of the most important areas so it tends to trump almost everything else." As examples, he cites two reports submitted to HackerOne, a vulnerability reporting community. One claimed to describe Curl CVE-2023-38545 prior to actual disclosure. But Stenberg had to post to the forum to make clear that the bug report was bogus. He said that the report, produced with the help of Google Bard, "reeks of typical AI style hallucinations: it mixes and matches facts and details from old security issues, creating and making up something new that has no connection with reality." [...]
Stenberg readily acknowledges that AI assistance can be genuinely helpful. But he argues that having a human in the loop makes the use and outcome of AI tools much better. Even so, he expects the ease and utility of these tools, coupled with the financial incentive of bug bounties, will lead to more shoddy LLM-generated security reports, to the detriment of those on the receiving end.
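For reference, here is a minimal sketch (Python, using only the figures quoted in the summary above) of how the roughly 66 percent invalid-report figure falls out of the numbers Stenberg cites; the variable names are ours, not the curl project's:

    # Figures quoted in the summary above
    total_reports = 415
    confirmed_flaws = 64
    informative = 77   # bugs without obvious security implications

    valid = confirmed_flaws + informative     # 141
    invalid = total_reports - valid           # 274
    print(f"{invalid}/{total_reports} invalid = {invalid / total_reports:.0%}")
    # -> 274/415 invalid = 66%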
Simple solution (Score:1)
Use AI to assess bug claims and to award bounties.
You're very welcome.
All going as planned. (Score:5, Funny)
> "The better the crap, the longer time and the more energy we have to spend on the report until we close it."
That's the scam. They flood you with so much AI generated noise that you need to purchase AI services to filter through it.
Re:All going as planned. (Score:5, Interesting)
As the article says:
Stenberg readily acknowledges that AI assistance can be genuinely helpful.
But so far, I have not seen an example of AI being genuinely helpful. Except for trivial cases of "I was wasting my time writing gibberish, and the AI was so much faster at it than I was."
Re:All going as planned. (Score:2)
> But so far, I have not seen an example of AI being genuinely helpful.
https://www.youtube.com/watch?... [youtube.com]
Re:All going as planned. (Score:2)
Ok, but that's not the large language model style as was being discussed. I should have been more clear.
Re:All going as planned. (Score:1)
Re:All going as planned. (Score:3)
But so far, I have not seen an example of AI being genuinely helpful.
I have very occasionally used it to find API calls. If the words involved are just too common, or the terminology happens to be too vague, making it ungooglable, you can get the call by describing what you want it to do, and ChatGPT will have a good chance of telling you what the API call is.
I've used it once or twice after a recommendation from a friend. Pretty neat. That's about it though...
Re:All going as planned. (Score:2)
It's people trying to claim bug bounties. Because there is a payout, they spend all day running automated tools and AI on code, looking for flaws to report.
Re:All going as planned. (Score:0)
Using AI to find bugs and then having humans validate and report on them is different than using AI to report bugs every time you run into an issue.
I get it but (Score:1, Interesting)
As a developer myself I get the problem but at the same time I hate reporting bugs. I often have to explain to the developers why they are morons, etc. It takes too much time. I want to simply point out the flaw then let them figure it out because they know the system. There are way too many bugs in software for me to sit and hand-hold everyone with lots of writing especially when I'm not very experienced with their code.
I usually don't report bugs because it's too much trouble. But yes, now I'll use AI to shove something out. I don't do it blindly. I use the AI to take my simple statement, the core of it, and turn it into something that apparently normal people need to parse. Basically a lot of unnecessary detail because the developers are too lazy or too stupid to understand the problem. Regularly I report obvious problems where the developer simply needs to take 60 seconds to test it themselves and there you go... but no, they won't even look at it. I don't do this when people report bugs to me. I use my skill to solve the problem.
If I could fix everything myself I would but there isn't enough time in the universe for such things.
Re:I get it but (Score:5, Insightful)
With your attitude towards them I cannot blame anybody for ignoring your input. Let me guess, you cannot be bothered to report the version or the steps necessary to reproduce. Do your bug reports contain more than something along the lines of "You morons, a three year old wouldn't make this bug. Fix it."?
Re:I get it but (Score:2)
With your attitude towards them I cannot blame anybody for ignoring your input. Let me guess, you cannot be bothered to report the version or the steps necessary to reproduce. Do your bug reports contain more than something along the lines of "You morons, a three year old wouldn't make this bug. Fix it."?
I agree with the GP post. I can submit screenshots, error codes, specific URLs of pages where the error occurs, detailed steps to reproduce, painstakingly accurate lists of exactly which version numbers of the relevant platforms/OS/app/browser/security modes/etc. and STILL get no acknowledgement from the people whose jobs are to administer/maintain the systems in question. I have run into a depressingly large number of people who very obviously do not analyze problem reports based on whether the problem can be replicated, but instead based on how low-hanging the potential mitigation fruit would be to grasp.
Thus, I have un-learned the idea that I can simply report problems to the people who are supposed to know those systems. If I want the problems to actually get fixed, I must spend a couple hours of my time locating source, tech specs, documentation, etc. so I can also provide a mitigation hypothesis. It's like you have to reassure someone whose job it is to fix something that the problem CAN be fixed. If you don't give them starting details on a potential solution, they shrug it off no matter how much detail you put into the problem.
Also like the GP, any system or process I'm responsible for, my service mentality is the reverse -- yes, users and even other system managers/admins do sometimes submit poorly-documented reports, but my approach is that the burden of proof is on ME to establish that their reported problem did NOT occur. Their job isn't to take ownership for my systems; that would be inefficient because then we'd all have to constantly sit in on each other's projects to make sure we understand exactly what every other team needs. Knowing my system's capabilities and vulnerabilities is literally what I'm here for. If someone reports a problem, it is MY duty to go TRY as hard as I can to make their reported problem happen, using my knowledge/expertise with the particular system to perform thought experiments of "If this problem were real, what kinds of user actions or processing states would be most likely to produce it?"
Sure, it would be easy if everyone hand-fed me exactly what I needed to just be a button-masher, but then... at that point why do they even need me anymore?
Yeah, it's a little bit of extra work for me to take poorly-detailed reports seriously, sigh internally, but perform my due diligence anyway. But you know what? My shit works, my shit doesn't jump off a cliff, and my area has a high reputation across the organization as the place where weird mysteries go to die, from being exposed and resolved. That gives me a lot of personal satisfaction - both from getting to solve an obscure problem and from my moral/ethical commitment to other human beings. Taking ownership to master my shit means other people have more time to master their shit, so the entire organization fires on all cylinders.
System/code maintenance folks should be like doctors. It's not the patient's job to know internal medicine.
YOU: "Doc, I've been getting this sharp stabbing incapacitating pain in my lower right abdomen that has been increasing in frequency and severity for the past day."
DOCTOR: "Does it hurt when I press here or here?"
YOU: "Some, but that's not quite the same as the pain I've been feeling."
DOCTOR: "Well I tried the first thing that came to mind. If it doesn't hurt when I do that then you're probably improperly describing the level of pain to me, so I can't be expected to help. Go home and sleep it off, I guess."
YOU, 16 HOURS LATER: "Well this sucks but I know it's what I deserve as punishment for not being able to speak Doctorese."
DOCTOR, AT A DINNER PARTY: "This one time, a patient of mine almost died because they didn't use the correct terminology when one of their major processes was crashing. Who could possibly have known what all might go medically wrong with a human body?"
Who, indeed?
Re:I get it but (Score:2)
That sort of thing, even if it is still factually correct, just makes it take longer to read and parse what it is saying.
Re:I get it but (Score:4, Insightful)
But yes, now I'll use AI to shove something out. I don't do it blindly. I use the AI to convert my simple statement, the core, and turn it in to something that apparently normal people need to parse.
No, the AI generated output is not what people "need to parse". If the LLM processed your prompt fine, then your prompt would have done just fine as a writeup of a bug.
The whole point is that LLM extended material is obnoxious in this context. At *best* it buries your core actionable detail among a bunch of empty verbosity that the reader must wade through. It is also highly likely to invent details to further muddy the waters, sometimes replacing your detail with another.
While worse with LLMs, this isn't a new phenomenon. I have always hated it when I came across a communication that clearly wanted to be "professional", as it is pointlessly verbose and takes a lot of time when a brief 3 or 4 sentences would have sufficed.
If the system analyzes your report, an LLM might be able to more accurately suggest possible duplicates. It might be able to recognize potential documentation material to offer an afflicted user. But it really sucks at "enhancing" human-to-human communication if the intent is to sincerely and fully convey information.
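For what it's worth, the duplicate-suggestion idea above can be sketched without much machinery. A minimal illustration in Python, assuming a hypothetical embed() helper that maps a report's text to a vector (any off-the-shelf text-embedding model would do); this is only an illustration of the triage idea, not anything the curl project or HackerOne actually runs:

    import math

    def cosine(a, b):
        # Cosine similarity between two equal-length vectors.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    def suggest_duplicates(new_report, known_reports, embed, threshold=0.85):
        # known_reports: list of (report_id, text) pairs already in the tracker.
        # embed: assumed helper that turns report text into a vector.
        new_vec = embed(new_report)
        scored = [(cosine(new_vec, embed(text)), rid) for rid, text in known_reports]
        return [rid for score, rid in sorted(scored, reverse=True) if score >= threshold]

A human still has to look at the suggested matches, which is exactly Stenberg's point.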
Re:I get it but (Score:5, Insightful)
"I hate reporting bugs. I often have to explain to the developers why they are morons, etc."
No need. You should stop. I'm sure you'll be sorely missed, and your contributions are probably mission critical, but your mental health is more important. Tend to it, even if it means nobody hears from you again.
False positives... (Score:2)
False positives about some "issue" and chasing your tail... well, I've known team leads and managers that stir up the developers on the team like a herd of cats, sending them running all over the place.
Seriously annoying, like "super serial serious" or "I am unanimous in seriously super annoyed." ?? :)
JoshK.
A bug has been detected in Slashdot!!! (Score:0)
Please refer to your logfiles, 1/3/24, 12:04:03 PM, line 420.
You're welcome!
Daniel Stenberg of curl and libcurl (Score:2)
That's a weird way to phrase it.
"I am John of Mordor and Forodwaith".
AI filters have been used for years (Score:0)
CRM114, the spam filter software, is used by the Department of Transportation to filter false from legitimate vehicle accident reports.
https://crm114.sourceforge.net... [sourceforge.net]
AI, n. see GIGO (Score:0)
Maybe having "a capable human" in the loop makes things better. One of our QA people has a boner for ChatGPT (which they regularly call "ChatGTP" so that should be a clue) and is regularly wasting our time looking at C# and T-SQL code that they've produced for testing environments using it. Bug-ridden, badly performing crap is what it produces, I wish they'd heed our advice and leave it the fuck alone.
Why am I not surprised? (Score:2)
LLMs are good at making the most inane crap sound good. They are not good at all at recognizing reality.
Just wait till you strike the rabid AI User (Score:3)
Re:Just wait till you strike the rabid AI User (Score:3)
User: You're wrong, the AI told me...
Support: Then ask the AI to solve your issue.
Where did the 70k come from (Score:5, Interesting)
"According to Stenberg, the program has paid out over $70,000 in rewards to date."
I know this will come off as whiny, because I guess it is, but how the hell did an open source project like curl manage to get $70k to pay out for these things? I ask from the perspective of someone who has been running unixODBC for the last 20 years, and I am wondering where you apply.
Re:Where did the 70k come from (Score:3)
This might give you an idea: https://opencollective.com/cur... [opencollective.com]
Re:Where did the 70k come from (Score:3)
Yep, I guess that is an idea. Thanks. Maybe I will look at that, or maybe I will just continue whining :-)
Not an easy balance (Score:1)
Sturgeon and Pareto (Score:1)
If you apply Sturgeon's Law and Pareto's Principle, then 66% noise is actually pretty good...