AI Bug Programming

AI-Assisted Bug Reports Are Seriously Annoying For Developers (theregister.com) 29

Generative AI models like Google Bard and GitHub Copilot are increasingly being used in various industries, but users often overlook their limitations, leading to serious errors and inefficiencies. Daniel Stenberg of curl and libcurl highlights a specific problem with AI-generated security reports: when reports are made to look better and to appear to have a point, it takes longer to research and eventually discard them. "Every security report has to have a human spend time to look at it and assess what it means," adds Stenberg. "The better the crap, the longer time and the more energy we have to spend on the report until we close it." The Register reports: The curl project offers a bug bounty to security researchers who find and report legitimate vulnerabilities. According to Stenberg, the program has paid out over $70,000 in rewards to date. Of 415 vulnerability reports received, 64 have been confirmed as security flaws and 77 have been deemed informative -- bugs without obvious security implications. So about 66 percent of the reports have been invalid. The issue for Stenberg is that these reports still need to be investigated, and that takes developer time. And while those submitting bug reports have begun using AI tools to accelerate the process of finding supposed bugs and writing up reports, those reviewing bug reports still rely on human review. The result of this asymmetry is more plausible-sounding reports, because chatbot models can produce detailed, readable text without regard to accuracy.

As Stenberg puts it, AI produces better crap. "A crap report does not help the project at all. It instead takes away developer time and energy from something productive. Partly because security work is considered one of the most important areas so it tends to trump almost everything else." As examples, he cites two reports submitted to HackerOne, a vulnerability reporting community. One claimed to describe Curl CVE-2023-38545 prior to actual disclosure. But Stenberg had to post to the forum to make clear that the bug report was bogus. He said that the report, produced with the help of Google Bard, "reeks of typical AI style hallucinations: it mixes and matches facts and details from old security issues, creating and making up something new that has no connection with reality." [...]

Stenberg readily acknowledges that AI assistance can be genuinely helpful. But he argues that having a human in the loop makes the use and outcome of AI tools much better. Even so, he expects the ease and utility of these tools, coupled with the financial incentive of bug bounties, will lead to more shoddy LLM-generated security reports, to the detriment of those on the receiving end.
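
The 66 percent figure follows directly from the counts quoted above. As a quick check of the arithmetic (a minimal sketch in Python, using only the report numbers given in the summary):

    # Report counts quoted in the summary above
    total_reports = 415
    confirmed_flaws = 64
    informative = 77

    # Everything that is neither a confirmed flaw nor merely informative
    invalid = total_reports - confirmed_flaws - informative
    print(f"invalid reports: {invalid}")                    # 274
    print(f"invalid rate: {invalid / total_reports:.0%}")   # 66%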

Comments Filter:
  • by Ferocitus ( 4353621 ) on Thursday January 04, 2024 @09:11PM (#64133211)

    Use AI to assess bug claims and to award bounties.
    You're very welcome.

  • by NFN_NLN ( 633283 ) on Thursday January 04, 2024 @09:17PM (#64133217)

    > "The better the crap, the longer time and the more energy we have to spend on the report until we close it."

    That's the scam. They flood you with so much AI generated noise that you need to purchase AI services to filter through it.

  • I get it but (Score:1, Interesting)

    by Anonymous Coward on Thursday January 04, 2024 @09:19PM (#64133223)

    As a developer myself I get the problem but at the same time I hate reporting bugs. I often have to explain to the developers why they are morons, etc. It takes too much time. I want to simply point out the flaw then let them figure it out because they know the system. There are way too many bugs in software for me to sit and hand-hold everyone with lots of writing especially when I'm not very experienced with their code.

    I usually don't report bugs because it's too much trouble. But yes, now I'll use AI to shove something out. I don't do it blindly. I use the AI to convert my simple statement, the core, and turn it into something that apparently normal people need to parse. Basically a lot of unnecessary detail because the developers are too lazy or too stupid to understand the problem. Regularly I report obvious problems where the developer simply needs to take 60 seconds to test it themselves and there you go... but no, they won't even look at it. I don't do this when people report bugs to me. I use my skill to solve the problem.

    If I could fix everything myself I would but there isn't enough time in the universe for such things.

    • Re:I get it but (Score:5, Insightful)

      by Josef Meixner ( 1020161 ) on Friday January 05, 2024 @05:51AM (#64133787) Homepage

      With your attitude towards them I cannot blame anybody for ignoring your input. Let me guess, you cannot be bothered to report the version or the steps necessary to reproduce. Do your bug reports contain more than something along the lines of "You morons, a three year old wouldn't make this bug. Fix it."?

      • by SomePoorSchmuck ( 183775 ) on Friday January 05, 2024 @05:21PM (#64135137) Homepage

        With your attitude towards them I cannot blame anybody for ignoring your input. Let me guess, you cannot be bothered to report the version or the steps necessary to reproduce. Do your bug reports contain more than something along the lines of "You morons, a three year old wouldn't make this bug. Fix it."?

        I agree with the GP post. I can submit screenshots, error codes, specific URLs of pages where the error occurs, detailed steps to reproduce, painstakingly accurate lists of exactly which version numbers of the relevant platforms/OS/app/browser/security modes/etc. and STILL get no acknowledgement from the people whose jobs are to administer/maintain the systems in question. I have run into a depressingly large number of people who very obviously do not analyze problem reports based on whether the problem can be replicated, but instead based on how low-hanging the potential mitigation fruit would be to grasp.

        Thus, I have un-learned the idea that I can simply report problems to the people who are supposed to know those systems. If I want the problems to actually get fixed, I must spend a couple hours of my time locating source, tech specs, documentation, etc. so I can also provide a mitigation hypothesis. It's like you have to reassure someone whose job it is to fix something, that the problem CAN be fixed. If you don't give them starting details on a potential solution, they shrug it off no matter how much detail you put into the problem.

        Also like the GP, any system or process I'm responsible for, my service mentality is the reverse -- yes, users and even other system managers/admins do sometimes submit poorly-documented reports, but my approach is that the burden of proof is on ME to establish that their reported problem did NOT occur. Their job isn't to take ownership for my systems; that would be inefficient because then we'd all have to constantly sit in on each other's projects to make sure we understand exactly what every other team needs. Knowing my system's capabilities and vulnerabilities is literally what I'm here for. If someone reports a problem, it is MY duty to go TRY as hard as I can to make their reported problem happen, using my knowledge/expertise with the particular system to perform thought experiments of "If this problem were real, what kinds of user actions or processing states would be most likely to produce it?"

        Sure, it would be easy if everyone hand-fed me exactly what I needed to just be a button-masher, but then... at that point why do they even need me anymore?
        Yeah, it's a little bit of extra work for me to take poorly-detailed reports seriously, sigh internally, but perform my due diligence anyway. But you know what? My shit works, my shit doesn't jump off a cliff, and my area has a high reputation across the organization as the place where weird mysteries go to die, from being exposed and resolved. That gives me a lot of personal satisfaction - both from getting to solve an obscure problem and from my moral/ethical commitment to other human beings. Taking ownership to master my shit means other people have more time to master their shit, so the entire organization fires on all cylinders.

        System/code maintenance folks should be like doctors. It's not the patient's job to know internal medicine.
        YOU: "Doc, I've been getting this sharp stabbing incapacitating pain in my lower right abdomen that has been increasing in frequency and severity for the past day."
        DOCTOR: "Does it hurt when I press here or here?"
        YOU: "Some, but that's not quite the same as the pain I've been feeling."
        DOCTOR: "Well I tried the first thing that came to mind. If it doesn't hurt when I do that then you're probably improperly describing the level of pain to me, so I can't be expected to help. Go home and sleep it off, I guess."
        YOU, 16 HOURS LATER: "Well this sucks but I know it's what I deserve as punishment for not being able to speak Doctorese."
        DOCTOR, AT A DINNER PARTY: "This one time, a patient of mine almost died because they didn't use the correct terminology when one of their major processes was crashing. Who could possibly have known what all might go medically wrong with a human body?"

        Who, indeed?

    • by jonbryce ( 703250 ) on Friday January 05, 2024 @06:12AM (#64133805) Homepage

      That sort of thing, even if it is still factually correct, just makes it take longer to read and parse what it is saying.

    • Re:I get it but (Score:4, Insightful)

      by Junta ( 36770 ) on Friday January 05, 2024 @07:35AM (#64133885)

      But yes, now I'll use AI to shove something out. I don't do it blindly. I use the AI to convert my simple statement, the core, and turn it in to something that apparently normal people need to parse.

      No, the AI generated output is not what people "need to parse". If the LLM processed your prompt fine, then your prompt would have done just fine as a writeup of a bug.

      The whole point is that LLM extended material is obnoxious in this context. At *best* it buries your core actionable detail among a bunch of empty verbosity that the reader must wade through. It is also highly likely to invent details to further muddy the waters, sometimes replacing your detail with another.

      While worse with LLMs, this isn't a new phenomenon. I have always hated it when I come across a communication that clearly wants to be "professional", as it is pointlessly verbose and takes a lot of time when a brief 3 or 4 sentences would have sufficed.

      If the system analyzes your report, an LLM might be able to more accurately suggest possible duplicates. It might be able to recognize potential documentation material to offer an afflicted user. It really sucks at "enhancing" human-to-human communication if the intent is to sincerely and fully convey information.
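
      To make the duplicate-suggestion idea concrete, here is a purely hypothetical sketch of how a tracker could rank existing reports by textual similarity to an incoming one. It is not anything curl or HackerOne actually runs, and it uses plain TF-IDF cosine similarity from scikit-learn rather than an LLM; the report texts are made up for illustration.

        # Hypothetical: rank existing bug reports by similarity to a new one.
        # TF-IDF cosine similarity stands in for a "smarter" LLM comparison.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        existing_reports = [
            "Crash when parsing a URL with an overlong hostname",
            "Memory leak in connection reuse after a failed TLS handshake",
            "Proxy credentials ignored when the scheme is missing",
        ]
        new_report = "Segfault while parsing very long hostnames in URLs"

        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform(existing_reports + [new_report])

        # Compare the new report (last row) against every existing report.
        scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
        for report, score in sorted(zip(existing_reports, scores), key=lambda p: -p[1]):
            print(f"{score:.2f}  {report}")

      An LLM-backed version would presumably swap the vectorizer for embeddings, but the suggest-likely-duplicates step would look the same.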

    • Re:I get it but (Score:5, Insightful)

      by Petersko ( 564140 ) on Friday January 05, 2024 @11:36AM (#64134349)

      "I hate reporting bugs. I often have to explain to the developers why they are morons, etc."

      No need. You should stop. I'm sure you'll be sorely missed, and your contributions are probably mission critical, but your mental health is more important. Tend to it, even if it means nobody hears from you again.

  • by joshuark ( 6549270 ) on Thursday January 04, 2024 @09:42PM (#64133259)

    False positives about some "issue" and chasing your tail... well, I've known team leads and managers that stir up the developers on the team like a herd of cats... running all over the place.

    Seriously annoying, like "super serial serious" or "I am unanimous in seriously super annoyed." ?? :)

    JoshK.

  • by Anonymous Coward on Thursday January 04, 2024 @10:13PM (#64133307)

    Please refer to your logfiles, 1/3/24, 12:04:03 PM, line 420.

    You're welcome!

  • by war4peace ( 1628283 ) on Thursday January 04, 2024 @10:56PM (#64133361)

    That's a weird way to phrase it.
    "I am John of Mordor and Forodwaith".

  • by Anonymous Coward on Friday January 05, 2024 @12:46AM (#64133481)

    CRM114, the spam filter software, is used by the Department of Transportation to filter false from legitimate vehicle accident reports.

                    https://crm114.sourceforge.net... [sourceforge.net]

  • by Anonymous Coward on Friday January 05, 2024 @12:51AM (#64133489)

    Stenberg readily acknowledges that AI assistance can be genuinely helpful. But he argues that having a human in the loop makes the use and outcome of AI tools much better.

    Maybe having "a capable human" in the loop makes things better. One of our QA people has a boner for ChatGPT (which they regularly call "ChatGTP" so that should be a clue) and is regularly wasting our time looking at C# and T-SQL code that they've produced for testing environments using it. Bug-ridden, badly performing crap is what it produces; I wish they'd heed our advice and leave it the fuck alone.

  • by gweihir ( 88907 ) on Friday January 05, 2024 @01:02AM (#64133505)

    LLMs are good at making the most inane crap sound good. They are not good at all at recognizing reality.

  • by ClueHammer ( 6261830 ) on Friday January 05, 2024 @01:56AM (#64133567)

    Already has happened. "User: The AI told me to do this. Support: That option does not exist. User: Your wrong, the AI told me...", the AI of course generated lies and the user was determined that we were wrong, he was right, and would not listen to reason.
  • by lurcher ( 88082 ) on Friday January 05, 2024 @04:24AM (#64133703) Homepage

    "According to Stenberg, the program has paid out over $70,000 in rewards to date."

    I know this will come off as whiny, because I guess it is, but how the hell did an open source project like curl manage to get $70k to pay out for these things? I ask from the perspective of someone who has been running unixODBC for the last 20 years, and maybe I am wondering where you apply.

  • by deijmaster ( 952698 ) on Friday January 05, 2024 @08:44AM (#64133939) Homepage

    The coming of “open” bounties has created many similar situations. That's the main reason why most of my interventions have put a halt on these and instead focused on creating small internal/external teams - less is more - especially when it's well done. So I absolutely get his point, but at the same time - it’s a bit like paying insurance - annoying, but good when something significant happens. My core expertise has been building and managing these teams for decades. Effort needs to be logical and affordable for your line of business. And by the time developers get the information, it should be clean and pretty damn close to confirmed. But then, how many times have I had clients asking about the cost and time of similar setups? Many. If you want me to go faster, then noise is an absolute possibility. Too long, then the client gets frustrated with the cost - it’s not an easy balance.
  • by sfsp ( 655361 ) on Friday January 05, 2024 @09:02AM (#64133969) Homepage Journal

    If you apply Sturgeon's Law, and Pareto's Principle, then 66% noise is actually pretty good...
