After Copilot Trial, Government Staff Rated Microsoft's AI Less Useful Than Expected (theregister.com) 16
An anonymous reader shares a report: Australia's Department of the Treasury has found that Microsoft's Copilot can easily deliver return on investment, but staff exposed to the AI assistant came away from the experience less confident it will help them at work.
The Department conducted a 14-week trial of Microsoft 365 Copilot during 2024 and asked for volunteers to participate. 218 put up their hands and then submitted to surveys about their experiences using Microsoft's AI helpers. Those surveys are the basis of an evaluation report published on Tuesday. The report reveals that after the trial participants rated Copilot less useful than they hoped it would be, as it was applicable to fewer workloads than they hoped would be the case.
Workers' views on Copilot's ability to improve their work also fell. Usage of Copilot was lower than expected, with most participants using it two or three times a week, or less. Treasury thinks it probably set unrealistically high expectations before the trial, and noted that participants often suggested extra training would be valuable.
It's not great but it will get better (Score:3, Interesting)
I recently installed it in vscode for code completions and didn't notice much difference from plain old intellisense.
I get that it's early though so I will keep checking in.
I tried to trick it into giving up other people's api keys and env vars and it didn't bite so that's nice at least.
Not sure how it could improve (Score:3)
Intellisense does everything that can be done without actually knowing what code you want to write next and without mind reading I don't see how an AI can do that no matter how smart. If it tried it would probably get it wrong and become more of a hindrance than a help.
Re: (Score:2)
Yeah, I get what you mean but for example if I can type 'function titlecase(str){' and tab to complete with hopefully something I've already used in other projects, that would shave a few minutes off me looking through old code for it. That's mostly what I want from AI completions.
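For reference, the sort of completion meant here might look like the sketch below. Only the name `titlecase` and the `str` parameter come from the comment above; the body is an assumption about what such a helper usually does (capitalize the first letter of each whitespace-separated word), not anything Copilot is known to produce.

```javascript
// Hypothetical body for the `titlecase(str)` stub mentioned above.
// Assumption: "title case" means lowercasing the input, then uppercasing
// the first character of every whitespace-separated word.
function titlecase(str) {
  return str
    .toLowerCase()
    // Match a word's first character, keeping any whitespace before it.
    .replace(/(^|\s)(\S)/g, (match, space, ch) => space + ch.toUpperCase());
}

console.log(titlecase("after copilot TRIAL"));
```

It is deliberately naive: it does not skip small words such as "of" or "the", and it treats hyphenated words as a single word.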
Re: (Score:3)
In those sorts of cases just put the function in a library, you don't need an AI to rewrite it for you all the time.
Re: (Score:1)
That's still tedious though, it's exactly the kind of thing AI can take off my plate.
Re: (Score:2)
Maybe, but I don't see why it would require AI. Standard programming inside Intellisense could achieve the same. The fact it hasn't been done probably says MS don't think it's worth the bother.
Re: Not sure how it could improve (Score:2)
but this kind of stuff seems like exactly what AI should be doing. it should be able to look at what I've done or what I've seen or what I've typed or whatever, and then intuit that I may want to either do that again, or have an iteration of that which takes another step.
I have had thoughts around this for decades, and it has seemed to me that there is no reason why these sorts of features should not exist. Bonus? I don't even think it needs some big huge data center pow
My mileage varies (Score:5, Insightful)
Re: It's not great but it will get better (Score:3)
It has been the opposite for me. When it was shiny and new, it was impressive and I had high hopes for it. Was able to adopt it quickly for coding small snippets that would otherwise take me an hour to walk through on my own.
Now that we are a few years in, reality has bled through. The inconsistency of results makes it frustrating to use at times, and the quality of results overall does not seem to be improving over time, IMO.
Re: (Score:2)
GitHub Copilot GA'd as a paid service over 3 years ago.
Re: (Score:1)
Yeah. They have a free tier so I gave it a try. Not super impressive but I expect it to be worthwhile within a year or so.
Read: Management fooled by AI salesperson (Score:3)
Treasury thinks it probably set unrealistically high expectations before the trial
If the management set unrealistically high expectations, I bet it is because they have been to some lobbyist/sales meeting and pumped full of Microsoft propaganda, too technical for them to judge.
Re: (Score:2)
Treasury thinks it probably set unrealistically high expectations before the trial
If the management set unrealistically high expectations, I bet it is because they have been to some lobbyist/sales meeting and pumped full of Microsoft propaganda, too technical for them to judge.
Sir, this is the Australian Taxation Office (ATO) we're talking about... what you suggest would require far too much intelligence, forward thinking and initiative for a government department.
One thing I am certain of, AI would somehow manage to result in even slower responses and more screw ups from the ATO.
Er, incentives anyone? (Score:2)
I had a coworker a couple decades ago who loved manually deleting centerlines and stuff from CAD drawings (to make tech manual illustrations of equipment).
Nice restful task for him.
When I pointed out that you could usually just turn off a layer or two, he was like "shhhhh!!!"
My point being, people might not always be 100% honest when you ask them about how helpful labor saving stuff is ...
Bullshit generators (Score:2)
These language models in general are basically "bullshit generators" that sometimes bullshit so well they end up telling the truth, but the failure mode is a text that looks as much as possible like the thing you want, but isn't quite.
Copilot Has Regressed (Score:3)