'Talking To Windows' Copilot AI Makes a Computer Feel Incompetent (theverge.com) 56
Microsoft's Copilot AI assistant in Windows 11 fails to replicate the capabilities shown in the company's TV advertisements. The Verge tested Copilot Vision over a week using the same prompts featured in ads airing during NFL games. When asked to identify a HyperX QuadCast 2S microphone visible in a YouTube video -- a task successfully completed in Microsoft's ad -- Copilot gave multiple incorrect answers. The assistant identified the microphone as a first-generation HyperX QuadCast, then as a Shure SM7b on two other occasions. Copilot couldn't identify the Saturn V rocket from a PowerPoint presentation despite the words "Saturn V" appearing on screen. When asked about a cave image from Microsoft's ad, Copilot gave inconsistent responses.
About a third of the time it provided directions to find the photo in File Explorer. On two occasions it explained how to launch Google Chrome. Four times it offered advice about booking flights to Belize. The cave is Rio Secreto in Playa del Carmen, Mexico. Microsoft spokesperson Blake Manfre said "Copilot Actions on Windows, which can take actions on local files, is not yet available." He described it as "an opt-in experimental feature that will be coming soon to Windows Insiders in Copilot Labs, starting with a narrow set of use cases while we optimize model performance and learn." Copilot cannot toggle basic Windows settings like dark mode. When asked to analyze a benchmark table in Google Sheets, it "constantly misread clear-as-day scores both in the spreadsheet and in the on-page review."
Computers don't "feel" anything (Score:2)
In spite of the advances in AI
Re: (Score:2)
Re:Computers don't "feel" anything (Score:4, Informative)
Correct. This is why I don't like the term "hallucinate". AIs don't experience hallucinations, because they don't experience anything. The problem they have would more correctly be called, in psychology terms "confabulation" -- they patch up holes in their knowledge by making up plausible sounding facts.
I have experimented with AI assistance for certain tasks, and find that generative AI absolutely passes the Turing test for short sessions -- if anything it's too good; too fast; too well-informed. But the longer the session goes, the more the illusion of intelligence evaporates.
This is because under the hood, what AI is doing is a bunch of linear algebra. The "model" is a set of matrices, and the "context" is a set of vectors representing your session up to the current point, augmented during each prompt response by results from Internet searches. The problem is, the "context" takes up lots of expensive high performance video RAM, and every user only gets so much of that. When you run out of space for your context, the older stuff drops out of the context. This is why credibility drops the longer a session runs. You start with a nice empty context, and you bring in some internet search results and run them through the model and it all makes sense. When you start throwing out parts of the context, the context turns into inconsistent mush.
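A toy sketch of the "older stuff drops out" behaviour described above, assuming a simple oldest-first eviction policy and a crude word-count stand-in for a tokenizer. The names `SlidingContext`, `count_tokens` and `max_tokens` are made up for illustration; real assistants may summarize, compress, or retrieve instead of simply truncating.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers split text differently.
    return len(text.split())


class SlidingContext:
    """Hypothetical fixed-budget context that evicts the oldest messages first."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.messages = deque()
        self.used = 0

    def add(self, message: str) -> None:
        self.messages.append(message)
        self.used += count_tokens(message)
        # Drop the oldest messages until the budget fits again,
        # always keeping at least the newest message.
        while self.used > self.max_tokens and len(self.messages) > 1:
            dropped = self.messages.popleft()
            self.used -= count_tokens(dropped)

    def window(self) -> list:
        # What the model would still "see" at this point in the session.
        return list(self.messages)


ctx = SlidingContext(max_tokens=8)
ctx.add("my name is Ada")    # 4 tokens
ctx.add("I like caving")     # 3 tokens, 7 in total
ctx.add("what is my name")   # 4 more tokens exceed the budget, so the oldest message is evicted
print(ctx.window())
```

By the time the question arrives, the message that contained the answer has already been evicted, which is one simple way a long session can lose consistency.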
Re: (Score:3, Interesting)
if anything it's too good; too fast; too well-informed. But the longer the session goes, the more the illusion of intelligence evaporates.
We associate knowledge with intelligence. Ask a person about their favourite topic, and they will sound smarter.
An LLM knows far more than any human, so we tend to over-estimate their "IQ". The intelligence is still very real, the problem is just that we initially over-estimated it.
This is why credibility drops the longer a session runs. You start with a nice empty context, and you bring in some internet search results and run them through the model and it all makes sense. When you start throwing out parts of the context, the context turns into inconsistent mush.
And how is that any different for humans? I can read them a ten-digit number and their context overflows. Dumb as hammers.
Why do you think a bunch of wet machinery with membranes and chemical messengers is intrinsically supe
Re: (Score:2)
It's different from humans in that human opinions, expertise and intelligence are rooted in their experience. Good or bad, and inconsistent as it is, it is far, far more stable than AI. If you've ever tried to work at a long running task with generative AI, the crash in performance as the context rots is very, very noticeable, and it's intrinsic to the technology. Work with a human long enough, and you will see the faults in his reasoning, sure, but it's just as good or bad as it was at the beginning.
Pathetic [Re:Computers don't "feel" anything] (Score:2)
It's called "pathetic fallacy"-- ascribing feelings (pathos, in Greek) to inanimate objects.
I'm afraid that we do this all the time. I don't even think twice before saying something like "the toaster doesn't like you to run the blender while it's toasting" or "this program wants two special characters in the password, not just one."
Re: (Score:2)
It's called "pathetic fallacy"-- ascribing feelings (pathos, in Greek) to inanimate objects.
The machine spirit must be appeased.
Re: (Score:2)
Well, yes. LLMs do not "feel anything" by design
What is a "feeling"? Is it like a "mood"? In humans, they are short-term states that affect behaviour.
No doubt an AI could be trained using rewards when it modified output in response to praise or insults. It could be trained to get impatient with poorly worded or dumb questions. You could also get such behaviour by modifying the system prompt, but that would be more like humans faking an emotion. What is the difference between a real and fake emotion? Se
Time to rename it (Score:3)
to Drunken Passenger.
"Feel" incompetent? (Score:2)
Seriously, it's totally incompetent.
Apple was right... (Score:3)
Re: (Score:2)
Siri has been one of the weakest assistants for many years now, and given that they usually just ship half finished software (Apple Maps comes to mind), I'm surprised they were able to resist. Maybe there is another reason, like it kills the battery.
Re: (Score:2)
You're implying the end result will be something of quality from Apple. Given the state of Siri I feel like Apple "shipping garbage" would have been a significant improvement over their status quo.
Microsoft Distilled to its Essence = Copilot (Score:5, Insightful)
Microsoft desperately wants to sell us a vision of the PC being an "agentic" device. You speak, it responds. Except, they're creating the equivalent of a blind and deaf person being peddled as an expert in all things. It can't read the files on the computer? It can't respond with answers clearly spelled out in the content currently pulled up on the screen? And apparently it can't understand simple questions well enough to even fully grok the scope or domain of the query itself.
Maybe one of the AI pushing tech companies could try to work through the shit-show of pre-alpha state software in their own labs before attempting to foist it off on developers or "insiders" or, more often, the end users? Maybe, just maybe, we'd have a better perspective on AI if we didn't have so much of it shoved in our faces while it's half baked and nowhere near ready to fulfill even the most basic tasks it's being sold as the perfect solution for? But it seems more and more likely that we'll just let the entirety of humanity drown in the refuse pile that half baked AI is creating. Nobody seems at all interested in saying, "How about we get it functional before we shove it out the door?"
Re: (Score:2)
it can't understand simple questions well enough to even fully grok the scope or domain of the query itself.
Not what you mean but it would be funny if one ai just fed its inputs into another and copied and pasted the results so it didn't have to work. Then we'd be approaching human intelligence.
You're preaching to the choir (Score:4, Informative)
Re: You're preaching to the choir (Score:2)
Re:You're preaching to the choir (Score:5, Funny)
Re: (Score:2)
I tried to use a chat tool to write a bash script that would prompt for username and file system, for a quota change.
Was total garbage.
I'm not a coder; just a stupid admin but how are folks using these tools for programming?
Re: You're preaching to the choir (Score:2)
Most of the people on Slashdot have been screaming that the emperor has no clothes for a while now
rsilvergun has been screaming even louder about how AI as we have it now is already the end of the world, and that society isn't "ready" for it until he says it is.
Funny coincidence, two hours ago I just finished two cavern dives in the very cenote complex in Playa del Carmen TFS alluded to. Some of the best diving I've done yet (comes really close to diving with tiger and bull sharks.) Currently on my second margarita while having a rest. Doing another two cavern dives tomorrow.
Re: (Score:1)
rsilvergun has been screaming even louder about how AI as we have it now is already the end of the world, and that society isn't "ready" for it until he says it is.
Since he's living rent-free in your head, can we assume you're the one responsible for the rsilvergun-impersonating LLM spam?
Re: (Score:2)
Since he's living rent-free in your head
This statement was cute, even funny, the first few times that it was used. That was because it was such an absurd way of making that point.
But, after this statement has been repeated so many times, it's just fucking stupid now. You should consider abandoning it before people start thinking that you are stupid.
Re: You're preaching to the choir (Score:2)
Nah, it is and always was dumb people saying something meaningless just because they learned a new phrase.
https://youtube.com/shorts/odN... [youtube.com]
Re: (Score:1)
This statement was cute, even funny, the first few times that it was used. That was because it was such an absurd way of making that point.
That statement was stupid, even absurd the first times that it was used — by the Reich wing. The entire reason I'm still using it when speaking to them is to rub their noses in how fucking stupid it was.
But, after this statement has been repeated so many times, it's just fucking stupid now.
You're two steps behind me as usual, but at least you're getting there.
You should consider abandoning it before people start thinking that you are stupid.
Insert Travolta looking around meme here. This is me, looking for fucks.
Re: You're preaching to the choir (Score:1)
Insert Travolta looking around meme here. This is me, looking for fucks.
You're not getting them. Contrary to what you believe, rolling around in shit doesn't make you look sexy.
Re: You're preaching to the choir (Score:2)
Thanks for admitting that you're shit.
Re: You're preaching to the choir (Score:2)
Despite how badly you're lusting after me, no, that wasn't me you were rolling around in, that was rsilvergun's sty, right after he gave you some of his windowpane.
Re: You're preaching to the choir (Score:2)
before people start thinking that you are stupid.
He's already waaay past that point.
Re: (Score:2)
That is probably the most likely outcome. Some tech experts need to retain real skills though or it all comes crashing down. LLMs cannot design, run, or maintain tech, despite all claims to the contrary.
Re: (Score:2)
Most of the people on Slashdot have been screaming that the emperor has no clothes for a while now.
Yes. Well, make that "many". But incredible as that sounds given some comments, many people here are wayyyy above average in tech understanding and insight. Obviously, we have the occasional keyword-trigger-only-no-insight MAGA and some tech fanatics, but generally we are insulting each other on a comparatively (if not absolute) pretty high insight level here.
Ads don't tell the truth? (Score:2)
Call me shocked!
Re: (Score:2)
That's one of the reasons ad blockers were created, to block the bullshit.
What, an advertisement for AI lied? (Score:2)
Color me shocked. Shocked!
When I see "agentic" I always read it as "agnatic", which somehow makes it less stupid.
Yet another Windows demo... (Score:2)
Copilot itself is pretty bad (Score:1)
Good at one thing (Score:2)
Windows needed an enema yesterday (Score:5, Funny)
It's just so full of shit. It's a wonder it even runs anymore.
At least the linux marketshare is slowly but steadily increasing, so I approve of the enshittification of windows. No better marketing than what they do themselves.
AI is a roll of the dice. Feeling lucky punk? (Score:2)
Learn from Apple (Score:2)
Steve Jobs would not release a product until it actually did what they claimed it would do. I don't understand why this is some strangely difficult lesson for CEOs to understand. I suppose with the success of Musk and his ilk that idea seems quaint.
Re: (Score:1)
Steve Jobs would not release a product until it actually did what they claimed it would do.
You mean like when he claimed the iPhone would be all webapps?
Let's face it, Jobs' only superpower was being a super dick to employees. This can only take you so far.
Re: (Score:2)
Actually, Apple did deliver that capability but developers pushed-back and didn't want it.
Jobs was indeed a dick, but he did not make advertisements claiming features that do not exist.
Re: (Score:2)
Actually, Apple did deliver that capability but developers pushed-back and didn't want it.
Right. It was a fuckup. And moreover, it was anti-developer and anti-consumer. Yet we're supposed to worship His Holy Turtleneck and address our ills with juice fasts in His name.
Re: (Score:2)
Who in this thread is worshipping Steve Jobs? Who said he has super powers? Do you just have an automated filter that finds any post that mentions Steve Jobs and then starts posting flamebait? Does it have a list of everything he ever did wrong so that you can randomly post a response? There is as much to learn from people who you hate as the people you admire.
This is why nobody can discuss anything rationally on the internet. When someone posts something that Joe Biden did right, a troll will inevitabl
i'd feel bad too if i were a microsoft product (Score:1)
while i acknowledge that as a windows user I should feel bad, it's surprising to see their AI feels the same
Conspiracy theory (Score:2)
The AI scam can't go on much longer. LLMs have legitimate uses and possibilities, but nothing to justify the hype.
So... what if Someone is pushing all the AI hysteria for other reasons. They plan to:
1. Completely tank the economy and blame the tech sector;
2. Get lots of nuclear power plants running again;
3. (Hopefully not) use the growing horde of destitute tech workers to kick off a communist revolution.
Re: (Score:2)
While I agree that LLMs are somewhat useful in a much, much narrower scope than hyped, I am not sure the scam/hype instigators have any agenda besides get-rich-quick. Never attribute to a hidden agenda that which can be nicely attributed to greed. Or something.
A Microsoft product sucks? Such a surprise. (Score:2)
This is really the standard expectation and MS consistently delivers. That is when they do not deliver worse quality.
I do not buy it and doubt others will either (Score:2)
Re: (Score:2)
Re: (Score:2)
probable cause (Score:2)
The failure was probably caused by improvements made to the engine.
The scientist unveils his brilliant creation (Score:2)
(with thanks from Mel Brooks)
https://www.youtube.com/watch?... [youtube.com]