Vista Speech Recognition Goes Awry
An anonymous reader writes "It seems even MSNBC is willing to take a jab on those rare occasions when Microsoft products don't work. During a demo of Vista's speech recognition technology, Vista couldn't differentiate between mom and aunt, and all attempts to rectify the problem just made it worse. Wait until you see what it spat out; I think we have a new 'All your base.' Don't you just love Microsoft's live demonstrations?"
Roald Dahl (Score:5, Funny)
Reminds me of the Roald Dahl short story about the ant-eater who ate someone's aunt because their accent rendered the two words the same.
I can't remember what the story was called.
Re:Roald Dahl (Score:5, Informative)
Re:Roald Dahl (Score:3, Funny)
Re:Roald Dahl (Score:3, Funny)
Awww...c'mon guys.... (Score:5, Funny)
I mean, it's not like they have a reputation for releasing half-assed code that's been hyped up through marketing to the point that it will never perform as advertised.
And it's not like this is a company that is having image problems due to its monopolistic nature.
Or headed by an infamous rageaholic with a history of intolerance towards free standards.
Nope, I'm sure that this is just an accident by a company that spends its off hours petting little baby chickens and bunnies.
Re:Awww...c'mon guys.... (Score:5, Insightful)
This was really a dreadful presentation. There was no ambient noise (as the commentators say later, and despite what Microsoft says), and there was no echo as the demonstrator claims during the actual test. It seems to have been done under really good test conditions, but still it failed miserably.
Re:Awww...c'mon guys.... (Score:5, Insightful)
Voice recognition requires some training regardless of who provides it. We're not in Star Trek here. Prep work and rehearsal, people. If Mr. Sales Guy had tried the demo before the presentation, he would have noticed it wasn't working and avoided the embarrassment.
This is why sales people are asshats. They're unprofessional, non-technical people who live the high life while the rest of us have to put up with the mess they create through their daily barrage of verbal diarrhea.
Tom
Mr. Pogue begs to differ: (Score:3, Interesting)
Reason 1: You don't have to train this software. That's when you have to read aloud a canned piece of prose that it displays on the screen -- a standard ritual that has begun the speech-recognition adventure for thousands of people.
I can remember, in the early days, having to read 45 minu
Re:Awww...c'mon guys.... (Score:4, Interesting)
A far more likely scenario, in my mind, is that he trained and tested it 100 times and got it working nearly flawlessly, but in a different room and with a different setup. In fact he may have overtrained it. Programs like this can behave very badly when they end up overfitting the data.
On the day in question he may have had a different mic and the acoustics were certainly different and the program went whacko.
Re:Awww...c'mon guys.... (Score:4, Insightful)
I seriously doubt this presentation was rehearsed. At the very least, they should have tested it in that room with that mic, etc. But in all honesty, this is going to be used by millions of people in all sorts of rooms with all sorts of mics. That shouldn't matter anyways.
Anyways, I doubt he prepared at all, that is, other than snorting cocaine off a mirror in the back room before the show.
Tom
Re:Awww...c'mon guys.... (Score:5, Funny)
Voice recognition requires some training regardles (Score:2, Interesting)
Dictation Software Improves Usability, Accuracy [npr.org]
Oh Please (Score:2)
Re:Oh Please (Score:5, Insightful)
Point is, if the sales guy had tried the system out beforehand he would have noticed it not working.
That is, suppose the code is total shit [I know, big stretch for MSFT]. Then isn't it likely it would have failed during the preparation stage? If you are saying "mom" and it always comes back "aunt" you may want to cancel the presentation.
That's why I think he didn't do any prep work for the presentation.
Tom
Re:Oh Please (Score:3, Insightful)
Or, at the very least, don't say 'mom'! I've had plenty of times where a salesperson tests something the day before a demo (usually after a week of knowing he had a demo, but that's a different rant) and finds something. Our usual response with that short of notice is 'well, don't show that' since we didn't have enough time to fix it. At best, we could send a version that had the error suppressed but not
Re:Oh Please (Score:3, Funny)
Who knows how the algorithm they implemented works.
Probably nobody at Microsoft. . .
Since when did Mum sound anything like Aunt? (Score:5, Funny)
Since when did Mum sound anything like Aunt?
Re:Since when did Mum sound anything like Aunt? (Score:3, Funny)
Re:Awww...c'mon guys.... (Score:5, Insightful)
For instance, the word "patent" is pronounced differently in the UK from North America. In the UK it is "pay-tent" and over here it's "pah-tent". That's just one example.
Point is [to paraphrase ballmer]:
Preperation (clap), preperation (clap), preperation (clap), preperation (clap), preperation (clap), [pitch of voice higher], preperation (clap), preperation (clap), [wheeze out of breath, pitch even higher], preperation (clap), preperation (clap), yeah!!!
Something tells me this sales guy will neither get punished nor lose his x-mas bonus. Some poor schmuck in engineering will take the fall for not making the demo "people ready".
Tom
Re:Awww...c'mon guys.... (Score:5, Funny)
One would be inclined to think that since you went and typed that word nine times, you would have managed to spell preparation correctly at least once...
Re:Awww...c'mon guys.... (Score:5, Funny)
[and I copy/pasted it. Yeah I know, I'm hardly literate. What you wanna fight about it?]
Tom
Re:Awww...c'mon guys.... (Score:5, Insightful)
Aw, c'mon; how many English dialects pronounce "mom" and "aunt" similarly?
Even to someone who's worked with voice recognition, that mistake simply isn't credible. If the software were anywhere near usable, it wouldn't confuse those words from anyone, especially not in a low-noise, no-echo demo.
This is a "No excuses" situation. That demo was simply a dismal failure due to some major bug(s).
Of course, the speech recognition field has a long history of staying in such a state forever. It's hard to find a product that, even with extensive training, doesn't produce howlers like this.
I did like the "killer" part
Re:Awww...c'mon guys.... (Score:5, Insightful)
Chances are he never even did a walk through of the presentation before the press was there.
Tom
Re:Awww...c'mon guys.... (Score:3, Funny)
Re:Awww...c'mon guys.... (Score:4, Interesting)
Try this:
Said: "How to recognize speech"
Understood: "How to wreck a nice beach"
No, it's not always easy to tell the difference...
Re:Awww...c'mon guys.... (Score:3, Funny)
Re:Awww...c'mon guys.... (Score:3, Informative)
Not the first MS demo embarrassment. (Score:4, Informative)
There, I found it. The file is an old QuickTime movie. I'm going to put this up on YouTube. There, that's done. Have at it [youtube.com].
Re:Awww...c'mon guys.... (Score:3, Interesting)
I'm not one to defend MS, but I speculate that the volume on his microphone was set too high, causing distortion and clipping. Look at the volume meter when he talks -- it goes all the way to the top.
Hee hee (Score:5, Funny)
Re:Hee hee (Score:3, Interesting)
Voice recognition means permanent beta. Voice recognition has improved only slightly during the last ten years. One reason is that the VR market is a minefield of trivial patents. The rest is just performance.
Sure, we will get proper voice recognition some day. I would farm it out to open source and integrate it back into my products once it is ready.
Re:Hee hee (Score:3, Insightful)
Well (Score:4, Funny)
Re:Well (Score:5, Funny)
The Voice of Experience (Score:5, Insightful)
Experience is the human quality that enables you to recognize a mistake immediately when you make it again.
Dacap
So? (Score:5, Informative)
Win98 gone wild: http://www.youtube.com/watch?v=Hrbx9_AY720 [youtube.com]
Media Center Edition gone wild http://www.youtube.com/watch?v=j7EEbokKLHI [youtube.com]
We can add this one to the list too
Re:So? (Score:2)
Re:So? (Score:2)
Tom
Dear aunt (Score:5, Informative)
Final text:
Re:Dear aunt (Score:5, Informative)
http://blogs.msdn.com/robch/archive/2006/07/29/68
This sounds so much like Microsoft (Score:4, Insightful)
Yes, bugs happen, and yes, Vista is still in beta, but rather than just admit "Vista is still a buggy piece of crap software that can't even be used properly by its own engineers" they tell us to sit and wait because we can trust them to fix it.
To MS's credit, it is a strategy that works.
Re:Dear aunt (Score:3, Funny)
Badabing! Thanks folks, I'll be here all week.
Remember the Win98 BSOD? (Score:5, Funny)
http://www.ntk.net/media/developers.mpg [ntk.net]
Just from MicroSoft Insider (Score:4, Funny)
"Sir put down the chair, then we'll talk"
"No Steve wait up, don't do that"
"BOOM CRASH BOOM CRASH BOOM CRAASH WAAAH NOOO STOOOOP"
"DUDE, THE COMP HAS A BSOD! WAAH!"
A little early for voice recognition (Score:2)
Yes, it works in some contexts, especially if it's been trained with the person speaking and the language is limited, such as in a professional environment.
But for home computers, it's not only overkill, it's also inadequate and non-functional.
I say COOL feature, but a hopeless waste of time and money, which in the end will be paid for by you-know-who (not MS).
On another topic, can someone please ask MS to stop the increasing and
It's hard (Score:4, Funny)
Come on Microsoft... (Score:2, Redundant)
I don't think so. (Score:2)
This demo didn't just drop a couple of words, or misinterpret an ambiguous sounding phrase, this was a complete melt down. A more plausible explanation is that the guy's voice was also amplified through the PA system in the room, and the computer's microphone was
Vista couldn't differentiate between mom and aunt (Score:3, Funny)
Sequence of events (Score:2, Funny)
Dear aunt,
"Fix aunt"
Dear aunt, let's set
"Delete that"
Dear aunt, let's set
"Delete that"
Dear aunt, let's set
"Delete that"
Dear aunt, let's set so
"I think it's picking up a little bit of echo here...delete - select all"
Dear aunt, let's set so double the killer delete select all
*Manually selects all and deletes*
"Okay, I'm glad you're enjoying this"
*Laughter*
Re:Sequence of events (Score:5, Funny)
Man, that brings back memories!!! (Score:2, Informative)
Re:Man, that brings back memories!!! (Score:4, Funny)
Re:Man, that brings back memories!!! (Score:2)
It took me a lot of time to train it to somewhat understand me (I'm not a native English speaker, and my PC was not exactly fast).
So, almost a decade later, MS stuff is no better.
OS/2 Still Kicking Microsoft's Ass (Score:5, Funny)
Speech recognition is still just a gimmick anyway. We still have a LONG way to go before it gets to the point that Joe Average User imagines it should be. Joe average user wants his computer to respond like the one in Star Trek. I still want to set up my Asterisk server with speech recognition, though, so that people can either dial or say the extension they want. It'd also be neat to pick up the phone, say "Call Mom" to the dial tone and have it call my aunt for me.
Re:OS/2 Still Kicking Microsoft's Ass (Score:3, Funny)
Seems Microsoft's speech recognition is just right for you
removing ambient noise (Score:5, Insightful)
Why not just use two mics: one to record the ambient noise (positioned away from the voice mic) and the other to record the voice (headset)? Then, as you have two signals, just subtract the ambient noise signal from the headset signal. Voila, clean headset mic audio.
Works for music too: you could control your music player by voice even when it's playing loud (at a party) by removing the music signal from the mic signal.
-AJS
Re:removing ambient noise (Score:5, Informative)
What is done in practice, and works extremely well, is modelling that "echo" as a filter (an FIR transversal filter, which is essentially a tapped delay line with a weight on each tap). You estimate the coefficients of the filter, apply this "room filter" to the music signal, and subtract the result from the microphone signal. You then have the voice-only signal left.
This setup is called AEC, or Acoustic Echo Cancellation. It is used in every telephone and mobile phone there is and is crucial to ADSL. If an ADSL modem did not cancel out its own sent signal at its receiver, the attainable speed would be several times lower. AEC is also the reason why talking immediately when you pick up a mobile phone leaves an audible echo of your own voice: estimating the coefficients of the filter is still taking place at that point.
See http://www.dspalgorithms.com/products/echo.html [dspalgorithms.com] for a diagram of the AEC or read Haykin's Adaptive Filter Theory if you're looking for a decent book on the subject.
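For the curious, here is a minimal sketch of the adaptive-filter idea described above. It assumes a normalized LMS (NLMS) update, which is one common way to estimate the coefficients; the comment above does not say which algorithm real products use. The function names, the tap count, the step size and the simulated "room" are all made up for illustration.

```python
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    Returns (residual, weights): the cleaned-up signal and the final
    estimate of the room filter coefficients.
    """
    weights = [0.0] * taps      # current estimate of the room filter
    delay_line = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        delay_line = [x] + delay_line[:-1]
        echo_estimate = sum(w * s for w, s in zip(weights, delay_line))
        error = d - echo_estimate            # mic signal minus estimated echo
        power = sum(s * s for s in delay_line) + eps
        step = mu * error / power            # normalized step (the N in NLMS)
        weights = [w + step * s for w, s in zip(weights, delay_line)]
        residual.append(error)
    return residual, weights


def simulate_room(n=4000, seed=0):
    """Build a toy test case: random 'music' and its echo through a fake room."""
    rng = random.Random(seed)
    room = [0.6, 0.3, -0.2, 0.1]   # hypothetical room response, not measured
    music = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo = [
        sum(room[k] * music[i - k] for k in range(len(room)) if i - k >= 0)
        for i in range(n)
    ]
    return music, echo, room


if __name__ == "__main__":
    music, echo, room = simulate_room()
    residual, weights = nlms_echo_cancel(music, echo)
    print("estimated room filter:", [round(w, 3) for w in weights])
```

In this toy run the microphone hears only the echo, so the residual should shrink toward zero as the weights approach the fake room response. With a voice in the mix, the residual would be the voice plus whatever echo the filter has not yet learned, and practical systems typically slow or pause adaptation while someone is talking.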
Wait . . did they say (Score:2)
Microsoft Innovation (Score:3, Insightful)
Loved the End of the Video (Score:2)
"Live television is rough. Welcome to our world." she said. Ooooooh. Nice kick below the belt. Sounds like they're not keen on Microsoft at
On MSNBC's front page - for about 30 minutes.... (Score:5, Informative)
I went to msnbc.com - and there it was, third on the list of videos on the main page.
I called this to the attention of two of my coworkers, and we viewed the video - total elapsed time, maybe twenty minutes.
Then I went to call it to the attention of a third coworker - and the video was no longer on the front page of MSNBC. OK, so maybe they've moved it off the front page, but it should still be on the Technology subsection, right?
Wrong.
Nor was it under Videos, nor anywhere else I could find it easily.
Perhaps this was just a normal rotation of a video. Perhaps not. But no matter what the real cause, there is the appearance that it was removed from the page because it was too embarrassing. Not good for Microsoft.
However, I will give MSNBC this - they didn't give Microsoft a free ride on this, they ribbed them pretty hard.
However, I knew that this would be appearing on other sources as a video that could be viewed outside of Windows. Actually, I am rather surprised that it took this long.
Now, as to the demonstration itself - it looks to me (a person who does signal processing and analysis for a living) like the presenter had the mike gain too high - every time he spoke he maxed out the bar graph on the display. *IF* he had the gain too high, and the audio was clipping significantly, that could make "mom" have enough of a pop to maybe sound like AUNT - especially if the software is using context to try to reduce the search-space for the words. Of course, that's why I would have a monitoring routine in the system, and if any of the samples are at 100% full scale, or if many of the samples are over 90% full scale, or the signal power is too high, I'd have my software adjust the mike gain down *and* flag an alert to the user. I'd also try to look for the mike element itself being overloaded.
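A rough sketch of the kind of monitoring routine described above. The three checks (any sample at full scale, many samples over 90% of full scale, signal power too high) come from the comment; the specific thresholds, the 70% target peak and the function name are assumptions picked for illustration, not anything Vista is known to do.

```python
def check_gain(samples, full_scale=1.0, hot_level=0.9,
               hot_fraction=0.1, max_power=0.5):
    """Inspect one block of audio samples and suggest a mic gain factor.

    Returns a dict with 'status' ('ok' or 'too_hot'), a suggested 'gain'
    multiplier (1.0 means leave it alone) and the list of 'reasons'.
    All thresholds are illustrative guesses.
    """
    if not samples:
        return {"status": "ok", "gain": 1.0, "reasons": []}

    magnitudes = [abs(s) / full_scale for s in samples]
    peak = max(magnitudes)
    hot_share = sum(1 for m in magnitudes if m >= hot_level) / len(magnitudes)
    power = sum(m * m for m in magnitudes) / len(magnitudes)

    reasons = []
    if peak >= 1.0:
        reasons.append("clipping")               # a sample hit 100% full scale
    if hot_share > hot_fraction:
        reasons.append("many samples near full scale")
    if power > max_power:
        reasons.append("signal power too high")

    if not reasons:
        return {"status": "ok", "gain": 1.0, "reasons": []}

    # Scale so that the loudest sample would land at about 70% of full scale.
    gain = min(1.0, 0.7 / peak)
    return {"status": "too_hot", "gain": gain, "reasons": reasons}


if __name__ == "__main__":
    print(check_gain([0.1, -0.2, 0.3]))
    print(check_gain([1.0, -1.0, 0.5, 0.2]))
```

The caller would apply the suggested gain to the mixer and show the reasons to the user as the alert. Detecting an overloaded mike element, as opposed to an overloaded converter, would need more than sample levels and is not attempted here.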
That's REALLY sad... (Score:2)
Re:That's REALLY sad... (Score:4, Interesting)
Audio Gain Settings Caused the Problem (Score:3, Informative)
According to Rob Chambers [msdn.com], a developer on the Vista speech recognition team, the failures during the demo were caused by audio gain issues.
From his blog:
Read the entire blog post for a more complete explanation of what happened... one that's just slightly more plausible than most of the explanations proffered by your fellow Slashdotters.
Comment removed (Score:3, Informative)
I've also noticed this as a beta tester (Score:3, Funny)
Re:Is SR ever going to be good enough? (Score:5, Informative)
One good thing it did do, though, was try to understand sections of speech rather than just each word, which did improve accuracy. Words often follow patterns, and there are few words that make sense after a given word, so it was often right with "over there".
SR tech will eventually be as good as on Star Trek as long as people work on it. I would give it 20 years if it is seen as something which could make a lot of money, 40 if you have to wait for interested people to do it for free on their own time.
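To illustrate the point about understanding sections of speech rather than each word, here is a toy sketch that combines an acoustic score with a bigram (previous word, next word) score. The counts, the acoustic probabilities and the function names are invented for the example; a real recognizer uses far larger models and searches over whole sentences.

```python
import math

# Made-up counts of how often one word follows another in some training text.
BIGRAM_COUNTS = {
    ("over", "there"): 50,
    ("over", "their"): 1,
    ("lost", "their"): 30,
    ("lost", "there"): 1,
}


def bigram_prob(prev, word, vocab_size=1000):
    """P(word | prev) with add-one smoothing over a hypothetical vocabulary."""
    total = sum(c for (p, _), c in BIGRAM_COUNTS.items() if p == prev)
    return (BIGRAM_COUNTS.get((prev, word), 0) + 1) / (total + vocab_size)


def pick_word(prev, candidates):
    """Choose among sound-alike candidates using sound and context together.

    `candidates` maps each word to a positive acoustic probability, i.e.
    how well the audio alone matches that word.
    """
    def score(word):
        return math.log(candidates[word]) + math.log(bigram_prob(prev, word))
    return max(candidates, key=score)


if __name__ == "__main__":
    heard = {"there": 0.45, "their": 0.55}   # audio alone slightly prefers "their"
    print(pick_word("over", heard))
    print(pick_word("lost", heard))
```

On audio alone the two candidates are nearly tied, so the previous word decides the outcome. The same mechanism can also make things worse when the context model is confidently wrong.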
20 years times the number of steps (Score:2)
I would give it 20 years times the number of steps, with waiting at each step for the patents to run out, if Elektroschock is correct in his comment that it is a patent minefield [slashdot.org].
Re:Is SR ever going to be good enough? (Score:2)
90% accuracy is nowhere near enough for voice recognition in a dictation context. 90% accuracy means one word in every ten will be wrong. In this post, so far, that works out to more than one erroneous word per sentence. Factoring in the time taken to correct all of these, it is much slower than typing; especially if you need to interrupt flow to do the corrections.
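The arithmetic behind that claim, as a quick sketch. It assumes every word is recognized independently with the same accuracy, which is a simplification (real errors tend to cluster), and the 15-word sentence length is just an example figure.

```python
def sentence_stats(word_accuracy, words_per_sentence):
    """Return (expected wrong words, chance of an error-free sentence).

    Assumes independent per-word errors at a fixed accuracy.
    """
    expected_errors = (1.0 - word_accuracy) * words_per_sentence
    p_clean = word_accuracy ** words_per_sentence
    return expected_errors, p_clean


if __name__ == "__main__":
    for accuracy in (0.90, 0.99):
        errors, clean = sentence_stats(accuracy, 15)
        print(f"{accuracy:.0%} accurate: {errors:.2f} wrong words per "
              f"15-word sentence, {clean:.0%} of sentences error-free")
```

Under these assumptions, 90% word accuracy leaves only about one 15-word sentence in five untouched, while 99% leaves roughly six in seven.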
Re:Is SR ever going to be good enough? (Score:3, Interesting)
Think "computer lights": if it gets it wrong you just try again. All those media PCs would be good candidates as well. If I say "change to channel six" and the thing switches to sixty 1/10th of the time, well, I could repeat myself that often in that application anyway, and still be pretty satisfied.
Re:Is SR ever going to be good enough? (Score:3, Insightful)
Depends on your own context...I deployed (admittedly an older version of) Dragon NaturallySpeaking in an office full of mobility-impaired employees. They found it much easier to spend 10% of their writing time fixing errors than 100% of it trying to, for example, type with the onscreen keyboard. If you can't use a keyboard, even crappy voice recognition is a godsend.
Re:Is SR ever going to be good enough? -- Yes! (Score:5, Informative)
I didn't believe it either, until I actually tried it. Dragon is the first worthwhile speech recognition solution I've seen that's practical for general use (though I'd love it if they'd release a "programmers" version to complement the Medical/Legal versions). I get about 99% accuracy (a decent microphone is *very* important!)
Dragon 9 also doesn't "technically" need training, but accuracy further improves if you do bother to train it a bit. The NYT reviewer was able to get 99.6% accuracy after a short training session.
Here's a few reviews of version 9:
http://www.nytimes.com/2006/07/20/technology/20po
http://www.npr.org/templates/story/story.php?stor
Re:Is SR ever going to be good enough? (Score:2, Interesting)
Well, I've taken a look at that (a while back): Dragon seems to be the leader; they get (with a month of training) the best accuracy.
However, sound recognition engineers are slowly realizing that the problem of recognising words is not just the algorithm's fault. Even people aren't able to understand all the words from a taped conversation in a cafeteria.
Dragon is currently the best, getting further will probably require more input, like a webcam to read your lips. This is just another Microsoft product whe
Re:Is SR ever going to be good enough? (Score:4, Insightful)
Re:Is SR ever going to be good enough? (Score:2)
The problem though, this is YET ANOTHER THING that should not be bundled with Windows. It should be sold as a third-party add-on. Even if MSFT produces it, it shouldn't be part of Windows unless a customer CHOOSES to buy and use it.
MSFT obviously has learned shit all from their anti-trust LOSS, and to them they can do whatever the hell they want.
Frankly, I look forward to the day where MSFT is just a footnote in a history
Accessibility? (Score:2)
Should a driver for the keyboard be bundled? To somebody who does not have use of hands, speech recognition is as indispensable as a keyboard driver. This is important when trying to get your product certified as disability-safe for use by agencies of governments.
Re:Accessibility? (Score:2)
However, there IS competition in the voice recognition market. Microsoft bundling their stuff in Vista is an anti-trust violation just like bundling IE with Win98. Your government example is also moot. The IT people could install A DIFFERENT CERTIFIED PROGRAM on their Vista images. I seriously doubt the IT people will be the ones nee
Re:Is SR ever going to be good enough? (Score:2)
-uso.
Re:Is SR ever going to be good enough? (Score:2)
Linux distributions are even more aggressive in the application bundling (though there's a conspicuous lack of decent voice recognition).
Re:Is SR ever going to be good enough? (Score:2)
Linux distributions are free to install and the software contained in them is portable. I can use Gnome in the BSD or Linux distributions with free will. I'm not stuck to using OpenOffice [for instance] ONLY in FreeBSD. Furthermore, with the good distributions I can remove and change competing programs. Don't l
Re:Is SR ever going to be good enough? (Score:4, Interesting)
They don't deserve credit for starting the "PC revolution". The credit properly belongs to the hundreds of little startups and hobbyists, the whole CP/M crowd and others like Amiga. Microsoft was a subcontractor to a giant monopoly (IBM) that stepped in after the little guys demoed there was a market, and took over that market. They succeeded mostly because of a marketing budget greater than the budgets of all the little companies combined.
And there's a good argument that, by marketing PC/DOS rather than CP/M, they set back the PC revolution by 5 to 10 years, the time it took for PC/DOS to match the capabilities of CP/M when IBM started their PC marketing campaign.
Sorry; that's the way "the Market" works in the computer field. Small, independent developers make something new and start selling it; the big companies then step in and take over the market through traditional monopoly strategies.
It's likely that we're now going to hear people crediting Microsoft for starting the "voice recognition" revolution by inventing the new idea that computers can understand speech. Marketing can redefine history like that.
(Whereas we computer geeks know that Al Gore invented speech recognition.
Re:Is SR ever going to be good enough? (Score:5, Insightful)
For example, how does the computer know that Picard wants to call Riker and isn't just talking about him? Oh and keep in mind the computer never misinterpreted something. In other examples, people would carry on intelligent conversations with the computer - all those holodeck scenes, Troi ordering chocolate, etc.
Star Trek-style SR, I think, would be the holy grail and is probably always going to be out of reach. Barring some amazing breakthrough in AI algorithms, the computer power required just for the situations above would be incredible - and that's computer time that probably could be put to better use elsewhere, even if it was found to be possible.
I think the computer in the original Star Trek was more realistic - but even there the voice-recognition was far beyond what we're capable of today, as Microsoft has demonstrated so well. Plus all the blinkenlights that seemed to have no useful purpose were cool.
Re:Is SR ever going to be good enough? (Score:2)
Re:Is SR ever going to be good enough? (Score:2)
Re:Is SR ever going to be good enough? (Score:2)
If anything, Star Trek is a massive underestimate of our technological prowess in 300 years. In 300 years I do not expect humans to be running around
Re:Is SR ever going to be good enough? (Score:4, Interesting)
Well, maybe. But we invented microscopes around 300 years ago, and discovered microorganisms immediately thereafter. The understanding that some bacteria were involved in diseases followed quickly. But it was nearly 300 years before we successfully eradicated a disease (smallpox). Today, we're still battling new diseases, and we don't have anything like a general solution to all diseases. We have a few antibiotics that affect more than one disease, but we haven't made much progress in solving the problem of the development of resistance to our antibiotics. Hell, we can't even convince the general public that it's the evolutionary process at work here, and we've understood that for around 150 years.
I wouldn't predict any general solution to a complex problem like voice recognition in a mere 300 years. Maybe we will. But our history of general solutions to other complex biological problems is not encouraging. Neither is the history of our first 50 years of AI, despite the constant hype and Hollywood movies claiming that AI is just around the corner.
Re:Is SR ever going to be good enough? (Score:5, Funny)
Hmmm - holodeck, Troi, chocolate.....the combination of those three items is something that gives one pause to ponder.
Um, I'll be back in a little bit.
Re:Is SR ever going to be good enough? (Score:3, Insightful)
The fleet's computers have "known" Picard since he entered the service. They should be pretty well trained.
The communicator badges in TNG could be transmitting supplementary biometric data and n
Re:Is SR ever going to be good enough? (Score:3, Insightful)
The badge also indicates the location of the person. So if Picard says "Will" (or "number one", which is simply an alias that Picard made for "Riker, William T.") and the computer sees that Will isn't in the same room as Picard (or isn't within normal hearing distance), it simply connects the two via a communication channel.
Re:Is SR ever going to be good enough? (Score:3, Insightful)
Probably. But it will have to get much better at using context. They're already using grammar as a cue, but it's going to take much more than that. Humans draw on memories of previous conversations, knowledge about the interests and mannerisms of the person speaking, and knowledge of the situation at hand. Even just knowing what's big in the news can help.
As for ambient noise, there's often
Re:are u serious? (Score:4, Insightful)
It's called modesty. If MSFT had any [and some humility] they wouldn't get laughed at so hard for this. I mean, look at Linux. Find a bug in the kernel, fix it, post a notice that it's fixed. You don't see anyone saying "Oh hahaha, Linus is at it again!" That's because you also don't see Linus on CNN mocking the rest of the world.
Microsoft deserves all the negative press and humiliation they get because they are shameless, deceitful, greedy monopolistic bastards.
Tom
Re:are u serious? (Score:2, Insightful)
The reason I find this eminently amusing is that Microsoft is a company built on marketing. At no particular point has Microsoft had "The Superior Technical Solution"; they have always had luck and better marketing. Since DOS 3.3 there have frequently been products that were more stable, faster, easier to use - yo
Re:are u serious? (Score:5, Insightful)
Hmmm, no. Maybe it's the way they deal with failures. Remember Bill Gates trying hard to demonstrate the Media Center [google.com]? Some time after that Steve Jobs gave his regular Macworld keynote when his Mac didn't respond anymore. He moved a monitor switch to continue the presentation on another Mac and said: "Well, that's why we have backup systems here."
Re:Why it sucks .... (Score:2)
Nonsense.
Just in case you have been living under a rock for the last 15 years I have some news for you:
That kind of innovation doesn't come out of the Microsoft campus. It comes out of universities.
And those couldn't care
Re:Why it sucks .... (Score:2)
Re:Why it sucks .... (Score:2)
Name one, beauty [spiked3.com].
Re:Why it sucks .... (Score:2)
BonziBuddy!
Thanks again Bill.
Re:I agree (Score:2)
I see you're also using the trial version
Which Vista promptly translates as: (Score:2)