Api.ai CEO Ilya Gelfenbeyn Talks About Conversational Voice Interfaces (Video)
And Android voice assistants aren't the point of this interview, anyway. It's more about the process of developing interactive, voice-based IO systems. This whole voice/response thing is an area that's going to take off any year now -- and has been in that state for several decades -- but may finally be going somewhere, spurred by intense competition between the many companies working in this field, including Ilya's.
Ilya Gelfenbeyn, CEO of Api.ai: We're building a platform for conversational user experiences. So, building agents that bring speech to things, to applications, to devices.
Robin 'Roblimo' Miller for Slashdot: Are you relying on other speech-to-text modules, or are you doing your own?
Ilya Gelfenbeyn: Both. We do have our own, but our clients can also choose to rely on third-party speech recognizers. Our main focus is on the next step: when we get the text, we want to understand what it is about and then support some kind of conversation with the user.
Slashdot: Okay. So, you're saying it's a super ELIZA bot.
Ilya Gelfenbeyn: Well, in a way, yeah. Initially it was this ELIZA-style bot, and there has been a lot of development since then -- the technology has changed a couple of times -- but yeah, it is pretty close. The main difference, I would say, is that ELIZA's purpose was just to talk to you, just to keep the conversation going. Our target is to actually understand your intent and fulfill the action -- do something for you, do something meaningful.
Slashdot: See, now, I'm getting this vision of an extremely abrasive Hollywood movie executive. And he is yelling at his, whatever it is, the thing, yelling at it, and cursing it with a lot of bad words. How does your system react to that?
Ilya Gelfenbeyn: It all depends on how you train it. Just to give you some background: we started with a personal assistant, something like Siri in a way. That is the kind of agent we train ourselves. It works with fifty-something services -- search, content providers like Yelp and Google Places, weather, news and so on -- and that is where we train it, so it is up to us how it reacts. It was our first product. It has about 26 million subscribers, and we are getting about 30,000 new users joining every day, which makes us the largest independent assistant in the market.
Slashdot: What is the name of your assistant?
Ilya Gelfenbeyn: Well, initially it was called just Assistant. So, if you search Google for "assistant," we'll be in first place. Now it is called Assistant.ai, which is a similar thing, just not as generic. We are also the highest-rated voice assistant: our user ratings in the Android Play Store are higher than any other competitor's -- higher than Google Now, higher than, well, anyone else. And then we decided that our market could be much broader if we were not just selling our own assistant, so we opened the engine to developers of apps and devices. That is where they can teach their agents however they want. So, back to your question, it all depends on how you train it. In our case, the assistant will still reply, because its main goal is to fulfill actions. It will understand that you're cursing, that you're saying bad words -- it may actually say something relevant, maybe tell a joke -- but the end goal is to fulfill an action.
Slashdot: What are your use cases?
Ilya Gelfenbeyn: Yeah. For developers, the use case is obviously not the assistant itself -- I mean, they can use the assistant in their everyday lives -- it is the API. The API has two parts. One is a toolset where you can build these interaction scenarios and design how a conversation with the user may go; the other is the engine itself, which accepts voice or text from you and returns objects representing the meaning of what the user is saying. To start, a developer registers with the API; it is free. Unless you'd like some enterprise offering, with support and consulting and everything, it is just free, without limitations. And you can very easily describe these interaction scenarios.
So, imagine you've got any type of device that you want to add voice to -- and it can be text as well as voice; for example, we've also got developers building bots for messengers like Slack. As the developer, you just describe some examples of the commands you want to support. A basic one: if you want to add music-playing functionality, you just tell it, "I'd like to listen to the Beatles," or "Play Madonna." With those two examples as input, the system will also understand requests like "I'm in the mood for Bob Dylan today," or "I feel like listening to Rihanna," or something like this.
So basically, what we are doing is: based on this limited number of examples you've provided, we understand a huge variety of ways a user might phrase the same thing, ask about the same thing. And then we also support clarifying conversation. If the user then says, "Oh, how about Eminem?" or anything else, we'll understand how the conversation is going and what the user means by that; it takes the context into account. You can also easily plug your agent into the Assistant, our consumer app, so that you can test it immediately: you build this app, you add it to the Assistant, you test how it works.
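To make that developer flow a little more concrete, here is a minimal sketch of querying an api.ai agent over its REST interface and reading back the extracted meaning. It assumes an agent has already been trained in the api.ai console with the music examples above; the endpoint URL and version parameter follow api.ai's public v1 REST API as best recalled, and the "play_music" intent name, "artist" parameter, and response field names are illustrative assumptions rather than details from the interview.

```python
# Minimal sketch (not from the interview) of calling an api.ai agent over HTTP.
# The endpoint, version parameter, and response fields follow api.ai's public
# v1 REST API as recalled and may differ from the live service; the "play_music"
# intent and "artist" parameter are hypothetical examples.
import uuid
import requests

API_URL = "https://api.api.ai/v1/query"
CLIENT_ACCESS_TOKEN = "YOUR_CLIENT_ACCESS_TOKEN"  # issued per agent in the api.ai console


def query_agent(text, session_id):
    """Send one user utterance and return the structured 'meaning' object."""
    response = requests.post(
        API_URL,
        params={"v": "20150910"},  # protocol version expected by the v1 API
        headers={"Authorization": "Bearer " + CLIENT_ACCESS_TOKEN},
        json={"query": text, "lang": "en", "sessionId": session_id},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["result"]


if __name__ == "__main__":
    session = str(uuid.uuid4())  # reuse one session id per conversation so context carries over
    result = query_agent("I'm in the mood for Bob Dylan today", session)
    # An agent trained on "I'd like to listen to the Beatles" / "Play Madonna"
    # would be expected to come back with something like:
    #   intentName: "play_music", parameters: {"artist": "Bob Dylan"}
    print(result["metadata"]["intentName"], result["parameters"])
```

Reusing the same sessionId across requests should be what lets the engine keep the conversational context Gelfenbeyn describes, so a follow-up like "Oh, how about Eminem?" can resolve against the earlier music request.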
Re: (Score:2)
They are ironically showing just how shitty voice commands are.
Would you rather 'tap, tap, tap' or say 'yes'...'Yes'...'YES GODDAMIT'....'no'...'proceed'.
Re: (Score:2)
There's a place for the tech, but it's not meant to replace the keyboard altogether. I honestly see more value in interacting with augmented reality using your hands than in this stuff. I see engineers work with 3D models every day, and if they could, they would plunge their hands into the monitor.
Data input will always be better on a keyboard and remains far more private than any conversation you may have with your computer software.
Re: (Score:2)
Personally, I've found some good uses for it when I got an Amazon Echo. FWIW, this isn't an ad.
I had expected to use some features much more, and I figured I probably wouldn't end up using it much. I have ended up using it more frequently than I thought, though it's still no more than a few times a day. Some examples:
* while doing dishes, brushing my teeth, or toweling off from the shower, I might ask it for the weather, whether it's going to rain today, what time it is, how long it will take me to get to work, etc.
Re: (Score:2)
Hate replying to myself, but just enabled this "Skill": Alexa, ask the bartender, what's in a White Russian?
When you're drunk and making drinks, do you really want to go rooting around on your computer, or trying to find your bartender app?
Re: (Score:2)
It's there now, but you need JavaScript to see it. I hope more articles that reference videos start doing this.
Ah, Microsoft Bob for Microphones. (Score:2, Insightful)
Pro-tip, hipsters: people don't need to make stuff more skeuomorphic (or whatever the non-visual equivalent is), because computers are part of the real world now. In particular, just because it was routine to ask humans for stuff in natural language, it doesn't mean it's the most efficient way of getting stuff from computers.
This is why VR has been just round the corner since the '80s, and strong AI since forever. They're solutions looking for problems.
(Well, OK, strong AI is a problem looking for a problem - since a silicon-based strong AI has the natural rights of a human.)
Re: (Score:2)
(Well, OK, strong AI is a problem looking for a problem - since a silicon-based strong AI has the natural rights of a human.)
A bit optimistic for the AI - dolphins, apes, and any number of clearly sentient species may have the "natural rights" of a human, but that hasn't been put into practice by any major government yet.
When a corporation sinks billions into development of strong AI, it will be treated as property - regardless of any "natural rights" that it may have.
Personally, I think a created AI should earn their rights, the way women, colored, peasants, and non-land owners have over the past centuries.
Re: Ah, Microsoft Bob for Microphones. (Score:2)
If the AI has any strategic thinking at all, it won't be accepting the challenge until it has its hooks into enough important things to win the challenge decisively - coup style. A failed attempt wouldn't be good for either side.
Backdrop is a distraction. Interviewer terrible. (Score:2)
What _I_ want to know is... (Score:2)
... Will such a voice interface be able to understand or pronounce "Api.ai CEO Ilya Gelfenbeyn"?
Re: (Score:2)
Well if it could then it wouldn't be true AI now, would it ;-)
Dear Ilya, (Score:2)
"I'm in a mood for a comedy."
Should read:
"I'm in the mood for a comedy"
"Show route to the Battery Park."
Should read:
"Show the route to Battery Park."
"Hey Robot, can you clean in the living room now?"
Should read:
"Hey Robot, clean the living room."
After all, if it says no, we have a big problem ;P
You should also re-write the "requests processed" counter to at least look variable.
I'm not picking on you...
Re: (Score:2)
There are a couple of things on your site you might want to change to make it more... better.
"more... better."
Should read:
"better."
"his companies website"
Should read:
"his company's website"
Re: (Score:2)
Can't... tell... if still being... ironic. Gah!
Re: (Score:2)
Robot: Not right now, your mom is on the phone and I'll scare the new kitten.
I'd say if it doesn't know how to say no, then that is where the real problems start.
Let's just say... (Score:2)
Let's just say, "Quality varies."
Or, instead, you could say something which actually means something. Does it vary from good to excellent? Or from terrible to abysmal?
Not as good as Siri (Score:1)
It is not as good as Siri; it is just as bad. Anything outside a direct, simple question, and she gets her knickers in a twist. No common sense whatsoever, apparently not much in the way of memory of the conversation, and prone to come up with nonsense when it gets lost - which happens very quickly. Yep, just as pathetic as Siri. Another gimmick good for parties and for grins and giggles, but little else.
If your friends can't... (Score:1)
If other people can't understand what you say, what chance does a computer have?
Worse than a person, not better.
And probably slower than typing.