AI Researchers Create Testing Tool To Find Bugs in NLP From Amazon, Google, and Microsoft (venturebeat.com)
AI researchers have created a language-model testing tool that discovers major bugs in commercially available cloud AI offerings from Amazon, Google, and Microsoft. Yesterday, a paper detailing the CheckList tool received the Best Paper award from organizers of the Association for Computational Linguistics (ACL) conference. From a report: NLP models today are often evaluated based on how they perform on a series of individual tasks, such as answering questions using benchmark data sets with leaderboards like GLUE. CheckList instead takes a task-agnostic approach, allowing people to create tests that fill in cells in a spreadsheet-like matrix with capabilities (in rows) and test types (in columns), along with visualizations and other resources. Analysis with CheckList found that about one in four sentiment analysis predictions by Amazon's Comprehend change when a random shortened URL or Twitter handle is placed in text, and Google Cloud's Natural Language and Amazon's Comprehend make mistakes when the names of people or locations are changed in text. "The [sentiment analysis] failure rate is near 100% for all commercial models when the negation comes at the end of the sentence (e.g. 'I thought the plane would be awful, but it wasn't'), or with neutral content between the negation and the sentiment-laden word," the paper reads.
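For anyone wondering what "predictions change when a random shortened URL or Twitter handle is placed in text" looks like as a test, here is a rough sketch of that kind of invariance check. It does not use the CheckList library or any vendor API: toy_sentiment, brittle_sentiment, add_irrelevant_token and invariance_failure_rate are made-up names, and the two "models" are hypothetical stand-ins for illustration only.

```python
import random


def toy_sentiment(text):
    """Hypothetical stand-in model: counts positive vs. negative words."""
    positive = {"great", "good", "love"}
    negative = {"awful", "bad", "hate"}
    words = text.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def brittle_sentiment(text):
    """Hypothetical buggy model: any URL or handle drags it to 'negative'."""
    if "http" in text or "@" in text:
        return "negative"
    return toy_sentiment(text)


def add_irrelevant_token(text, rng):
    """Perturbation: append a made-up short URL or Twitter-style handle."""
    url = "https://t.co/" + "".join(rng.choices("abcdefghij0123456789", k=6))
    handle = "@" + "".join(rng.choices("abcdefghij", k=8))
    return f"{text} {rng.choice([url, handle])}"


def invariance_failure_rate(model, texts, perturb, seed=0):
    """Fraction of inputs whose predicted label changes after perturbation."""
    rng = random.Random(seed)
    failures = sum(model(t) != model(perturb(t, rng)) for t in texts)
    return failures / len(texts)


texts = [
    "I love this airline.",
    "The food was awful.",
    "Great crew, good seats.",
    "The flight departed on time.",
]

stable_rate = invariance_failure_rate(toy_sentiment, texts, add_irrelevant_token)
brittle_rate = invariance_failure_rate(brittle_sentiment, texts, add_irrelevant_token)
```

The idea is that the label should not depend on the appended token, so any change counts as a failure. The word-counting stand-in never changes its answer, while the deliberately brittle one flips on three of the four sentences (the already-negative one stays negative).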
Re: (Score:3, Informative)
WTF is NLP?
Clearly they are talking about Neuro-Linguistic Programming [wikipedia.org].
Re: (Score:2, Funny)
Re: (Score:2)
Re: (Score:2)
Noooooo.... this was a funny, not an informative!
Neuro-Linguistic Programming is 1970s era pseudo-scientific psychobabble.
sigh... :(
Mistakes in NLP are easy to find (Score:2)
For example, when I went to Google translate and translated the above paragraph from English to Spanish and back, I got this: "All you have to do to find a mistake is write a sentence. If that doesn't work, write some. The chances of making a mistake are high."
Recordings of Scottish people (Score:3)
Apparently Alexa and co. can't handle that.
Blort? (Score:2)
Say what now?
Not hard (Score:2)
All anyone has to do to find flaws in their NLP is to use one of their Assistant products for a couple of days.
Don't get me wrong, I am very invested in the Google Assistant ecosystem. I have 5 devices in my home and I use it on my phone a lot. However, while it is still head and shoulders above the other assistants in terms of smarts, it is still incredibly, incredibly dumb and makes mistakes all the time.
If you want a fun challenge - try to figure out how to ask the Google Assistant or Alexa to play "Pla