Natural language can be such an ass headache
It was exciting to see Luminoso's new product for streaming natural language understanding, Compass, get an article in Wired. Skimming past the picture of Catherine and me looking ridiculous at SXSW long ago, there's an image of our "concept cloud" visualizer looking at what people say on Twitter when they're sick:
Wait a minute. Zoom in. Enhance.
The screenshot in the article contains a natural-language glitch that has already caused a lot of amusement around the office.
Here's what's going on. One important thing that Luminoso does is to identify relevant phrases that contain more information than the sum of their parts. When looking at text from people who are feeling sick, the phrase "throat hurts so bad" is much more informative than the words "throat", "hurts", "so", and "bad" in isolation.
Usually, these informative phrases end up being reasonable phrases of natural language, or at least close enough ("headache is killing" is missing the object, but we all get the idea).
One case where this went slightly wrong is the phrase "ass headache". This is not an affliction that people usually complain of. And yet it looks entirely reasonable to the computer, given the source data, which contains many phrases such as:
- "I got this crazy ass headache"
- "I have a biggg ass headache"
- "I gotta mean ass headache bruh"
Statistically, it looks like an "ass headache" is a thing you can have. You can have a crazy one, or a mean one, or simply a biggg one, but lots of people have one.
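To see why the numbers point that way, here is a toy sketch in Python. It is not Luminoso's actual phrase detection, which isn't described here; it just counts adjacent word pairs in the three example tweets above, and the function name `bigrams` is made up for illustration.

```python
from collections import Counter

# The three example tweets quoted above.
tweets = [
    "I got this crazy ass headache",
    "I have a biggg ass headache",
    "I gotta mean ass headache bruh",
]

def bigrams(text):
    """Return adjacent word pairs from a whitespace-tokenized, lowercased string."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

# Count every adjacent word pair across all tweets.
counts = Counter(pair for tweet in tweets for pair in bigrams(tweet))

# The most frequent pair is the one a naive counter would call a "phrase".
top_bigram, top_count = counts.most_common(1)[0]
print(top_bigram, top_count)
```

The adjective changes from tweet to tweet, so "crazy ass", "biggg ass", and "mean ass" each show up once, while "ass headache" shows up in all three. A counter that only looks at adjacency has no way to know the pairing it found is the wrong one.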
Because we're actual speakers of the language, as opposed to computers stumbling through it to the best of their ability, we know how these phrases should really be interpreted. We understand that the word "ass", for whatever reason, can be a modifier for the adjective before it. (That doesn't stop us from humorously reinterpreting it as a modifier for the noun after it, as an early XKCD comic encourages us to, which is essentially what Luminoso's analytics did!)
XKCD #37, by Randall Munroe
Phrases that come up in our everyday conversation can contain surprising grammatical quirks. And that's why natural language is such an ass headache.