Luminoso Word of the Week: Emoji

When they first started appearing, you probably thought they were 💩. Or perhaps you were reluctant to stop turning your head sideways ;-). But as they have continued to supplant emoticons, it has become clear that emojis are here to stay, and they affect how artificial intelligence, machine learning, and natural language technologies understand the world.

Where did emojis come from?

Since they first appeared on Japanese phones in the late 1990s, the pictographs known as emojis (from the Japanese e, 絵, "picture", and moji, 文字, "character") have become part of the online zeitgeist.

(The first emoji set, created by Shigetaka Kurita)

They became far more prominent worldwide starting in 2010, when they were incorporated into the Unicode Standard and became available on the keyboards of popular mobile devices. In 2013, emojis accounted for roughly one out of every 600 characters on Twitter, appearing half as often as the # hashtag symbol (yes, on Twitter!). A 2016 Emoji Report also claimed that over 92% of the online population uses emojis, sent across 2.3 trillion mobile messages a year. From 😂 being named Oxford Dictionaries’ Word of the Year for 2015 to inspiring one of the worst movies of 2017, emojis are everywhere.

How does AI treat emojis?

For natural language technologies, however, emojis have often flown under the radar. When they aren’t skipped entirely, many natural language tools treat them as new, unknown symbols: “I ❤️ you” could just as easily be “I ♣️ you” or “I 👀 you”. Given enough examples, those systems may gradually learn what each symbol means. Eventually, with massive amounts of data, they may even recognize that ❤️, 👩‍❤️‍👨, and 💖 have similar meanings, and that 💔 is an antonym.
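To make the problem concrete, here is a toy sketch, assuming a tokenizer that splits on whitespace and replaces anything outside a fixed vocabulary with a single unknown token. The vocabulary, `VOCAB`, `UNK`, and `tokenize` are hypothetical names for illustration; real tools differ in how they handle emojis.

```python
# Hypothetical vocabulary built from training text that contained no emojis.
VOCAB = {"i", "you", "love", "like"}
UNK = "<unk>"


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace and replace out-of-vocabulary tokens with UNK."""
    return [tok if tok.lower() in VOCAB else UNK for tok in sentence.split()]


for s in ["I ❤️ you", "I ♣️ you", "I 👀 you"]:
    # Every emoji collapses to the same unknown token, so all three
    # sentences produce identical token lists.
    print(tokenize(s))
```

The three sentences mean very different things, but to this tokenizer they are indistinguishable; only a large number of examples could pull them apart again.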

And that’s really too bad, because each emoji effectively functions as a Rosetta Stone. Emojis aren’t generic symbols; each one has an associated meaning. With a few exceptions, they can act as a point of semantic agreement across more than 70 languages. With customer communications increasingly coming from mobile devices, why wouldn’t companies analyzing their customer feedback want to recognize emojis?

What does Luminoso do with emojis?

Luminoso’s common-sense natural language technology is based on the idea that machine learning algorithms work faster and more accurately if they’re given knowledge about the world before they start adapting their language models. Unlike naive natural language technologies, which must learn about the world anew each time they analyze text, Luminoso starts off with more than 28 million facts from the ConceptNet semantic network.
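As a toy illustration of starting with prior knowledge, the sketch below stores a handful of facts as (subject, relation, object) triples and indexes them as a graph, so that ❤️ and 💖 are already linked before any customer comment has been seen. The facts, the relation names, and the `build_graph` helper are all hypothetical and far simpler than ConceptNet’s actual data model.

```python
from collections import defaultdict

# Hypothetical prior facts as (subject, relation, object) triples.
FACTS = [
    ("❤️", "means", "love"),
    ("💖", "means", "love"),
    ("💔", "antonym_of", "❤️"),
    ("soccer", "played_with", "ball"),
    ("soccer", "translation_ko", "축구"),
]


def build_graph(facts):
    """Index each fact in both directions so lookups work from either end."""
    graph = defaultdict(set)
    for subj, rel, obj in facts:
        graph[subj].add((rel, obj))
        graph[obj].add((f"inverse_{rel}", subj))
    return graph


graph = build_graph(FACTS)
# Without seeing any training text, the graph already connects
# both heart emojis to the concept "love".
print(sorted(graph["love"]))
```

A real system would combine facts like these with what it learns from the text itself; the point here is only that the links exist before training starts.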

Recently, ConceptNet got even smarter. The latest version of ConceptNet, released in April, knows what emojis are and can define them in a number of languages, thanks to imported Unicode data. Now the Luminoso engine starts off with the common sense to know not only that “soccer” is played with a ball, is the main sport in Mexico, and is spelled 축구 in Korean, but also that it can be represented by an emoji (⚽).
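Unicode assigns every emoji an official character name, which is one reason emoji data is straightforward to import. As a minimal sketch, Python’s standard `unicodedata` module can read those names; this is an illustration only, not how ConceptNet actually imports its data, and `describe_emoji` is a hypothetical helper. Official names can lag behind everyday usage (❤ is formally HEAVY BLACK HEART), and the multilingual short names and keywords are published separately in the Unicode CLDR annotations.

```python
import unicodedata


def describe_emoji(text: str) -> list[str]:
    """Return the lowercase Unicode name of each emoji-like symbol in text."""
    # Variation selectors and the zero-width joiner only change how
    # neighbouring characters are displayed, so they are skipped.
    skip = {"\ufe0e", "\ufe0f", "\u200d"}
    names = []
    for ch in text:
        if ch in skip:
            continue
        # Category "So" ("Symbol, other") covers most emoji characters.
        if unicodedata.category(ch) == "So":
            names.append(unicodedata.name(ch, "unknown").lower())
    return names


print(describe_emoji("I ❤️ ⚽ 😂"))
```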

And that gives our clients faces with tears of joy. 😂😂😂