Conducting analyses in a changing science landscape
This week's blog post was written by Cassie Johnson, Vice President of Customer Success and Services at Luminoso.
My colleague Dmitry Grenader recently wrote an article about AI and machine learning and our relationship with it. Specifically, he ended his post exhorting his readers to:
Work with the Machine. Seriously, tell your colleagues to treat it as another form of intelligence, and augment it with human intelligence, common sense, and the knowledge or context of your specific business.
That’s exactly what we’ll expand upon in this post. How do we work with the machine? How do we artfully combine our own knowledge with that of the machine? Let’s be clear at the outset--we’re talking about analyzing written language using Luminoso Analytics. More specifically, we are talking about how to deal with a shifting scientific landscape when it comes to the technology that is the backbone of Luminoso Science. The fact is that the science is getting better every day, and periodically we update our software to reflect the newer and better science.
Our customers tend to use our software in three ways:
- One-time, ad hoc analyses to explore the unknown in their data.
- Programmatic applications that make Luminoso part of a business process, such as classification of incoming calls to specific agents at a call center.
- Systematic analyses that look at the same source of data (like surveys) over time, with special emphasis on differences between periods.
In the first two cases, science changes are purely positive. In the first, it’s a one-time analysis to determine what is in the data in the first place. In the second, science changes that improve the ability to detect certain kinds of language make the business process better--higher accuracy in classification, better call routing, well, you know, all around goodness.
But what about those who want to track changes over time? Those who want precision and accuracy in measuring differences between periods in their data, such as tracking sentiment or mentions of a particular topic over time? Well, this is where relying on a holistic approach to analysis will be your friend.
Most likely, you have lots of quantitative information available: Number of survey respondents, number of support tickets, how much money people have spent on product x or product y. This is the stuff that is benchmarkable, that is actually measurable over time because in these instances, you are counting distinct things: responses, products, dollars.
Now let’s take a look at what language can do for us--it offers context to these metrics, and gives direction as to where in your quantitative data you should concentrate your analysis. When Luminoso makes a major science change, we will let our customers know, and the best practice is to recalculate your projects so that they reflect the most up-to-date technology.
Will this change the results of your analysis? Well, it depends. If you are ranking the most prevalent concepts or topics, the order of topics might change, but the same concepts will most likely remain in the top 10. If your data is overwhelmingly negative in sentiment, our science changes will not make that data overwhelmingly positive. Conceptual match counts for topics (i.e., how many comments, reviews, or pieces of data are about a particular topic, even if they don’t use the specific words you’re searching for) may change. The exact match counts for a topic (i.e., the number of comments or reviews that contain a specific word or phrase) will not.
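To make that concrete, here is a minimal Python sketch of the kind of before-and-after check described above. It does not use the Luminoso API; the function names, the concept lists, and the sample comments are all hypothetical, and it assumes you have already exported a ranked list of concepts from before and after a recalculation.

```python
def compare_rankings(before, after, top_n=10):
    """Compare two ranked concept lists (most prevalent first).

    Hypothetical helper: 'before' and 'after' are plain lists of concept
    names exported before and after a recalculation.
    """
    top_before = before[:top_n]
    top_after = after[:top_n]
    shared = [c for c in top_before if c in top_after]
    return {
        "shared": shared,
        "dropped": [c for c in top_before if c not in top_after],
        "new": [c for c in top_after if c not in top_before],
        # Positive value: the concept moved up the ranking after the update.
        "rank_shift": {
            c: top_before.index(c) - top_after.index(c) for c in shared
        },
    }


def exact_match_count(documents, phrase):
    """Count documents containing the literal phrase (case-insensitive).

    This depends only on the text itself, so it is unaffected by any
    change in the underlying language model.
    """
    needle = phrase.lower()
    return sum(1 for doc in documents if needle in doc.lower())


# Made-up example data, for illustration only.
before = ["price", "shipping", "support", "quality", "returns"]
after = ["shipping", "price", "support", "returns", "packaging"]
report = compare_rankings(before, after, top_n=5)

comments = [
    "Shipping was slow",
    "Great price",
    "The delivery took weeks",  # about shipping, but not an exact match
    "shipping cost too much",
]
exact = exact_match_count(comments, "shipping")
```

In this made-up data, most concepts stay in the top five and only their order shifts. The third comment is the kind of text a conceptual match might pick up while an exact match never will.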
So what does this mean for you? It means that you shouldn’t treat language analysis the same way you treat counting widgets or analyzing quantitative data. Major science changes are rare enough that when they do happen, we share that openly with our customers. When you run new data, check to see if anything changed--is there something surprising that has cropped up? Compare the results you see to your quantitative data--has negative sentiment jumped up? Has that had an effect on sales?
Severe changes in results generally reflect changes in the data rather than in the science. But when our science changes, we will let you know what changed and how it could impact your analysis. And remember, always depend on your core quantitative measures, because they are most powerful when you can blend them with language analysis.
Finally, a note about why we bother with science changes. Our product manager for Luminoso Analytics, Alice Kaanta, also has a PhD in Biology (and wrote a fantastic post a couple of months ago about how changing science impacts business - you can read it here). I asked her how scientists deal with changes in the theories that govern their discipline. Her answer?
“The truthful answer is 'poorly.' Scientists buy into a methodology for proving a thing, and then if any institutional inertia develops around it, it can be ages before they change their minds or their methods...BUT, the scientists who get on board first are always the winners.”
Luminoso goes beyond just counting words. We push the boundaries of NLP, and we are proud of that. That means science will change how our software works from time to time, but it makes our software better, which makes things better for you, our customers.