Topic modeling is one antidote to the overwhelming volume of content that marketers must make sense of every day. In this series, we’ll explore what topic modeling is, why it’s important, how it works, and some practical applications for marketing.
Part 3: How to Interpret Topic Models
One of the key flaws of topic models and their visualizations is that, to the uninitiated, they are difficult to draw insight from. Their raw outputs are complex, and the accompanying visualizations often confuse more than they explain. Today, we’ll examine ways to interpret and understand topic model outputs so we can extract insights from them more easily.
Let’s begin by examining some of the most common topic model outputs.
Topic Model Top 10 Chart
One of the most common visualizations of a topic model is a simple bar chart of the different topics and the strength of the keywords within each topic:
This graph visualizes our topics (usually an arbitrary number such as 10, 20, or 25) and the importance of words within each topic. It doesn’t, however, tell us how relevant each topic is to the entire body of text we’re measuring. Still, it’s not a bad start.
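Under the hood, a chart like this is just a ranking of each topic’s word weights. As a minimal sketch, assuming the model has already been fit and its topic-word weights exported as a plain dictionary (the topic names, words, and weights below are made up for illustration), the data behind the chart could be pulled out like this:

```python
# Assumption: topic-word weights from an already-fitted model, stored as
# {topic name: {word: weight}}. All names and numbers are illustrative.
topic_word_weights = {
    "topic_1": {"marketing": 0.12, "analytics": 0.09, "data": 0.15, "email": 0.03},
    "topic_2": {"conference": 0.11, "keynote": 0.08, "speaking": 0.06},
}


def top_words(weights_by_topic, n=10):
    """Return the n highest-weighted (word, weight) pairs per topic, strongest first."""
    result = {}
    for topic, weights in weights_by_topic.items():
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        result[topic] = ranked[:n]
    return result


# Print a crude text version of the bar chart: one '#' per 0.01 of weight.
for topic, words in top_words(topic_word_weights, n=3).items():
    print(topic)
    for word, weight in words:
        print(f"  {word:<12} {'#' * round(weight * 100)} {weight:.2f}")
```

A real plotting library would draw proper bars, but the ranking step is the same.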
Here’s another basic visualization of topics and the relevance of each topic to the overall corpus:
This graph has the opposite problem: we understand the relevance of each topic to the corpus overall, but we have no way of understanding the weight of individual words.
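One simple way to get topic-level relevance, assuming we have each document’s topic proportions from the model, is to average those proportions across the corpus. This is a sketch with made-up numbers; some tools instead weight documents by their length, so the figures in a given visualization may be computed differently:

```python
# Assumption: one row per document, one column per topic, each row summing to 1.
# The numbers are illustrative, not real model output.
doc_topic = [
    [0.7, 0.2, 0.1],  # document 1 is mostly about topic 1
    [0.1, 0.8, 0.1],  # document 2 is mostly about topic 2
    [0.4, 0.4, 0.2],  # document 3 is split
]


def topic_prevalence(rows):
    """Average each topic's share across all documents (unweighted)."""
    n_docs = len(rows)
    totals = [0.0] * len(rows[0])
    for row in rows:
        for k, share in enumerate(row):
            totals[k] += share
    return [t / n_docs for t in totals]


prevalence = topic_prevalence(doc_topic)

# List topics from most to least relevant to the corpus as a whole.
for k, share in sorted(enumerate(prevalence, start=1), key=lambda kv: kv[1], reverse=True):
    print(f"topic_{k}: {share:.2f}")
```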
Topic Models Require Domain Knowledge
Recall that topic models are nothing more than statistical analyses of large bodies of text: mathematical summaries of the most prominent words. Thus, interpreting topic models requires domain knowledge of the overall subject.
For example, in the above topic model of tweets about me, I understand each topic well because I am a domain expert on me, as anyone would be about themselves. I can grasp the relevance of each topic without needing to see the original tweets. I can tell you which topics come from which conferences and events, and which themes people associate with me most.
If this were, say, an analysis of the tweets of a financial engineering expert, I would have little insight to offer. I don’t know the person, and I don’t have domain expertise in financial engineering.
Thus, developing at least cursory domain knowledge of what we’re modeling is a prerequisite for extracting the most value from topic models. Alternatively, if we have access to a domain expert, we can draw on their knowledge to create value.
Three Ways to Read Topic Models
Once we’ve established that we can read a topic model output and supply our own context, we can begin divining meaning and insight from it. To get the most out of topic models, we should ask ourselves three key questions when looking at a model.
The first question is what’s expected. In examining a topic model, or several side by side, we ask: What should be there? What’s the common ground?
This is especially important if we are using topic modeling to provide competitive marketing insights. What common-ground topics do two different companies share? Understanding common ground tells us what won’t be a competitive advantage.
Likewise, if we were performing topic modeling to understand influencers, what topics do they share? If we’re not also talking about those topics, we might be missing a key part of the conversation.
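If we boil each model down to a set of top keywords, common ground is simply the intersection of those sets, and the topics influencers share that we don’t cover are a set difference. A minimal sketch; the companies, influencers, and keywords are all hypothetical:

```python
from functools import reduce


def common_ground(*keyword_sets):
    """Keywords that appear in every one of the given sets."""
    return reduce(set.intersection, keyword_sets)


# Hypothetical top keywords from two competitors' topic models.
company_a = {"cloud", "security", "pricing", "support", "ai"}
company_b = {"cloud", "security", "integrations", "ai", "partners"}
print(sorted(common_ground(company_a, company_b)))  # table stakes, not differentiators

# Hypothetical influencer keywords versus our own.
influencer_1 = {"ai", "privacy", "analytics", "strategy"}
influencer_2 = {"ai", "privacy", "podcasting"}
ours = {"ai", "analytics", "email"}

shared_by_influencers = common_ground(influencer_1, influencer_2)
print(sorted(shared_by_influencers - ours))  # part of the conversation we're missing
```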
The second question is what’s unexpected. When we look at topic models, we look for anomalies: unexpected topics and words that don’t make sense.
For example, if I were analyzing tweets about me and saw a prominent topic or keyword that had nothing to do with me, that would be a great starting point for further investigation. Do I have a data problem? Or is there a topic that others think I’m knowledgeable about, one I’m not, that represents an opportunity?
When we use topic modeling to understand conversations in our market, anomalies represent opportunities. Is there a new angle we’re not participating in? Is there a new competitor we were unaware of?
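Spotting anomalies still depends on domain knowledge, but once we write that knowledge down as a list of terms we expect to see, a few lines of code can flag what falls outside it. A minimal sketch; the expected-terms list and the topic keywords are hypothetical stand-ins:

```python
# Assumption: a hand-curated set of terms we expect, standing in for domain knowledge.
expected_terms = {"marketing", "analytics", "data", "ai", "conference", "keynote"}

# Assumption: top keywords per topic, already extracted from the model.
topic_keywords = {
    "topic_1": ["marketing", "analytics", "data"],
    "topic_2": ["conference", "keynote", "cryptocurrency"],
}


def find_anomalies(keywords_by_topic, expected):
    """Return, per topic, any keywords not found in the expected terms."""
    anomalies = {}
    for topic, keywords in keywords_by_topic.items():
        unexpected = [w for w in keywords if w not in expected]
        if unexpected:
            anomalies[topic] = unexpected
    return anomalies


print(find_anomalies(topic_keywords, expected_terms))
# {'topic_2': ['cryptocurrency']}
```

Each flagged word is a prompt for investigation, not a conclusion: it may be a data problem or an opportunity.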
The third and most difficult question is what’s missing. What’s not there that should be? Answering it requires deep domain knowledge, because we have to know what else exists in the domain.
Business folks love to call this the green field, the white space, or the blue ocean, and these empty spaces are valuable because they lack competitive pressure. However, in topic modeling we must be equally careful that we didn’t inadvertently exclude data and create an artificial gap that doesn’t really exist.
Once we do find something missing, we have a great starting point for building marketing campaigns and content to fill that space.
For example, I was recently looking at conversations at a conference where the Internet of Things (IoT) was a prominent topic. Completely absent from those conversations, however, were IoT security and machine learning on IoT data. These two secondary topics should have been there, so that conference and its audience had a glaring omission. A savvy marketer would approach that conference and its attendees with IoT security and data analysis offerings to fill that awareness gap.
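Code can’t supply the domain knowledge that tells us what should be there, but once we write that knowledge down, checking for gaps is a set difference. A minimal sketch loosely mirroring the IoT example; the topic labels are illustrative, not the actual conference data:

```python
def missing_topics(observed, expected):
    """Expected topics that never appear among the observed topic labels."""
    return sorted(set(expected) - set(observed))


# Illustrative labels for topics the model surfaced at the conference.
observed = ["iot", "smart home", "sensors", "connectivity"]

# Illustrative domain knowledge: topics that should accompany IoT.
expected_with_iot = ["iot", "iot security", "machine learning"]

print(missing_topics(observed, expected_with_iot))
# ['iot security', 'machine learning']
```

A non-empty result is only a candidate white space; before acting on it, it’s worth confirming the gap isn’t an artifact of data we accidentally excluded.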
In the next post in this series, we’ll examine when to use and not use topic models. As useful as they are, when are they most applicable? When should we choose a different tool? For the answers, stay tuned!
Want to read more like this from Christopher Penn? Get updates here: