
Topic modeling is one antidote to the overwhelming volume of content that marketers must make sense of every day. In this series, we’ll explore what topic modeling is, why it’s important, how it works, and some practical applications for marketing.

## Part 4: When To Choose Topic Modeling

Now that we’ve built a topic model and learned how to interpret it, we should spend some time understanding when topic modeling is and is not appropriate.

Topic modeling is a part of machine learning. Machine learning problems can be broadly sorted along two axes: two kinds of data and two categories of learning.

First, our two kinds of data are continuous and categorical. Continuous data is typically numerical; in marketing analytics, we call these metrics. A general rule of thumb: if we can perform mathematical operations on the data itself, it’s continuous data, or a metric.

Categorical data is typically descriptive; in marketing analytics, we call these dimensions. These are typically non-numerical things we have to count. For example, someone’s allergies might be recorded as wheat gluten or shellfish. In a database, these would be stored as words; to make use of them, we first count them up and then perform math on the counts, never on the data itself.
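To make the distinction concrete, here is a small Python sketch using only the standard library. The page-view numbers and the allergy list are made-up illustration data: the metric can be averaged directly, while the dimension has to be counted first.

```python
from collections import Counter
from statistics import mean

# Continuous data (a metric): we can do math on the values directly.
page_views = [120, 340, 95, 410]  # hypothetical daily page views
average_views = mean(page_views)

# Categorical data (a dimension): averaging "shellfish" is meaningless,
# so we count occurrences first and then do math on the counts.
allergies = ["wheat gluten", "shellfish", "shellfish", "peanuts", "shellfish"]
allergy_counts = Counter(allergies)
share_shellfish = allergy_counts["shellfish"] / len(allergies)

print(average_views, allergy_counts.most_common(1), share_shellfish)
```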

The two categories of machine learning are supervised and unsupervised. In supervised learning, we analyze data to understand a known outcome. For example, if we want to know what influences purchasing decisions, we might use a technique like a random forest to process our numerical data and tell us which factors contribute most to a purchase.
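As a minimal sketch of supervised learning on continuous data, the example below swaps the random forest for ordinary least squares linear regression, a much simpler supervised technique that fits in a few lines of standard-library Python. The `fit_line` helper and the email and purchase figures are hypothetical; the point is that we supply a known outcome (purchases) and the model learns how an input relates to it.

```python
# Hypothetical training data: an input we observe (emails opened)
# paired with the known outcome we want to explain (purchases).
emails_opened = [1, 2, 3, 4, 5]
purchases = [2, 4, 5, 4, 5]


def fit_line(xs, ys):
    """Ordinary least squares with one input; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(emails_opened, purchases)
predicted = slope * 6 + intercept  # estimate for a customer who opened 6 emails
print(slope, intercept, predicted)
```

The slope estimates how many extra purchases accompany each additional email opened in this toy data. A random forest tackles the same kind of question with many inputs at once and can capture non-linear effects.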

In unsupervised learning, we don’t know what the outcome is. We use machine learning to help us understand the data by grouping it, simplifying it, and bringing order to it.
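Clustering is one common form of unsupervised learning. The sketch below is a bare-bones one-dimensional k-means (Lloyd’s algorithm); the `kmeans_1d` helper and the time-on-page numbers are hypothetical. Notice that we never tell it what the groups are: it finds them, and it’s up to us to decide afterward what each group means.

```python
def kmeans_1d(values, k, iterations=20):
    """Group unlabeled numbers into k clusters using Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its members' mean (keep it if empty).
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers, clusters


# Hypothetical seconds-on-page for eight visits, with no labels attached.
seconds_on_page = [5, 8, 12, 7, 180, 210, 195, 6]
centers, clusters = kmeans_1d(seconds_on_page, k=2)
print(centers, clusters)
```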

Here’s a chart that helps simplify the above:

What kinds of questions might each category answer?

• Supervised learning of continuous data: What drives X/causes X?
• Supervised learning of categorical data: How many of X occurred?
• Unsupervised learning of continuous data: What relationships are in our metrics that we can’t see?
• Unsupervised learning of categorical data: What’s in the box?

Let’s look at four brief examples:

• Supervised learning of continuous data: Predict when search interest in our top SEO keyword will be highest.
• Supervised learning of categorical data: Classify which pictures we post most on social media.
• Unsupervised learning of continuous data: How do we understand the relationship between page traffic and social sharing metrics?
• Unsupervised learning of categorical data: What words, phrases, and topics do our favorite influencers use in our field?

Why does this all matter? Topic modeling fits squarely in the unsupervised learning of categorical data. We have a collection of something unknown – a large body of text – and we want to understand it. To do so, we have to group related words together and reduce that complexity (all those words) to something manageable and understandable by the human mind.
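To show what reducing a bag of words looks like mechanically, here is a minimal sketch of Latent Dirichlet Allocation (LDA), one widely used topic modeling algorithm, fit with collapsed Gibbs sampling in plain Python. The `lda_gibbs` helper, the tiny corpus, and the `alpha`, `beta`, and iteration settings are illustrative assumptions, not tuned values; for real work, libraries such as gensim (`LdaModel`) or scikit-learn (`LatentDirichletAllocation`) are the usual route.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=42):
    """Minimal LDA via collapsed Gibbs sampling; returns top 3 words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                               # tokens per topic
    assignments = []
    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(topic t | everything else) is proportional to
                # (doc_topic + alpha) * (topic_word + beta) / (topic_total + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [sorted(vocab, key=lambda w: -topic_word[k][w])[:3] for k in range(num_topics)]


# Hypothetical, already-tokenized influencer posts.
docs = [
    "seo keyword ranking search".split(),
    "search ranking seo backlinks".split(),
    "email subject open rate".split(),
    "email open rate newsletter".split(),
]
topics = lda_gibbs(docs, num_topics=2)
print(topics)
```

On a corpus this tiny, the topic split can vary with the seed; what matters is the shape of the output, a short list of representative words per topic that a human can read and name.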

## When Not To Choose Topic Modeling

Based on the above, we should choose topic modeling any time we need to understand what’s in the box: what’s in a large bag of words.

When is topic modeling the wrong choice? When we have a problem that is:

• Mathematical in nature (continuous data)
• Supervised in nature (we already know the outcome or categories we’re looking for)

For example, if we wanted to know which social media updates were the most popular, that’s not a question topic modeling will answer. That’s just straight statistics.
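Here is how little machinery that question needs. This sketch assumes “popular” means shares; the updates and counts are made up.

```python
# Hypothetical social media updates with one engagement metric each.
updates = [
    {"text": "New SEO guide is live", "shares": 42},
    {"text": "Webinar replay posted", "shares": 17},
    {"text": "Our take on email open rates", "shares": 63},
]

# "Most popular" is just a sort on a metric; no topic model required.
most_popular = sorted(updates, key=lambda u: u["shares"], reverse=True)
top_update = most_popular[0]["text"]
print(top_update)
```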

If we wanted to know which word or phrase was the most frequently used in our social media updates, that’s not a question topic modeling will answer, either. That’s a form of text mining called term frequency analysis – and it presumes we know what words to count.
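For comparison, here is a bare-bones term frequency count in Python. The sample updates are hypothetical, and the simple regex tokenizer is an assumption; a production pipeline would typically also remove stopwords and count multi-word phrases (n-grams).

```python
import re
from collections import Counter

# Hypothetical social media updates.
updates = [
    "Topic modeling helps marketers",
    "Marketers love topic modeling",
    "Data helps marketers decide",
]

# Lowercase each update and split it into words, then count every word.
words = [w for text in updates for w in re.findall(r"[a-z']+", text.lower())]
term_frequency = Counter(words)
top_term, top_count = term_frequency.most_common(1)[0]
print(top_term, top_count)
```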

Only when we have a large body of text that we need to reduce to human scale is topic modeling the best choice.

## Next: Wrapping Up

We’ll look back over the series in the next post and give some tips as to where to go next in our machine learning journey. Stay tuned!