Topic modeling is one antidote to the overwhelming volume of content that is created every day and that marketers must make sense of. In this series, we’ll explore what topic modeling is, why it’s important, how it works, and some practical applications for marketing.
Part 2: What Is Topic Modeling?
Let’s begin by answering the question: what is topic modeling?
Here’s a great definition from KDNuggets:
Topic modeling can be described as a method for finding a group of words (i.e topic) from a collection of documents that best represents the information in the collection. It can also be thought of as a form of text mining – a way to obtain recurring patterns of words in textual material.
The easiest way to think of a topic model is as a word-based summary of a body of text. Think of how a table of contents outlines a book, or how a menu outlines the food at a restaurant. That’s essentially what a topic model does.
Topic models first came into use in the late 1990s, with Thomas Hofmann’s probabilistic latent semantic analysis. They’ve become more popular over the years as computing power has increased.
How Do Topic Models Work?
Topic models are a product of mathematical and statistical analysis. In essence, they assign numerical values to words, then look at the mathematical probabilities of those numerical values.
For example, consider this sentence:
I ate breakfast.
We could assign arbitrary numerical values to this sentence, such as I = 1, ate = 2, and breakfast = 3.
Now, consider this sentence:
I ate eggs for breakfast.
We would have a sequence like 1, 2, 4, 5, 3 using the previous numbers.
Next, consider this sentence:
Mary ate breakfast with me.
This would have a sequence like 6, 2, 3, 7, 8.
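The numbering used in these three examples can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not how any particular topic modeling package works: the `encode` function and the `vocab` dictionary are hypothetical names chosen for illustration, and the sketch assumes that each new word simply receives the next unused number, after lowercasing and dropping the final period.

```python
def encode(sentences):
    """Give each new word the next unused integer, in order of first appearance."""
    vocab = {}       # word -> number (assumed numbering scheme, for illustration)
    sequences = []   # one list of numbers per sentence
    for sentence in sentences:
        # Lowercase and drop the trailing period so "breakfast." matches "breakfast"
        words = sentence.lower().rstrip(".").split()
        ids = []
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
            ids.append(vocab[word])
        sequences.append(ids)
    return vocab, sequences


sentences = [
    "I ate breakfast.",
    "I ate eggs for breakfast.",
    "Mary ate breakfast with me.",
]

vocab, sequences = encode(sentences)
for seq in sequences:
    print(seq)
```

Run against the three sentences, the sketch prints the same sequences described in the text: 1, 2, 3, then 1, 2, 4, 5, 3, then 6, 2, 3, 7, 8.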
Put these sequences together:
1, 2, 3
1, 2, 4, 5, 3
6, 2, 3, 7, 8
We begin to see patterns in these sequences. The number 2 appears three times. The number 3 also appears three times. The number 1 appears twice, and always right before the number 2. The number 3 moves around a bit: it sits at the end of the first two sequences but in the middle of the third.
This mathematical view of our text is how topic models work; statistical software measures features such as:
- How often does a number (word) appear?
- How often does a number (word) appear only within one document, but not in others?
- How often do certain numbers (words) appear next to each other?
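The three questions above can be answered for our example sentences with simple counting. The sketch below is an illustration under stated assumptions, not a full topic model: it uses Python’s standard `collections.Counter`, works on words rather than numbers for readability, and the variable names (`term_counts`, `doc_counts`, `pair_counts`) are made up for this example.

```python
from collections import Counter

# The three example sentences, lowercased and split into words
documents = [
    ["i", "ate", "breakfast"],
    ["i", "ate", "eggs", "for", "breakfast"],
    ["mary", "ate", "breakfast", "with", "me"],
]

# 1. How often does each word appear across all documents?
term_counts = Counter(word for doc in documents for word in doc)

# 2. Which words appear in exactly one document and in no others?
#    set(doc) counts each word at most once per document.
doc_counts = Counter(word for doc in documents for word in set(doc))
unique_to_one_doc = sorted(w for w, n in doc_counts.items() if n == 1)

# 3. How often do two words appear next to each other, in that order?
pair_counts = Counter(
    pair for doc in documents for pair in zip(doc, doc[1:])
)

print(term_counts["ate"], term_counts["breakfast"], term_counts["i"])
print(unique_to_one_doc)
print(pair_counts[("i", "ate")])
```

Here "ate" and "breakfast" each appear three times, "i" appears twice and is followed by "ate" both times, and words such as "eggs" and "mary" show up in only one sentence. Real topic modeling software builds on counts like these with probability models rather than stopping at raw tallies.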
While this seems like a lot of work to analyze three sentences, the value of topic modeling is performing this kind of analysis on thousands or millions of sentences – especially when time is important.
For example, suppose we’re attending a major conference like Dreamforce or CES. If we want to participate in relevant conversations, we should know which topics are most on the minds of attendees. However, mega-events often generate hundreds or thousands of social media posts per hour. No human, or even group of humans, could reasonably keep up with the raw feed from such an event. A machine can.
Walking Through a Topic Model
In the next post in this series, we’ll explore the process of creating a topic model. Stay tuned!
Want to read more like this from Christopher Penn? Get updates here: