I’m a self-professed nerd. I react with the same amount of excitement and vigor about a new software release that baseball fans react with about a new season or that Netflix fans react with about new episodes of their favorite show. When IBM Watson Analytics released its new version, I immediately jumped in with my version of binge-watching. Last week, I installed the latest version of Tableau, version 10, and enjoyed binge-analyzing data all night.
The most powerful new tool in Tableau 10? By far, it’s the automatic clustering. Clustering is a way of helping us visualize our data in groups, seeing what data naturally clumps together. It’s an essential part of discovery, of understanding what story our data could tell us. Clustering is traditionally a time and labor intensive exercise; data analysts have been clustering data for decades.
What’s different in Tableau 10 is that clustering is now, at least for an initial pass, drag and drop.
I extracted the URLs from my website and ran them through three different tools to gather data:
- Google Analytics for users, page views, and conversions
- SHIFT Communications’ SCALE scanner for social sharing and external SEO data
- Scrutiny for internal SEO and word count data
When assembled via a giant VLOOKUP table, the result looks like this:
Good luck extracting insights out of that. It’s clean, it’s denormalized, it’s de-duplicated, but it’s largely unreadable by humans.
In regular analysis, I’d take two variables to examine how they relate to each other and search for outliers. For example, here’s a look at content which attracted new users versus total social sharing:
While we see a general, weak relationship between new users and social sharing, it’s difficult to parse out what the outliers truly are.
Enter clustering. With a literal drag and drop, Tableau 10 now clusters the data for me:
I see five distinct groups of blog posts now – highly shared but not attracting new users (cluster 2, in orange), highly shared and driving new users (cluster 1, in dark blue), neither highly shared nor driving new users (cluster 4, in light blue), poorly shared and driving lots of new users (cluster 5, in green), and poorly shared and driving exceptional new users (cluster 3, in red).
From this clustering, I’m now able to ask more questions of my data.
- Why do some posts share well but create few new users?
- Why do other posts deliver opposite performance, lots of new users but few shares?
- What do the posts clustered together have in common?
I’d adjust my content strategy based on the answers to these questions.
Depending on what metrics and KPIs I care about, I would ask different questions. For example, I would combine this analysis tool with IBM Watson Analytics; Watson Analytics is exceptional at doing massive multi-variate and multi-dimensional analysis with just a few clicks. If I care most about conversions (don’t we all?), I’d first ask Watson Analytics what other metrics power conversions:
I don’t know and can’t tell from looking at the raw data what other variable(s) I should attempt clustering on, so Watson Analytics will help power that insight.
Watson Analytics indicates one possible scenario I should be examining is LinkedIn and Sessions to determine what drives goal completions. I’ll take this finding and run clustering in Tableau to identify the outliers:
Inside this is cluster 4, blog posts that accrued both high session counts and high LinkedIn shares that ultimately yielded high conversions. These are the topics I should explore more to determine why.
Tableau 10’s clustering is my favorite feature in the new release. Combined with other analytics tools like Watson Analytics, we’ve expanded our ability to generate real, useful insights from our marketing data.
Want to read more like this from Christopher Penn? Get updates here:
Get your copy of Marketing Blue Belt!