Warning: this content is older than 365 days. It may be out of date and no longer relevant.

You Ask, I Answer: Correlation of Visitors and Conversions by Visitor Type?

Andy asks, “I received a request yesterday for a report that “maps the correlation between the % of new and returning visitors onsite and the number of conversions, by day”. My first thought was to go into GA and create a couple of different views showing new/returning visitors and conversions. What do you think?”

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn 0:13

In today’s episode, Andy asks: I received a request yesterday for a report that maps the correlation between the percentage of new and returning visitors on site and the number of conversions, by day.

My first thought was to go into Google Analytics and create a couple of different views showing new and returning visitors and conversions.

What do you think? My first question is: well, what are we trying to prove here? Because visitors should correlate to conversions.

Christopher Penn 0:45

I think maybe you’re trying to figure out whether it’s new audiences or returning audiences that drive more conversions.

And so I don’t know that this would be the best model to prove that, but you can do it.

So here’s the steps that I would take.

First, you need to extract the data itself.

So you need the number of new users by day, the number of returning users by day, and the number of conversions by day. Be aware that mixing and matching different number types is a bad idea.

So instead of the percentage of new and returning visitors, which isn’t great, just use the absolute numbers.

So, for example: we had 12 returning users and 44 new users, and then the number of conversions.

Also, don’t mix up data types; don’t pair new users with sessions, for example.

Don’t do that.

So that’s step one.
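
As a rough illustration of step one, here is a minimal sketch of what the extracted data could look like once it is out of Google Analytics. The column names and every number in `RAW_EXPORT` are made up for the example; a real export will be laid out differently.

```python
import csv
import io

# Hypothetical export: column names and values are invented for illustration.
RAW_EXPORT = """date,new_users,returning_users,conversions
2021-03-02,51,9,4
2021-03-01,44,12,3
2021-03-03,38,15,5
"""

def load_daily_counts(text):
    """Parse the export into rows of absolute counts, ordered by day."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": record["date"],
            # Absolute user counts for all three metrics, not percentages
            # and not a mix of users and sessions.
            "new_users": int(record["new_users"]),
            "returning_users": int(record["returning_users"]),
            "conversions": int(record["conversions"]),
        })
    # ISO-formatted dates sort correctly as plain strings.
    return sorted(rows, key=lambda row: row["date"])

daily = load_daily_counts(RAW_EXPORT)
```

Keeping one row per day, with all three metrics as counts, is what makes the later correlation step straightforward.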

Step two is get all that data out of Google Analytics and into some kind of system that can run correlations.

And it has to be a system that can do different types of correlations.

Generally speaking, when we talk about correlation, there’s three different kinds of correlation that you can run.

There’s a Pearson correlation, a Spearman correlation, and a Kendall tau correlation.

And if you’ve never heard these terms before, then you are just like me, and you slept through statistics class in college, right? Which I did. I did terribly in that class, which is ironic; I had to relearn it all later on in life.

Each statistical test of correlation fits a different type of data.

So Pearson correlation is generally what is most used in tools like Excel and so on.

When you type in the correlation function in Google Sheets, or in Tableau, it’s probably using Pearson out of the box.

Pearson correlations are good if your data, when you plot it out, looks like a bell curve, right? For normal distributions, Pearson correlations are the best tool for the job.

That is not most marketing data at all, right? If you were to take your marketing data and reorder it from largest to smallest, most marketing data follows a power law distribution.

It’s a Pareto curve: 80% of your traffic comes from 20% of your days, and so on and so forth.

Long tail: you hear that term a lot.
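
One quick way to see whether your own data has that long-tail shape is to check how much of the total comes from the top 20% of days. This is only a sketch; the function name `top_share` and the daily numbers are invented for illustration.

```python
def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the largest `top_fraction` of values."""
    ordered = sorted(values, reverse=True)  # largest to smallest
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / sum(ordered)

# Made-up daily traffic for ten days.
daily_traffic = [900, 450, 120, 80, 60, 40, 30, 20, 15, 10]

share = top_share(daily_traffic)
print(f"Top 20% of days account for {share:.0%} of traffic")
```

If a small fraction of days carries most of the total, the data is far from a bell curve, which is the situation where a rank-based correlation is the safer choice.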

When you have data that doesn’t fit a normal distribution, you want a nonparametric test.

Spearman correlations are the best technique to use, and instead of Pearson’s r, you get Spearman’s rho.

And then the third one, Kendall tau, is best for ordinal data.

So if you have two lists of ranked data, you would use Kendall tau for that.
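
To make the difference between the three tests concrete, here is a small pure-Python sketch of each one. It is written for illustration only: the Kendall version is the simple tau-a variant, which does not adjust for ties, and the two sample lists are made up. Statistics packages such as SciPy provide tested versions of all three (`scipy.stats.pearsonr`, `spearmanr`, `kendalltau`).

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r: linear correlation on the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Made-up data: y always rises with x, but one value is a huge outlier.
steady = [1, 2, 3, 4, 5]
skewed = [1, 4, 9, 16, 100]

r_value = pearson(steady, skewed)
rho_value = spearman(steady, skewed)
tau_value = kendall_tau(steady, skewed)
```

On this sample the two rank-based measures report a perfect relationship, because the ordering is perfectly consistent, while Pearson is pulled down by the outlier.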

So get all of your data out, get it into a spreadsheet, make sure it is all ordered by day, and then run a Spearman correlation for each one against conversions. What you’re probably going to find is that one of those two metrics has a stronger correlation.
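
Sketched in code, that comparison could look like the following. The daily numbers are invented, and the shortcut formula used here, 1 - 6Σd² / (n(n² - 1)), is only valid when no values are tied; real data with ties needs a tie-aware implementation from a statistics package.

```python
def simple_ranks(values):
    """Rank positions 1..n; assumes no tied values."""
    order = sorted(values)
    return [order.index(value) + 1 for value in values]

def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); no-ties case only."""
    n = len(x)
    d_squared = sum(
        (a - b) ** 2 for a, b in zip(simple_ranks(x), simple_ranks(y))
    )
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Made-up daily counts, already ordered by day (one entry per day).
new_users = [44, 51, 38, 60, 47, 55, 41]
returning_users = [12, 9, 15, 11, 14, 10, 13]
conversions = [3, 5, 2, 8, 4, 6, 1]

rho_new = spearman_no_ties(new_users, conversions)
rho_returning = spearman_no_ties(returning_users, conversions)

# Compare by magnitude, since a strong negative correlation is still strong.
stronger = "new_users" if abs(rho_new) > abs(rho_returning) else "returning_users"
print(f"new: {rho_new:.2f}, returning: {rho_returning:.2f}, stronger: {stronger}")
```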

Spearman correlations are just like Pearson correlations in the sense that anything above 0.5 is a strong correlation.

Anything from 0.25 to 0.5 is a weak correlation.

Anything from 0.25 down to zero is no correlation.

And the same is true on the negative side.

So you can have negative correlations as well.
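
Those rule-of-thumb bands can be written down as a small helper. The cutoffs are the ones given in this episode (0.5 and 0.25); how the exact boundary values are treated is an assumption made for this sketch, and the function name is hypothetical.

```python
def describe_correlation(coefficient):
    """Label a correlation coefficient using the 0.5 / 0.25 rule of thumb."""
    magnitude = abs(coefficient)  # the same bands apply on the negative side
    if magnitude > 0.5:
        strength = "strong"
    elif magnitude >= 0.25:  # assumption: 0.25 itself counts as weak
        strength = "weak"
    else:
        return "no correlation"
    direction = "positive" if coefficient > 0 else "negative"
    return f"{strength} {direction} correlation"

for value in (0.8, 0.3, 0.1, -0.6):
    print(value, "->", describe_correlation(value))
```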

The question that I would ask, though, is: so what? Right, so you find out that new users have a greater correlation to conversions than returning users.

Okay, so what? The person who’s requesting this, what are they going to do with that information? I suppose if it’s new users, then you look at the channels that drive new users and say, okay, let’s invest more in the channels that are driving the most new users.

Same is true for returning users.

But fundamentally, I would question whether that data type is as relevant as, you know, the individual channels.

So one thing you might want to do is look at new and returning users by your most popular channels, like search or email or social media and so on and so forth, because that might yield more granular results and save you a step or two in terms of the level of analysis.
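
For the channel-level view, the same daily data would first need to be split into one series per channel. Here is a minimal sketch of that reshaping step; the row layout, channel names, and numbers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (date, channel, new_users, returning_users, conversions)
rows = [
    ("2021-03-02", "search", 35, 4, 3),
    ("2021-03-01", "search", 30, 5, 2),
    ("2021-03-01", "email", 4, 6, 1),
    ("2021-03-02", "email", 6, 7, 2),
]

def series_by_channel(rows):
    """Group rows into per-channel daily series, ordered by day."""
    series = defaultdict(
        lambda: {"new_users": [], "returning_users": [], "conversions": []}
    )
    # Sorting the tuples orders them by date first, then by channel.
    for date, channel, new, returning, converted in sorted(rows):
        series[channel]["new_users"].append(new)
        series[channel]["returning_users"].append(returning)
        series[channel]["conversions"].append(converted)
    return dict(series)

channels = series_by_channel(rows)
```

Each channel's `new_users` and `returning_users` lists can then be correlated against that channel's `conversions` in exactly the same way as the site-wide numbers.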

That’s not to say that the analysis is without value.

But the number one thing we always want to ask a stakeholder when they come to us with a very specific mathematical test is: okay, what are you going to do with the information? Right? What decisions will you make?

Christopher Penn 5:29

If they’re just going on a fishing trip, you might say, okay, that’s cool.

I’m still happy to run the analysis for you.

But have you ever thought about what your next step is? Because most people don’t, most people don’t think about the next step.

And as a result, because they don’t think about the next step, they don’t have a clear picture in their heads of what it is they would do next.

And that’s where all the value is in analytics.

A lot of the time, analytics by itself is the rearview mirror, right? It’s what happened; it’s looking backwards.

That’s of limited value.

When you’re driving, there is value in looking in the rearview mirror every now and again.

Unless you’re Mario Andretti, and then you just never look in the rearview mirror.

But if you’re trying to make decisions going forward, you have to have some sense of where this person wants to go with this thing.

And if they don’t know, that is the opportunity for you as an analyst to add value.

That’s an opportunity for you to say, okay, have you also thought about this at the channel level, with new and returning users? And then help them run the tests and say, okay, here’s the conclusion that we reached: that x or y is a better choice for getting more of the result that you care about.

So give that some thought.

But it’s a very interesting question, and I think the analysis is worth doing.

Let’s see what you come up with.

Thanks for asking.

If you liked this video, go ahead and hit that subscribe button.

