
You Ask, I Answer: First Steps With New Data?

Katherine asks, “What’s the first thing or set of processes you do when you receive new data from a customer?”

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn 0:13

In today’s episode, Katherine asks, what’s the first thing or set of processes you do when you receive new data from a customer? Probably exploratory data analysis.

All right, exploratory data analysis is the data science and machine learning equivalent of looking in the fridge before you cook.

Right? So you open up the fridge, you look at what’s in there, and you say, okay, I’ve got chicken, but I don’t have steak. I’ve got onions, but I don’t have peppers. I’ve got carrots, but I don’t have celery, and so on and so forth.

What you’ve got in the fridge dictates what you are or aren’t going to cook.

If you’ve got your heart set on steak but there’s no beef in the fridge, you’re not having steak, right? So when a customer hands over new data, the first thing you do is look at it and investigate it. You say, okay, what’s in the box? What did the customer give me? What condition is it in: good or bad? Are there lots of missing variables or missing data points? Are things labeled correctly? Does the data answer the question the customer is trying to ask? That’s a critical part of this. If a customer says, I want to know social media ROI, and they provide no cost data, you can’t calculate social media ROI. There’s just no way to do it; you’re missing a substantial ingredient. It’s like baking a loaf of bread when you’ve got no flour.

Now, you’re probably not baking bread then.
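To make the "does the data answer the question" check concrete, here is a minimal sketch, not the author's actual tooling. It assumes the customer's data arrives as a list of Python dicts, and the column names and the ROI field list are hypothetical examples. It simply reports which required fields never appear with a value anywhere in the data.

```python
# A minimal sketch, assuming the customer's data is a list of dicts.
# Column names and the ROI field list are hypothetical, for illustration only.

# Fields we assume a social media ROI analysis would need (not a standard list).
ROI_REQUIREMENTS = {"date", "channel", "revenue", "cost"}


def missing_requirements(rows, required):
    """Return the required fields that no row provides a non-None value for."""
    present = set()
    for row in rows:
        present.update(key for key, value in row.items() if value is not None)
    return sorted(required - present)


customer_data = [
    {"date": "2023-01-02", "channel": "linkedin", "revenue": 1200.0},
    {"date": "2023-01-03", "channel": "twitter", "revenue": 450.0},
]

gaps = missing_requirements(customer_data, ROI_REQUIREMENTS)
if gaps:
    # No flour, no bread: stop here and go back to the customer.
    print(f"Cannot answer the question; missing: {gaps}")
# prints: Cannot answer the question; missing: ['cost']
```

Checking requirements as a set difference keeps the test cheap, so it can run before any heavier analysis.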

So the first part is exploratory data analysis.

And that has, you know, eight different parts.

So you have your goal and your purpose.

You have your data requirements and data collection. You have your initial analysis, which is just looking at the data, and your descriptive analytics, where you see what kinds of dimensions and metrics are there. You do your data quality work: what’s the quality of the data in there? Then there’s requirements verification, where you look at the data and ask, okay, does this answer the question that’s being asked of it?

And if it doesn’t, you’ve got to start over.
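The descriptive analytics and data quality steps can be sketched as a simple per-column profile. This is a hedged illustration, not the author's method: it assumes list-of-dicts input, treats None and empty strings as missing, and the 10% missing-value threshold in the hypothetical `flag_missing` helper is an arbitrary example, not a standard.

```python
from statistics import mean


def profile(rows):
    """Per-column summary: present/missing counts, plus min/max/mean for
    columns whose present values are all numbers. None and "" count as missing."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # a missing key counts as None
        present = [v for v in values if v is not None and v != ""]
        summary = {"present": len(present), "missing": len(values) - len(present)}
        # bool is a subclass of int, so exclude it from the numeric check
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if present and numeric:
            summary.update(min=min(present), max=max(present), mean=mean(present))
        report[col] = summary
    return report


def flag_missing(report, max_missing_share=0.1):
    """Return columns whose share of missing values exceeds the threshold."""
    flagged = []
    for col, summary in report.items():
        total = summary["present"] + summary["missing"]
        if total and summary["missing"] / total > max_missing_share:
            flagged.append(col)
    return flagged


rows = [
    {"channel": "linkedin", "clicks": 120, "spend": 50.0},
    {"channel": "twitter", "clicks": None, "spend": 20.0},
    {"channel": "", "clicks": 80, "spend": 30.0},
]
report = profile(rows)
print(report["clicks"])
print(flag_missing(report))
# ['channel', 'clicks']
```

A flagged column is a prompt to go back to the customer before modeling, not an automatic reason to drop the data.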

After that, you’ll do prep, which is cleaning, centering, scaling, and so on. You’ll probably do some feature engineering, where you create new features out of existing ones, like day of week or hour of day from a date. And then comes your modeling or your insights, depending on whether you’re pushing a model into production or just doing an analysis. Those are the vital steps.
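Here is a small sketch of the prep and feature engineering steps, using the standard library only. It assumes z-score standardization as the form of centering and scaling (one common choice among several), and that dates arrive as ISO 8601 strings; the function names are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev

# Fixed English names so the output doesn't depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def center_and_scale(values):
    """Z-score standardization: subtract the mean (centering), then divide
    by the population standard deviation (scaling)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no variation; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def date_features(timestamp):
    """Derive day-of-week and hour-of-day features from an ISO 8601 timestamp."""
    dt = datetime.fromisoformat(timestamp)
    return {"day_of_week": DAY_NAMES[dt.weekday()], "hour_of_day": dt.hour}


print(center_and_scale([2, 4, 6]))
print(date_features("2023-01-02T14:30:00"))
# {'day_of_week': 'Monday', 'hour_of_day': 14}
```

After standardization the column has mean 0 and standard deviation 1, which keeps features on very different scales (clicks versus spend, say) from dominating a model.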

Anytime you get new data, it’s like getting a delivery of groceries, right? You have a company that does the shopping for you, and they drop off the box on your doorstep.

The first thing you do is open the box and go, okay, did they get my order right? I ordered apples and there are pineapples.

Okay, that’s not helpful.

That’s where you start, because it also helps you avoid failure later on.

If a customer hands you data and there’s something wrong with that data, the sooner you catch it, the less time and money you waste, right? The less you beat your head against the wall. Or, in the worst-case scenario, you think the data is fine, you run an analysis on it, you hand off the results to the customer, and the analysis is wrong.

And it might be wrong in a subtle way, one that you don’t catch.

But then a month, a quarter, or a year later, the customer’s like, hey, our business is going down.

Why? Well, because you ran an analysis on bad data.

Right? It’s like eating something that tastes fine, and the next day you’re sick.

Well, yeah, you ate some food that was contaminated.

And maybe you don’t find out until the next day that it wasn’t fine after all.

Or if it was a really bad mushroom, you might die 10 days later because it liquefied your internal organs, which can happen.

So that’s the first and most important part: you’ve got to open up that fridge, look inside, and see what we have. Can it make the things that we want to make? If you skip that part, if you skip the exploratory data analysis, you will be in a world of hurt, because at some point you will be handed data that isn’t clean, that isn’t complete, that isn’t correct.

And you will use it and you will lament your choices.

I guarantee it.

So that’s the first and most important step, before you do anything else.

Good question.

Thanks for asking.

If you liked this video, go ahead and hit that subscribe button.

