Warning: this content is older than 365 days. It may be out of date and no longer relevant.

You Ask, I Answer: Reducing Bias in Datasets

In this episode, I answer this question: “Although AI can help solve various issues we face today, it can also create discriminatory algorithms. What solutions do you pose to solve the problems that AI causes?” Bias in datasets is an issue we all have to tackle.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Alright, Lane asks: although AI can help solve various issues we face today, it can also create discriminatory algorithms.

What solutions do you pose to solve the problems AI causes? Well, here’s the thing.

Everything that happens with machines comes from human data, right? There’s nothing that is outside of our existence.

And so the problems that we see with bias in AI and machine learning come from us, right? The humans, we are the problem.

And the problem can occur in a variety of areas.

It can be the people we’ve hired, right? If the people we’ve hired have biases, it doesn’t matter what else you do, you’re going to have problems, right? So that’s a huge part, and an overlooked part, of machine learning and AI: what are the biases of your people?

Remember, there’s two kinds of bias, right, there is human bias, you know, maybe you don’t like people with red hair.

Maybe you don’t like people of a certain race, or religion or whatever.

Those are human biases.

And then there’s statistical bias, which is just where a sample is not representative of the population the sample is drawn from.

The people that you hire have got to understand both of those, and know to look for both of those in the data that they’re working with.

Right? So if you’re looking at, say, some survey data that you’ve done before, in some market research, and you don’t know how to check to see whether the sample is representative or not, you could have some really serious problems.
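As a rough illustration of what checking a sample might look like, here is a minimal Python sketch. It assumes you already know the population shares for some grouping (the age bands, counts, and the 5-point tolerance below are all hypothetical, not from the episode), and it simply flags groups whose share of the sample is far from their share of the population. A real project would likely use a proper statistical test instead.

```python
from collections import Counter

def representation_gaps(sample, population_shares, tolerance=0.05):
    """Return groups whose sample share differs from the population share
    by more than `tolerance` (absolute difference in proportion)."""
    if not sample:
        raise ValueError("sample must not be empty")
    counts = Counter(sample)
    total = len(sample)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            # Positive means over-represented, negative means under-represented
            gaps[group] = round(observed - expected, 3)
    return gaps

# Hypothetical survey of 100 respondents, grouped by age band
respondents = ["18-34"] * 60 + ["35-54"] * 33 + ["55+"] * 7
# Hypothetical population shares for the same age bands
population = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

gaps = representation_gaps(respondents, population)
print(gaps)
```

In this made-up survey, the youngest band is heavily over-represented and the oldest band is heavily under-represented, which is the kind of statistical bias described above.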

So people is first. Second is overall strategy: is there a bias inherent in your strategy? I remember a number of years ago, my wife worked at a market research firm.

And they were the epitome of what not to do in market research.

They were a conservative sort of think tank.

And people would come to them saying, I need research that backs up this point of view. But anybody who knows anything about market research and statistics knows that that’s pretty much the worst way that you can do market research, other than just making things up completely.

And so the strategy can have biases in it.

The data can have biases, and there are mitigation tools for that: toolkits like IBM’s AI Fairness 360 toolkit, for example, that can look at your data and say, hey, these look like protected classes, like gender, or religion, or ethnicity.

And it looks like there are non-representative values in here. Like, hey, for some reason, this entire dataset, which has a gender field, is like 98% men and 2% women; you might have a problem in your data.
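To make the 98% versus 2% example concrete, here is a small plain-Python sketch of that kind of check. This is not the AI Fairness 360 API; the function names, the `gender` field, and the 20% threshold are assumptions made up for illustration.

```python
from collections import Counter

def field_shares(rows, field):
    """Return the share of rows taken by each value of `field`."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def underrepresented(shares, min_share=0.2):
    """List values whose share falls below `min_share` (a hypothetical cutoff)."""
    return sorted(value for value, share in shares.items() if share < min_share)

# Hypothetical dataset matching the example: 98 men, 2 women
rows = [{"gender": "man"}] * 98 + [{"gender": "woman"}] * 2

shares = field_shares(rows, "gender")
flagged = underrepresented(shares)
print(shares, flagged)
```

A check like this only tells you that a field is lopsided. Whether that is actually a problem depends on the population the data is supposed to represent.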

So these tools can identify biases in the dataset. There are also biases that can creep in through the choice of algorithms.

And again, more advanced tools like IBM Watson Studio have some protections built in to avoid those problems, or mitigate them, or at least identify that there’s a problem.

And then you get bias in the model as it drifts, right.

So you publish a model, it’s in production.

And then over time, as it gets new data and learns from new data, it becomes less and less accurate, it drifts.

It also may have biases in it that cause drift.

The most famous example of this was back in 2016: Microsoft Tay.

The folks at Microsoft created a Twitter bot.

And it learned from the tweets people sent it and it became a racist porn bot in less than 24 hours.

There were no protections on model drift.
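One simple form of protection is to keep comparing a production model against the accuracy it had at launch. The sketch below is a hypothetical, minimal version of that idea in Python: the function names, the 0.90 baseline, and the 5-point allowed drop are assumptions for illustration, not how Watson Studio or any specific product does it.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual outcomes."""
    if not actuals or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal length")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

def has_drifted(baseline_accuracy, predictions, actuals, max_drop=0.05):
    """Return (drifted, recent_accuracy); drifted is True when accuracy on
    recent data has fallen more than `max_drop` below the baseline."""
    recent = accuracy(predictions, actuals)
    return (baseline_accuracy - recent) > max_drop, recent

# Hypothetical: the model scored 0.90 at launch; here is a recent batch
recent_predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
recent_actuals     = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]

drifted, recent_accuracy = has_drifted(0.90, recent_predictions, recent_actuals)
print(drifted, recent_accuracy)
```

Accuracy is only one thing to watch. The same pattern could be run per group (for example, per gender) to see whether the model is drifting more for some people than for others.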

And so each of these areas where bias occurs, or can occur, has some level of protection you can build into it, but you have to know to think about it, to look for it, to ask questions about it.

You’ve got to have a way to identify it in the hiring process.

You’ve got to have a safe way for people to raise concerns in the workplace, right? If they see a strategy that’s clearly biased in some way, that’s incorrect, there’s got to be a safe way for people to elevate their concerns and have those concerns answered.

Again, using tools like AI Fairness 360 for the data, using tools like Watson Studio for the algorithms and the deployment, and monitoring your models for drift will help you reduce the potential for, or the impact of, bias.

And the thing is, we have to be on the lookout for it.

And we have to accept that it is going to occur and remediate it.

And one of the big challenges that companies will run into is they will fight tooth and nail, sometimes, to say that they are not biased.

Like it’s not possible.

Well, have you checked, right? Do you know? Have you proven that bias does or does not exist in any of the systems? And if you can’t, you’ve got a problem.

Treat datasets, models, algorithms, and production systems as guilty until proven innocent when it comes to bias.

Assume that there’s bias until you prove that there isn’t, if you want to have the best possible outcomes.

Really good question.

