Gianna asks, “What’s the difference between fair and unfair bias? What’s the fine line?”
Whether bias is fair or unfair comes down to two simple things: laws and values. Statistical bias is when your sample deviates from the population you’re sampling from. Bias isn’t inherently bad unless it crosses one of those two lines.
- Got a question for You Ask, I’ll Answer? Submit it here!
- Subscribe to my weekly newsletter for more useful marketing tips.
- Find older episodes of You Ask, I Answer on my YouTube channel.
- Need help with your company’s data and analytics? Let me know!
- Join my free Slack group for marketers interested in analytics!
What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.
In today’s episode, Gianna asks, “What’s the difference between fair and unfair bias? What’s the fine line?” So fair and unfair bias really comes down to two simple things, two very straightforward things: laws and values.
So before you deploy any kind of model, or build any kind of artificial intelligence system, you need to understand what bias is.
Bias is when your sample, your data, whatever it is you’re working with, in some way statistically deviates from the population you’re sampling from.
And bias doesn’t necessarily have to be good or bad.
It just is.
It’s a mathematical concept, at least in the context that we’re using it here. There’s human bias, which is totally separate from statistical bias.
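As a rough illustration of that definition, the sketch below compares each group’s share of a sample with its share of the population. The group labels, the population shares, and the helper name `representation_gap` are all made up for the example; it is a minimal sketch of the idea, not a formal statistical test.

```python
from collections import Counter


def representation_gap(sample, population_shares):
    """Return (sample share - population share) for each group.

    Positive values mean the group is over-represented in the sample,
    negative values mean it is under-represented.
    """
    counts = Counter(sample)
    total = len(sample)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical population: 50% group A, 30% group B, 20% group C.
population_shares = {"A": 0.5, "B": 0.3, "C": 0.2}

# Hypothetical sample of 100 records that over-represents group A.
sample = ["A"] * 70 + ["B"] * 20 + ["C"] * 10

gaps = representation_gap(sample, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A gap near zero for every group suggests the sample mirrors the population on that attribute. Whether a nonzero gap is a problem is the laws-and-values question, not a mathematical one.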
For example, you may want to sell your product to people who have higher incomes.
Right? That is allowed.
Certainly, it makes logical sense.
And income is one of those things that’s, you know, under somebody’s control to a degree, whereas, say, their race is not under their control at all. They have absolutely no choice in what race they are.
They have no choice in what age they are, etc.
So, when we’re talking about what’s fair and unfair, we’re talking about this: are we building tools that disadvantage a population in some way that is either against the law or against our values?
And the two may not necessarily always agree. There are plenty of things that you may decide, as a company or as an individual, are unacceptable to your values, even if they are technically legal.
You may decide you don’t want to, say, include firearms owners or coffee drinkers among the people you do business with, and that’s totally fine because neither of those things is what’s called a protected class.
Protected classes are attributes that are protected under law, and this depends on where you live, where you work, and where you do business, because the laws change from jurisdiction to jurisdiction.
In the United States, for example, age, gender, gender identity, sexual orientation, race, religion and creed, veteran status, and disability are all protected.
These are all things that you may not discriminate on, and courts have held any number of times that these are not permissible things to discriminate on.
And so when we talk about fair and unfair bias, we are talking about this: does your software, which is what an AI model is, in some way disadvantage people on one of these attributes?
If your software says that, you know, we’re only going to show our products to white people, that would be illegal. That would be an unfair bias along a protected class.
And again, you may have values that add additional things that you don’t want to discriminate on that you don’t want to advantage or disadvantage somebody on.
Likewise, bias is also where you advantage one group of people over everybody else. So it doesn’t have to disadvantage one group; it can disadvantage everybody except one group.
So bias isn’t inherently bad unless it crosses the lines of laws and values.
Now where this gets into trouble for AI and machine learning and data science is not necessarily in the protected classes, not even necessarily in the ethics and values, but in how machines use data.
And, in particular, this deals with correlates.
So a correlate is when you have a variable or a group of variables that behave very similarly.
So there is a strong correlate, at least in America, for example, between income and race: depending on which race you are, your income is statistically more likely to be higher or lower.
And so if we’re building a machine learning model, say, to decide who gets shown our ad, who we want as customers, who gets in line, or who gets privileged treatment, and we’re doing it on income level, we are potentially also discriminating on a protected class, right?
Because, again, there’s a strong correlation between race and income.
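One simple way to look for a correlate is to measure how strongly a permitted feature tracks membership in a protected group. The sketch below computes a plain Pearson correlation between income and a 0/1 group indicator. The records are invented, and the 0.5 cutoff is an arbitrary placeholder rather than any legal or statistical standard.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented records: (income in thousands, 1 if in the protected group else 0).
records = [
    (30, 1), (35, 1), (40, 1), (45, 1), (50, 1),
    (75, 0), (80, 0), (85, 0), (90, 0), (95, 0),
]
incomes = [income for income, _ in records]
in_group = [flag for _, flag in records]

r = pearson(incomes, in_group)

# Arbitrary illustrative cutoff: flag the feature for closer review
# when it tracks group membership this strongly.
PROXY_THRESHOLD = 0.5
is_proxy = abs(r) >= PROXY_THRESHOLD

print(f"correlation between income and group membership: {r:.2f}")
print("income may be acting as a proxy" if is_proxy else "no strong link found")
```

A strong correlation does not mean the model is discriminating. It means a decision made on income could carry race along with it, so the outcomes need to be checked.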
So one of the biggest challenges that folks in the machine learning and AI space need to be addressing is doing things, you know, doing matching, doing propensity scoring, doing regression analysis, that demonstrate that the algorithm is discriminating on things that are permissible and is not discriminating on things that are not permissible.
So, for example, say I’m selling high-end coffee, right, and I want to discriminate on income level, which is permissible.
I also have to be able to demonstrate through statistical testing that, say, a black family with the same income level as a white family is just as likely to be shown our ad as the white family, right?
That means being able to say that race, in this case, is not a discriminating factor.
If you are black and you have above a certain income level, you are just as eligible to see our ads.
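The check described above, same income level and same likelihood of seeing the ad, can be sketched as a comparison within each income band. The example below uses a two-proportion z-test as a simpler stand-in for the matching, propensity scoring, and regression methods mentioned earlier. The counts and band names are invented, and the 0.05 significance level is a conventional but assumed choice.

```python
from math import erf, sqrt


def two_proportion_z(shown_a, total_a, shown_b, total_b):
    """Return (z, two-sided p-value) for H0: both groups share one shown rate."""
    p_a = shown_a / total_a
    p_b = shown_b / total_b
    pooled = (shown_a + shown_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Everyone (or no one) was shown the ad, so the rates are identical.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Invented counts per income band: (households shown the ad, households total).
bands = {
    "above cutoff": {"black": (180, 200), "white": (184, 200)},
    "below cutoff": {"black": (60, 200), "white": (120, 200)},
}

ALPHA = 0.05  # assumed significance level

results = {}
for band, groups in bands.items():
    z, p = two_proportion_z(*groups["black"], *groups["white"])
    results[band] = (z, p)
    verdict = "rates differ" if p < ALPHA else "no detectable difference"
    print(f"{band}: z={z:.2f}, p={p:.3g} -> {verdict}")
```

Comparing within a band is what holds income constant. A non-significant result only says this sample shows no detectable gap; it does not prove the model is fair.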
But what will happen in a lot of cases is that people don’t do this testing.
People don’t do this inspection of their own models and the outcomes, and they end up discriminating.
And whether or not the intent was to discriminate does not matter; it’s whether you actually did.
Because in a court of law, you will be held accountable for your actions.
It’s like, you know, “I didn’t mean to hit that guy with a hammer, I just hit him with a hammer really hard.” No, that doesn’t matter. Intent means nothing; you hit the guy with a hammer.
The same thing is true here, where you may not have intended to discriminate along racial lines, or gender identity lines, or veteran status, but you did, and you are liable for it.
So a huge part of the work in AI and machine learning is to know to look for bias, to test for it, and then to test for correlates to things that are not allowed, things that are out of alignment with the laws and values.
And this requires money and time because these are expensive processes to go through and essentially deconstruct a machine learning model to understand how it’s making its decisions.
And it requires a willingness to do so.
Now, if the stakeholders at the company you’re working for or on the project you’re working on say, “No, we don’t want to do that,” then at that point, you are personally liable for outcomes.
Because at that point you may have said, “I know there’s a potential problem, I know that we may be doing something wrong.”
If you don’t take action to correct it, you’re by definition an accomplice.
So be aware of that.
But a lot of what’s happening in machine learning really comes down to those correlates.
And you just have to test for them. You have to investigate. You have to know that there could be a problem.
And that brings me to my last point.
You have to be asking these questions of your models.
Do not assume that the software knows to look for bias, because most software out there doesn’t.
A few notebook options, like IBM Watson Studio, do know to look for it, but most software doesn’t.
Certainly anything you code yourself does not inherently do that unless you put it in.
So you need to be building that in as a process in your systems so that you are looking for fairness, you’re looking for unfairness, you’re looking for discrimination from the get-go.
And that can happen in the data; that can happen with the people you hire to work on the data.
It can happen in the model.
And it can happen in the model after deployment, where you get something called model drift, where the model starts behaving in ways you didn’t intend it to.
So looking for unfair bias isn’t something to do one time; it is an ongoing process when you’re working with machine learning tools.
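To make the “ongoing process” concrete, here is a minimal sketch of a recurring check on a deployed model’s outcomes. For each period it computes the ratio of the lowest group selection rate to the highest and flags the period if that ratio falls below a cutoff. The weekly counts, the group names, and the 0.8 cutoff are all assumptions for illustration, not a legal standard for any particular use.

```python
def selection_rate_ratio(outcomes):
    """outcomes maps group -> (selected, total).

    Returns the lowest group selection rate divided by the highest.
    1.0 means every group is selected at the same rate.
    """
    rates = [selected / total for selected, total in outcomes.values()]
    highest = max(rates)
    return min(rates) / highest if highest > 0 else 1.0


ALERT_BELOW = 0.8  # assumed cutoff for raising a flag

# Invented post-deployment logs: period -> group -> (shown the ad, eligible).
weekly_outcomes = {
    "week 1": {"group_a": (90, 100), "group_b": (88, 100)},
    "week 2": {"group_a": (85, 100), "group_b": (80, 100)},
    "week 3": {"group_a": (90, 100), "group_b": (63, 100)},
}

flagged = []
for week, outcomes in weekly_outcomes.items():
    ratio = selection_rate_ratio(outcomes)
    status = "ok"
    if ratio < ALERT_BELOW:
        flagged.append(week)
        status = "REVIEW"
    print(f"{week}: ratio={ratio:.2f} {status}")
```

In this made-up log the model looks balanced for two weeks and then drifts in week 3, which is the kind of change a one-time audit before launch would never catch.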
So really good question.
There’s a lot to unpack here.
There’s a lot to talk about when it comes to bias in machine learning, and AI, and in general, but these are things you must be aware of in order to reduce your risk, to reduce your liability, to reduce the likelihood that you get sued for, you know, thousands or millions of dollars.
Leave your follow-up questions in the comments box below.
Subscribe to the YouTube channel and the newsletter. I’ll talk to you soon. Take care.
Want help solving your company’s data analytics and digital marketing problems? Visit TrustInsights.ai today and let us know how we can help you.
Want to read more like this from Christopher Penn? Get updates here:
Get your copy of AI For Marketers