You Ask, I Answer: Removing AI Bias by Removing Data?

Warning: this content is older than 365 days. It may be out of date and no longer relevant.

You Ask, I Answer: Removing AI Bias by Removing Data?

Tammy asks, “There was a talk this week about AI and ML and I was disturbed by a claim a speaker made at the CXL conference. He basically asserted that we should not be concerned about bias being trained into AI because we can just ‘remove the variable from the model that causes the bias.’ Essentially if we don’t want the model to bias against race then we should take race out of the model. What do you make of this?”

This person has no business building AI models, as they have no understanding of bias. They will create models that are inaccurate at best and dangerous at worst. Watch the episode to find out the correct way to deal with bias and how systems like IBM Watson Studio and IBM Watson OpenScale correctly help you manage bias in your data.

You Ask, I Answer: Removing AI Bias by Removing Data?

Watch this video on YouTube.

Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode Tammy asks, there was talk this week about AI and machine learning and AI was disturbed by a claim a speaker made at the CFL conference, he basically said that we should not be concerned about bias being trained into AI, because we can just quote remove the variable from the model that causes the bias. Essentially, if we don’t want the model to bias against race, then we should take race out of the model. What do you make of this? This speakers an idiot, this speakers at it who is completely unqualified to be doing artificial intelligence and machine learning? There is no polite way of saying that this person, I’m assuming it’s a guy because you use the heat pronoun but this person has no business making artificial intelligence models. And this is one of the reasons why people talk seriously about things like certification or qualification. Because if this person deploys this modeled in the wild input

They are going to create her randomly biased models. Here’s why removing a variable from a model because you don’t like the effect it creates is functionally like saying I don’t like that my car is going off to one side because of the right we also want to remove the right we’ll, we’ll know what if that we also important, what if it is? Is it a critical piece of the model? That is that philosophy so I’ll just remove the thing is causing the problem was completely wrong.

The reason why, in addition to the variable right that might be important is that

just because you remove the explicit variable does not mean you remove the bias from the model. machine learning models, particularly deep learning models, but even ones like gradient boosting models can create what is called inferred variables. This is when you engineer or the machine auto engineers variables together

that move in sync

For example, suppose you have Facebook data, and you have collected, books you like and movies you like and music you like. Guess what that combination of data is so good a predictor of age of race, of religion of gender, that when the machine creates an inferred variable from that, it will rebuild age and race and gender and then discriminate on it. And because you’ve removed or you’ve removed race, from the model, you make the assumption that the model is no longer biased, when in fact, it has rebuilt those biases right in and now because you think you’ve removed it, you’re no longer looking for it, you’re no longer trying to detect it. And that means that the model can go wildly off course.

So this person’s an idiot. What’s the right way to do this? The right way to do this is to do

What IBM does,

which is to in your systems and your modeling software and your production software and your monitoring software for AI, you declare protected classes, you say age is a protected class and must fit the natural demographic skew of the population you’re targeting against race is a protected class, you may not discriminate outside certain variances, gender is a protected class. For example, if you were to say that your gender of the gender split between male and female should be roughly 5050 or 4545 and 10 for for non binary folks, whatever the case may be, you declare to the system this is protected, you may not deviate outside of these norms beyond like one or 2%. And then what the system does is it holds those those variables as anchor points and when it builds a model around them. It does

does not allow the model to pull those variances in any direction. It’s kind of like again the example of a tire on your car that’s out of alignment.

This essentially puts a you know, additional hands on the steering wheel to keep the car going in the direction it’s supposed to be going and not allow that that one walkies hired a pole the car in the wrong direction

by using protected classes as as part of the model and declaring that they are protected classes, you ensure that the model will not be biased because the model cannot use those variables for determination. For as as targets as as as as inputs beyond a certain point, and you also make them targets you say you must meet this qualification you must stay within this lane.

Watson’s open scale product does this really well in production, which by the way is really important because after you deploy the model if you’re not monitoring

For biases creeping in as the model is in production, you risk very serious consequences Amazon found that out the hard way when their HR system started systematically discriminating against women nobody was watching the system in real time to say um let’s that’s that’s know yet the 5050 in our, in our test data, our training data has become 9010. And in our in our production data

you want you want systems in place in production that are monitoring and holding those predictive classes in place and alerts you and says hey, something is going awry. Microsoft found that out the hard way with their tail Twitter bot which got turned into a racist porn button 24 hours nobody put guard rails on it. Nobody said these are the things you may not do. And had somebody thought that through that might have been a slightly more successful experiments. So

know removing the variables from the model

Because the bias does not eliminate bias, if anything, it makes it worse because it reduces the explain ability to interpret ability of the model instead, the right way is to protect the variables that are protected classes that are protected aspects. And then be keeping an eye on your models be keeping an eye on your data be keeping an eye on the results that system puts out. And be fact checking it. This is a critical career and a lot of ways right now. And for the next couple of years of someone supervising the machines and saying machine that is not an okay, output. Explain yourself, tell me why you made those decisions. And that’s another critical point that this person clearly has no understanding of.

All of your model should have interpret ability built into them, all of your model should have the ability to spit out hey, here’s how I made these decisions. Here’s what’s in the black box. Deep Learning

in particular has gotten so much

Better and the last six months in showing how a machine made a model that there’s no excuse anymore for saying let’s just a black box and we know how it works but it’s the right answer Well, you don’t know that and regulations like GDPR require require you by law to be able to explain your models. So using software like IBM Watson studio and and Watson open skill will help you hit those benchmarks and make sure that you are compliant with the law. So what I make of this this person has no business building AI models this person is probably should take a course in ethics. I strongly recommend anybody who’s looking to get into this stuff to read Dr. Hillary Mason’s free, completely free book on Amazon called ethics and data science. You need to read it, use a checklist in it and then use use commercial systems from reputable vendors that has some of these checks and balances built into the so that you don’t make these

mistakes because these mistakes will get you sued, they will get you fired and they will make society a worse place. If your model gets out in the wild. You can tell I feel pretty strongly about this.

Great question Tammy.

Be very, very be aware of that company. Wow. And for everyone else,

as you’re deploying machine learning as you’re testing it out as you’re putting it in production as you’re supervising it.

Make sure that somebody has done their homework and has thought through things like bias because you can ruin your company, your relationships with your customers. And as we see with some companies like Facebook, you can ruin society. If you are not thinking about what your AI does. As always, please leave your comments in the comments box below and subscribe to the YouTube channel and the newsletter I’ll talk to you soon.

want help solving your company’s data analytics and digital marketing problems? This is trust insights.ai today and let us know how we can help you

Machine-Generated Transcript

Comments

Leave a Reply Cancel reply

Pin It on Pinterest