You Ask, I Answer: Unintentional Bias in Data Science and ML?



Chacha asks, “Is there such a thing as unintentional bias?”

This is an interesting question. Bias itself is a neutral concept – it simply means our data leans in one direction or another. Sometimes that’s a positive – my Slack group, Analytics for Marketers, has a definite bias towards people who love analytics and data. Other times, bias is a negative, such as redlining, the practice of willfully excluding certain populations from your business based on broad characteristics like race, religion, or sexual orientation. In machine learning in particular, there’s a ton of unintended bias: bias that occurs when we don’t give our machines strict enough guidelines about what we want our models to do or not do.

Unintended means it wasn’t part of our design, not a conscious choice on our part. There will always be bias; the question is, what is its impact, and do we then keep it or disregard it?

Most bias can be mitigated at either the feature engineering stage or the model backtesting stage if we know to look for it. The greater question is, are we looking for it? This is where the science in data science comes into play.
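
As a concrete illustration of what “looking for it” can mean, here is a minimal sketch in Python and pandas that profiles how a sensitive attribute is distributed in a marketing list before any model is trained. The file name, column name, and reference proportions are hypothetical assumptions for illustration, not something prescribed in the discussion above.

```python
import pandas as pd

# Hypothetical email list with an inferred-gender column (names are assumptions).
df = pd.read_csv("email_list.csv")

# How the list actually leans, versus an assumed reference population.
observed = df["inferred_gender"].value_counts(normalize=True)
reference = pd.Series({"female": 0.51, "male": 0.49})

skew = (observed - reference).dropna()
print("Observed share by group:")
print(observed.to_string())
print("Deviation from reference population:")
print(skew.to_string())

# Flag any group that leans more than 10 points away from the baseline,
# so a human can decide whether the bias is intended, legal, and acceptable.
flagged = skew[skew.abs() > 0.10]
if not flagged.empty:
    print("Review these groups for unintended bias:", list(flagged.index))
```

The point of a check like this is not the threshold itself; it is that someone actually asks the question before the data goes into a model.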

Watch the video for full details.

You Ask, I Answer: Unintentional Bias in Data Science and ML?

Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, Chacha asks, “Is there such a thing as unintentional bias?” This is an interesting question. Yes, there is. Bias is itself sort of a neutral, amoral concept; it has no moral basis, period. It simply means that our data leans in one direction or another; it has a skew or a deviancy off the central tendency. Sometimes that’s a positive. For example, there’s a definite bias in my Slack group, Analytics for Marketers, towards people who like analytics and data, right? That would make logical sense, and that bias is intentional. An example of unintentional bias: statistically, if I look at the number of people who are in the group and their inferred gender, it leans female. That was unintentional. At no point did I or the Trust Insights team say we want to focus just on this particular expressed gender.

Other times, bias is a negative, such as the practice of redlining, dating all the way back to the 1930s, when banking and insurance companies took out a map and drew red lines around certain parts of cities where they didn’t want to do business with people, based on broad characteristics like race, or religion, or sexual orientation. That, again, is intentional bias when you do the redlining. But there is plenty of unintentional bias where you say, I want to isolate, maybe, people who have a lower income from my marketing; that comes with a whole bunch of socioeconomic characteristics, which do include things like race and religion and sexual orientation. So that would be unintentional bias.

In machine learning in particular, there’s a ton of unintended bias, bias that occurs when we are not thoughtful enough about the choices we make in our data, and when we don’t give our machines strict enough guidelines about what we want our models to do or not do. A key part of data science and machine learning today is asking yourself throughout the process, what are the ways that this can go wrong? There is a very popular subreddit called What Could Go Wrong; it’s, you know, silly videos and stuff. But that key question is one that not enough people ask all the time. And in marketing: what could go wrong if I build a list that is culled from these data sources? What could go wrong in that data? What could go wrong in that analysis? What could go wrong in those insights? What could go wrong in our strategy? That is something we’re not thinking about enough.

Remember, unintended bias means it wasn’t part of our design; it wasn’t part of a conscious choice that we made. There’s always going to be bias in our data sets. The questions we have to ask are: Is this a conscious decision we’re making? And if so, is it legal? What is the impact of an unintended bias, if we do discover one? And then, assuming that it is legal and ethical, do we keep it or disregard it? So again, if I see a bias towards a certain gender in my email list, what is the impact? Do we keep it? Do we disregard it? What are the things that matter?

The other thing we have to consider is that most bias can be mitigated. Not eliminated, but mitigated; the impact can be reduced at a couple of different points in the machine learning pipeline, in our data science pipeline. One is at the feature engineering stage, when we are deciding what characteristics to keep or exclude from our data. We have to make decisions: if there’s a bias there, should we keep it or not?

I’ve heard some less skilled machine learning practitioners say, oh well, if gender is a concern, then we just delete that column, and then the machine can’t create features from that characteristic. That’s a really bad thing to do, because taking gender out of your training data then allows the machine to create inferred variables which can be functionally the equivalent of gender, but you can’t see them. If you have, for example, all of somebody’s likes on Facebook, the movies, the books, the music that they like, guess what: your machine can very easily infer gender, and ethnicity, and even sexual orientation with a high degree of accuracy. So instead, the best practice is becoming to keep those characteristics which the law deems protected, and to tell the machines these are the acceptable parameters from which the model may not deviate. For example, let’s say you’re computing ROI on your data set, and your machine spits out and says, hey, the ROI of a certain group is higher or lower based on that person’s religion. You can specify to the machine that people who are, say, Rastafarians must have the same outcome, must be treated the same, as people who identify as, I don’t know, Pastafarians, right? And so you can tell the machine: you must know this characteristic exists, and then you must treat it equally; you must not give a different outcome to somebody based on a protected class. So that’s an important part of it. Feature engineering is one of those stages where we can decide what key features to keep, and then mitigate bias within them. And there is software, like IBM’s Watson OpenScale, where you can actually declare those classes and say, you may not deviate from these expressly set out guardrails on your model.

The second is the model backtesting stage, where you are testing out your code to see what results it spits out. That’s when you as a human have to QA the code and say, it looks like there’s a bias here, and here, and here; we can keep that one, we can’t keep that one. But you’ve got to be looking for it, and that’s where data science and statistics really come into play. A lot of folks who are new to machine learning, maybe they took one of those crash courses in machine learning, come out of them more as coders than as people with a statistical background. As a result, they’re not thinking to ask: how could this data be misused? How could this data go wrong? How could we create unintentional biases that we then have to deal with later on?

So there absolutely is such a thing as unintentional bias. And frankly, most of the time, for most people in most situations, most bias is unintentional. We just have to know to look for it, ask how this could go wrong, and then mitigate it, either in feature engineering or in model backtesting. This is something that marketers in particular have to be very careful about, because marketers have a lot of personally identifiable information, and marketers tend not to be trained in statistics and data science to be looking for these biases. So when we use marketing automation tools to help us optimize our marketing, we also have to be asking: are these tools creating biases behind the scenes that we do or do not want? So, something to keep in mind there. Great question, important question.
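To make two of the points above concrete: first, that dropping a protected column does not remove its signal, and second, that backtest results can be checked against an “equal outcome” guardrail. The sketch below is a hypothetical illustration in Python with pandas and scikit-learn; the file names, column names, and the 5-point tolerance are assumptions, and a production setup would rely on purpose-built tooling (such as the OpenScale guardrails mentioned above) rather than this hand-rolled check.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# --- 1. Proxy check at the feature engineering stage ----------------------
# Hypothetical training data; column names are illustrative assumptions.
train = pd.read_csv("training_data.csv")
features = pd.get_dummies(train.drop(columns=["inferred_gender", "outcome"]))
protected = train["inferred_gender"]

# If the remaining columns can predict the protected attribute, deleting
# that column does not stop the model from reconstructing it.
proxy_acc = cross_val_score(
    LogisticRegression(max_iter=1000), features, protected, cv=5
).mean()
baseline = protected.value_counts(normalize=True).max()
print(f"Proxy accuracy {proxy_acc:.2f} vs. majority-class baseline {baseline:.2f}")

# --- 2. Outcome guardrail at the backtesting stage -------------------------
# Hypothetical backtest output: the model's decision plus the protected class.
backtest = pd.read_csv("backtest_predictions.csv")  # columns: religion, approved
overall = backtest["approved"].mean()
by_group = backtest.groupby("religion")["approved"].mean()

TOLERANCE = 0.05  # assumed policy threshold, not derived from the data
violations = by_group[(by_group - overall).abs() > TOLERANCE]

print(f"Overall positive-outcome rate: {overall:.2%}")
print(by_group.to_string())
if not violations.empty:
    print("Guardrail violated for:", list(violations.index))
```

The guardrail here is deliberately simple: every protected group’s outcome rate must stay within a tolerance of the overall rate, and anything outside it goes to a human for review.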
And if you want to learn more about the ethics side of this, I recommend picking up a free copy of Dr. Hilary Mason and Mike Loukides’ book, Ethics and Data Science. You can find it on Amazon as part of Kindle Unlimited, and I believe it’s zero-dollar cost, too. So make sure you pick up a copy of that book; it’s a really, really important read if you’re doing any kind of work with personally identifiable information. As always, please leave any questions you have in the comments below, and subscribe to the YouTube channel and the newsletter. I’ll talk to you soon. Want help solving your company’s data analytics and digital marketing problems? Visit trustinsights.ai today and let us know how we can help you.


Want to read more like this from Christopher Penn? Get updates here:

Subscribe to my newsletter here.


