Jessica asks, “When it comes to training data for marketing AI models, do you think vendors will anonymize/share data sources in the future? Will it be required?”
It depends on the vendor and the model. The raw data for public models, even de-identified, probably will not be made publicly available, but it should be made available to auditors. Those auditors could certify that the data used was appropriately representative and free from specific kinds of biases. For vendors we're paying for marketing artificial intelligence services, we absolutely should be seeing either audited results or de-identified data.
- Got a question for You Ask, I Answer? Submit it here!
- Subscribe to my weekly newsletter for more useful marketing tips.
- Find older episodes of You Ask, I Answer on my YouTube channel.
- Need help with your company’s data and analytics? Let me know!
- Join my free Slack group for marketers interested in analytics!
What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.
In today's episode, Jessica asks, “When it comes to training data for models, do you think vendors will anonymize or share data sources in the future? Will it be required?” It depends on the vendor and the model.
So within the context of marketing AI, marketing artificial intelligence vendors will be providing us access to various types of AI models: lead scoring models, propensity scoring models, deep neural networks, all sorts of stuff.
And what makes up those models?
Right now, the trend is very much for companies to say, “This is our proprietary, you know, special whatever.”
And certainly, companies right now are not sharing any kind of information about their models; they're complete and total black boxes.
As regulatory scrutiny of artificial intelligence continues to ramp up, as it appropriately should, what goes into those models should become more transparent.
So for public datasets and public models, I don't expect the raw data to be made available, even de-identified, because, A, those companies probably don't have the ability to share it at such a large scale; we're talking massive, massive, massive datasets.
And B, if it's publicly and freely available, you get what you get; it's literally a case of you get what you pay for.
For vendors where you are paying money for the use of their model, I think it's absolutely a reasonable request to either ask for de-identified data or to ask that the company go through an audit.
Just as we ask companies to go through audits for safety, for diversity, and for all these different criteria inside of an RFP, there's absolutely no reason why an audit of the data behind a model couldn't be required, so that you can say, “Okay, I want you, the auditing firm (KPMG or whoever), to inspect the data and make sure it's appropriately representative and free from a specific list of biases.”
You know, you could take the list of protected classes and say, “Okay, auditors, you're going to go through and inspect the data to ensure that the model is free from unfair biases along these protected classes.”
You'd give them the list of biases you're looking for: things that are legally prohibited, meaning all those protected classes, such as age, gender, race, veteran status, disability, gender identity, sexual orientation, religion, etc.
Those are the ones the laws talk about, and every single auditor would be looking to reduce bias on them.
And then there are any criteria of your own: if there are things your company values that are not technically illegal, but that you feel run contrary to your values, you have the auditors inspect for those as well.
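To make the auditor's job a little more concrete, here's a minimal Python sketch of one narrow check they might start with: comparing how often each group shows up in the training data against a reference share for the population the model will serve. The function name, the `age_band` attribute, the reference shares, and the 0.8 tolerance are all hypothetical, chosen for illustration. A real audit would go much further, since balanced representation alone doesn't prove a model is free of unfair bias.

```python
from collections import Counter


def representation_report(records, attribute, reference_shares, tolerance=0.8):
    """Compare each group's share of the training data with a reference share.

    A group is flagged when its observed share falls below `tolerance` times
    its expected share. The 0.8 default is an illustrative assumption.
    """
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    report = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        ratio = observed / expected if expected else float("inf")
        report[group] = {
            "observed": round(observed, 3),
            "expected": expected,
            "flagged": ratio < tolerance,
        }
    return report


# Hypothetical training records and reference shares, for illustration only.
records = (
    [{"age_band": "18-34"}] * 50
    + [{"age_band": "35-54"}] * 40
    + [{"age_band": "55+"}] * 10
)
reference = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

report = representation_report(records, "age_band", reference)
for group, row in report.items():
    print(group, row)
```

In this made-up sample, only the 55+ band is flagged: it makes up 10% of the records against an assumed 35% reference share.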
Now, is a company going to do that for, like, a $299-a-month SaaS model? Probably not; they're probably going to tell you to find another vendor.
But for larger models and custom-developed things, where you've got to pay a company $50,000, $60,000, $70,000 a month? Absolutely.
It's a reasonable request at that point to say, “Yeah, we're going to bet our business on this.
We're going to bet a mission-critical system on this company's model, this vendor's model, so it had better be free of all these things.”
It's no different than food, right? You don't really get a say in the ingredients of a prepackaged food; if you want to certify that a food is a certain way, you've got to make it yourself.
But vendors working with other vendors absolutely do require things: if you specify that food is organic, it has to meet the legal definition of organic, someone certifies that the organic food meets those criteria, and that is a legally binding requirement.
So the same thing is true when it comes to these types of models.
Now, are there auditors and vendors doing this today? I don't know whether any of the big shops, you know, EY, KPMG, etc., are offering this publicly as a service yet.
But it will not be long.
After the first few lawsuits where a company gets in a whole lot of hot water for a biased model, it will become part and parcel of the industry, you know, the auditing industry, and that's appropriate.
If you wanted to certify it yourself, you absolutely could.
But again, it would have to be worthwhile for the company to do so. If you're looking for a facial recognition algorithm and you're paying $5 a month for it, the company is not going to tell you whether the dataset is biased against people with darker skin.
But if you’re building a mission critical app on it, you can absolutely say, hey, I need to ensure that this thing is not biased.
And I’m going to stop paying you, you know, five figures or six figures a month until you do that.
It all comes down to economics.
When it comes to your company, if your company is building its own models, plan and build them with the assumption that at some point you will be required to disclose de-identified versions of the data. You obviously have to protect user privacy; you always have to protect people's identities, especially around protected class data and personally identifiable information.
But beyond that, plan on somebody else inspecting your data at some point down the line.
So make it in a format that is, you know, easily machine readable, a format that can be exported, a format where all your variables are clearly and obviously named.
For example, if you're going to have gender, call it gender, not, you know, attribute 56.
That way you make the auditing process on your own data as easy and as painless as possible. Build with the assumption that somebody else will be taking a look at some point: not necessarily the general public, but an auditor or somebody like that.
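Here's a minimal Python sketch of what “audit-ready” could look like, assuming your data arrives as a list of dictionaries: opaque column names are mapped to obvious ones, direct identifiers are dropped, the user ID is replaced with a salted hash, and the result is exported as CSV. The column names, the identifier list, and the salt are hypothetical. Keep in mind that a salted hash is pseudonymization rather than true anonymization, so the salt has to stay secret.

```python
import csv
import hashlib
import io

# Hypothetical mapping from opaque internal column names to clear, obvious ones.
COLUMN_NAMES = {"attr_56": "gender", "attr_12": "age_band", "score_v2": "lead_score"}

# Hypothetical list of direct identifiers that should never be exported.
DIRECT_IDENTIFIERS = {"email", "full_name", "phone"}


def pseudonymize(value, salt):
    """Replace an ID with a truncated salted SHA-256 digest so rows stay linkable."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()[:16]


def export_for_audit(rows, salt):
    """Rename columns, drop direct identifiers, pseudonymize user_id, return CSV text."""
    cleaned = []
    for row in rows:
        record = {}
        for key, value in row.items():
            if key in DIRECT_IDENTIFIERS:
                continue  # drop PII outright
            if key == "user_id":
                record["record_id"] = pseudonymize(value, salt)
            else:
                # Fall back to the original name if no clearer one is mapped.
                record[COLUMN_NAMES.get(key, key)] = value
        cleaned.append(record)

    out = io.StringIO()
    fieldnames = list(cleaned[0]) if cleaned else []
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(cleaned)
    return out.getvalue()


# Hypothetical input rows.
rows = [
    {"user_id": 1001, "email": "a@example.com", "attr_56": "female", "attr_12": "35-54", "score_v2": 0.82},
    {"user_id": 1002, "email": "b@example.com", "attr_56": "male", "attr_12": "18-34", "score_v2": 0.41},
]
csv_text = export_for_audit(rows, salt="replace-with-a-secret-salt")
print(csv_text)
```

With these two sample rows, the header comes out as `record_id,gender,age_band,lead_score`, and neither the email addresses nor the opaque `attr_` names appear in the export.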
And make your life easier.
Future you will thank you for making the audit process less painful, because it is coming for sure.
So yeah, that's where we are with data sources and models.
Really good question, and an important one for all of us working in the industry to keep in mind; we have to build for it as the future comes around to us.
If you have follow-up questions, leave them in the comment box below.
Subscribe to the YouTube channel and the newsletter.
I'll talk to you soon. Take care.
Want help solving your company's data analytics and digital marketing problems? Visit TrustInsights.ai today and let us know how we can help you.
Want to read more like this from Christopher Penn? Get updates here:
Get your copy of AI For Marketers