You Ask, I Answer: Data Democratization and AI?

Jim asks, “I am skeptical of data democratization because the average decision maker does not understand data collection, transformation, integration etc. Doesn’t AI make this an even bigger problem?”

It depends on how abstracted the decision-maker is. Certainly the pandemic has shown us the general population is completely incapable of parsing even basic scientific data, like why you should wear a mask. So the question is, who’s working on the AI?

If AI systems are left in the hands of legitimate experts, it could potentially improve things. For example, IBM Watson Studio has bias detection built in at multiple levels, so the tooling can potentially improve our work – or at least make it easier to audit. On the other hand, if you have the 6-week crash course folks building models, then yes, it could make things much worse.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, Jim asks, “I am skeptical of data democratization because the average decision maker does not understand data collection, transformation, integration, etc. Doesn’t AI make this an even bigger problem?”

So, let’s first quickly define data democratization: it is the ability for anybody to work with data, hence the term democratization.

It’s just like new media: podcasting and blogging were the democratization of media, the ability for anybody to make a podcast, anybody to make a blog.

And data democratization has been on technology companies’ radar for a really long time.

My whole start working with IBM, years ago, was around Watson Analytics and the idea of the citizen analyst: the average person who could pick up some good tooling and a data set and generate some usable insights. It didn’t work out so well.

And to Jim’s point, the reason it didn’t work out so well is that data analysis, even though it sounds simple, isn’t. There are a lot of pitfalls.

There are a lot of gotchas.

There are a lot of things that are taught poorly and under-emphasized when people start working with data, things like margins of error, statistical relevance, and statistical validity.

All of these are things that a layperson’s experimentation with data, math, and statistics doesn’t really cover.

People have a tendency to approach things in a very naive fashion, which is normal.

They pick up a tool, they run a basic analysis, and they say, “Aha, I’ve got the answer.”

You can tell how expert somebody is in the realms of mathematics and statistics by how many shades of grey their answers come with, right? The beginner is like, this is the answer.

You know, the expert is like, well, there’s a whole bunch of reasons why this may or may not be the answer.

And we’re not really sure.

We can give you probabilities, we can give you a sense of reliability or not in the data, but for the most part, it’s not as cut and dried as you think.

Right.

And those are the answers people hate.

People hate those answers.

And that’s why data democratization hasn’t worked out so well, because there are a whole bunch of people who want the answer.

And the answer doesn’t exist; there could be a range of answers.
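
To make the “range of answers” idea concrete, here is a minimal Python sketch of a margin of error for a survey proportion. The function name and the survey numbers are invented for illustration, and the normal approximation it uses is an assumption that only holds reasonably well for larger samples.

```python
import math

def proportion_interval(successes: int, n: int, z: float = 1.96):
    """Return (estimate, low, high) for a proportion.

    Uses the normal approximation; z = 1.96 corresponds to roughly
    95% confidence. This is an illustrative sketch, not a full analysis.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical survey: 52 of 100 respondents prefer option A.
estimate, low, high = proportion_interval(successes=52, n=100)
print(f"{estimate:.0%} (plausible range {low:.0%} to {high:.0%})")
```

The naive reading is “52% prefer option A.” The more careful reading is that the true figure plausibly sits anywhere from the low 40s to the low 60s, so the data cannot say which option is actually ahead.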

So does AI make this worse? Well, it depends on how abstracted the decision maker, or the AI tool user, is from the tools themselves.

Certainly, the pandemic in particular has shown us that the general population is completely incapable of parsing even basic scientific data, like why to wear a mask. Sorry.

It’s true.

People couldn’t understand even the most basic scientific facts and data points and make good decisions from them.

So the question then is, does AI make this worse or an even bigger problem? It depends on who’s working on the AI.

It depends on who’s working on the models.

If you hand somebody a piece of fully polished software, a model of some kind, they can’t really tinker with the innards.

They can only use it for its intended purpose.

The likelihood that it goes off the rails is lower, if it’s good software, than with somebody, say, picking up Python and just copying and pasting code randomly from Stack Exchange.

If AI is left in the hands of legitimate experts, it could potentially improve things. IBM Watson Studio is introducing bias detection at multiple levels in the process, from data intake, to the model, to model drift. It puts up a big old warning saying, “Hey, your model has drifted more than 6%,” or, “Hey, your model is drifting on this protected class; you should probably not do that.”

And so in those instances, where the person working on the system has to stay within the boundaries of a well-trained system and has to obey the warnings it gives, yes, AI could potentially improve our work and reduce some of the problems that come with data democratization.
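
As a rough illustration of that kind of warning, here is a minimal Python sketch. It is not IBM Watson Studio’s API or method; the function, the segment names, and the accuracy figures are all hypothetical, and it assumes drift is measured simply as a drop in accuracy since deployment.

```python
# Hypothetical threshold, mirroring the "more than 6%" warning described above.
DRIFT_THRESHOLD = 0.06

def drift_warnings(baseline, current, threshold=DRIFT_THRESHOLD):
    """Flag every segment whose accuracy fell by more than `threshold`.

    `baseline` and `current` map a segment name (the overall population
    or a protected class) to the model's accuracy on that segment.
    """
    warnings = []
    for segment, baseline_accuracy in baseline.items():
        drop = baseline_accuracy - current[segment]
        if drop > threshold:
            warnings.append(f"{segment}: accuracy dropped {drop:.1%}")
    return warnings

# Invented numbers: the overall figure looks fine, but one group has drifted.
baseline = {"overall": 0.91, "group_a": 0.92, "group_b": 0.90}
current = {"overall": 0.88, "group_a": 0.91, "group_b": 0.81}

for warning in drift_warnings(baseline, current):
    print("WARNING:", warning)
```

The point of checking each segment separately is that an acceptable overall number can hide a large drop for one protected class.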

On the other hand, if you get that person who, you know, took the six-week crash course for an AI certificate, yeah, AI could make things a lot worse, because that person doesn’t have the background in data science, doesn’t have the background in stats and probability.

It is a generalization, but that person probably doesn’t have that level of background, or the experience of just having models go off the rails.

And without a mentor, without somebody more experienced to guide them, it could make things a lot worse.

I was having a conversation with a founder of a startup a few weeks ago, who was talking about all these sophisticated models they’re working on building. He and a friend of his from college, who both just graduated, are engineering these models and have some experience in it.

And I said, “Okay, tell me about how you’re doing bias detection. Tell me about who on the team has the most experience with ethics and data ethics.”

And he was like, “Uh...” Okay, so this is going to go off the rails pretty quickly.

I said, you need to be building into your product things like monitoring for model drift, things like looking for ethical problems, things that would, you know, fail you on a basic ethics audit.

This was news to the person.

So in that instance, where you have a bunch of people who are inexperienced with AI, trying to deploy it, yes, AI is going to make those problems of data democratization even worse, because these are people who don’t know what they’re doing.

On the other hand, you get somebody who has, you know, 5, 10, 15 years of experience working with data sets: knowing when a data set is imbalanced, knowing when a p-value has gone off the rails, knowing how to do a two-tailed t-test.

In those cases, the person who’s building the system will probably do okay, and will make a system that is hard for other people to screw up.
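
One of those checks, spotting an imbalanced data set, can be sketched in a few lines of Python. The function name, the labels, and the cutoff of 4 are all assumptions made for illustration; the right threshold depends on the problem.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())

# Invented example: 950 "no" outcomes and only 50 "yes" outcomes.
labels = ["no"] * 950 + ["yes"] * 50
ratio = imbalance_ratio(labels)

if ratio > 4:  # hypothetical cutoff
    print(f"Imbalanced data set: majority class is {ratio:.0f}x the minority class")
```

A model trained on data like this can score 95% accuracy by always answering “no,” which is exactly the kind of gotcha an experienced practitioner knows to look for before trusting a result.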

So it really comes down to which is going to deliver a good outcome, a good long-term outcome.

One of the dangers in AI, particularly around bias, is that a system with no constraints will perform better. It will do things that you do not want it doing.

It’ll behave in unethical ways.

But it will, in the short term, deliver better results.

A system built for the long term will make trade-offs and say, yeah, we don’t want bias on gender identity, we don’t want bias on race.

And the system will have to sacrifice some level of performance, the model’s ability to generate top-line performance, in order to meet those competing objectives.

But that’s exactly the kind of caution and care and attention to detail that you want.
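
To show what one such constraint might look like, here is a minimal Python sketch that compares how often a model selects people from different groups. The functions, the groups, and the predictions are hypothetical, and the 0.8 cutoff is the “four-fifths” rule of thumb used here as an assumed example; real bias audits use more than one metric.

```python
def selection_rates(predictions, groups):
    """Share of positive predictions (1 = selected) within each group."""
    totals, positives = {}, {}
    for prediction, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}

def disparate_impact_ratio(predictions, groups):
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    rates = selection_rates(predictions, groups)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so the rates are equal
    return min(rates.values()) / highest

# Invented predictions: group "a" is selected 6 times in 10, group "b" 3 in 10.
predictions = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
groups = ["a"] * 10 + ["b"] * 10

ratio = disparate_impact_ratio(predictions, groups)
if ratio < 0.8:  # assumed cutoff based on the four-fifths rule of thumb
    print(f"Possible bias: selection-rate ratio is {ratio:.2f}")
```

Forcing a model to keep this ratio above a cutoff is one of the competing objectives that can cost some top-line performance.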

So will AI make data democratization worse? Potentially. Will it make it better? Potentially. It all depends on who’s steering the ship. How do we help people steer the ship better? Give them that list of questions.

Pick up Dr. Hilary Mason’s free book, Ethics and Data Science. It has a ton of great checklists in it about questions you should ask before starting any project with data: ethical questions, process questions, accountability questions. Those will help make all of us better data scientists, better data analysts, better AI engineers.

And if you work in a company and you have a leadership role, hold your company accountable to a lot of those benchmarks. Say, we’re going to adhere to these basic processes so that we generate outcomes that will not get us sued or get us failing an audit of some kind.

So, really good question.

We could spend a lot of time on this.

If you have follow-up questions, leave them in the comments box below.

Subscribe to the YouTube channel and the newsletter. I’ll talk to you soon.

Take care. Want help solving your company’s data analytics and digital marketing problems?

Visit TrustInsights.ai today and let us know how we can help you.
