
#WinWithAI: The Foundation of AI is Great Data

In today’s episode, we review the 4 main challenges facing enterprises with data:

  • Volume
  • Veracity
  • Variety
  • Velocity

AI is built on data; poor data leads to poor machine learning models, which lead to poor outcomes. What’s the solution? The Trust Insights 6C Framework for Data is what every company must implement to prepare for AI:

  • Clean
  • Complete
  • Comprehensive
  • Chosen
  • Credible
  • Calculable

Watch the video for full details and explanation.

Register for the IBM Win With AI Summit in NYC here.

FTC Disclosure: I am an IBM Champion and am compensated by IBM to support and promote IBM events such as the Win With AI Summit.

#WinWithAI: The Foundation of AI is Great Data

Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, part of the Win With AI series that I’m doing with IBM for the Win With AI Summit (disclosure: I am compensated for participation), the question is: how can businesses use data today to develop a winning strategy for AI over the next five years?

Well, no one has the foggiest clue what AI is going to look like in five years. Anyone who says they do is probably smoking the good stuff. What you can ask is what has been developed today and how it will be deployed over the next five years, because that’s a valid question. All these techniques in deep learning and reinforcement learning, and things like Watson Studio, for example, are technologies that are available in market now, and it is going to take companies one to five to ten years to deploy them in market.

But where the technology is going, nobody has any idea. So let’s tackle this from the perspective of what companies need to do with their data today in order to be able to make use of these technologies and eventually roll them out over the next five years.

Data is the foundation of AI. Unlike traditional software, AI begins with data. In traditional software, we write the code: it’s a word processor or whatever, we have the application on our phone or on our laptop, and the software is made. Then we use the software and it spits out data, right? You have word processing software, you type a letter, and the data is what comes out of it, or the spreadsheet, or the slide presentation, or the video. That’s the data. Software begets data in traditional software. In AI, data begets the software: we take data, enormous quantities of it, and we give it to machines, and they learn from it and then create models and outcomes.

That’s what’s so different about AI. Instead of explicitly writing the code for a prediction or whatever, the software learns from the data we feed it, which means that the data we give machines has to be impeccable. It has to be unimpeachable, or as close to it as we can get. Now, there are four main problems with data today. IBM calls these the four V’s: veracity, volume, velocity, and variety. Veracity is the truthfulness of the data: how clean is the data? If the data has quality issues or is missing pieces, you’re not going to be able to make good predictions; you’re not going to be able to train machines on it. Volume is a problem most companies have: they have too much data. Velocity, the speed at which the data arrives every day, is another problem. And finally, there’s the variety of data: unstructured data, video, images, audio, lots of text, speech applications, IoT, you name it. All these different things are creating a variety problem.

So how do we solve for these problems? We use what at Trust Insights we call the 6C framework of useful data. Data needs six characteristics in order to be useful, especially for machine learning and artificial intelligence. Let me bring up the framework here. There you go.

Data needs to be clean. It has to be prepared well and free of errors. You can use machine learning to solve some of that, but there are limits after which you start running into validity errors, so the data should be as clean as possible to start with. If a company does not have clean data, now would be the time to start.
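
As a rough sketch of what that first cleaning pass can look like (my addition, not from the episode), here is a minimal example in Python with pandas; the file name and column names are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical marketing dataset; replace with your own file and columns.
df = pd.read_csv("marketing_data.csv")

# Basic cleanliness checks: duplicates and missing values.
print("Duplicate rows:", df.duplicated().sum())
print("Missing values per column:\n", df.isna().sum())

# Drop exact duplicates and rows with clearly invalid values.
df = df.drop_duplicates()
df = df[df["session_duration_seconds"] >= 0]  # negative durations are data errors

# Normalize obvious formatting inconsistencies.
df["channel"] = df["channel"].str.strip().str.lower()
```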

The data has to be complete, without missing chunks. Again, there are ways to solve for missing data with machine learning, particularly with a technique called imputation, but it’s not as good as having the actual data. So if you’ve got missing chunks of data, you’re going to have trouble working with AI.
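
For illustration only (again, my addition, not from the episode), here is a minimal imputation sketch using scikit-learn; the toy values are made up:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix where np.nan stands in for the "missing chunks".
X = np.array([
    [25.0, 3.0, np.nan],
    [31.0, np.nan, 120.0],
    [np.nan, 5.0, 95.0],
    [42.0, 2.0, 110.0],
])

# Fill each missing value with its column mean. Fancier options
# (e.g., KNNImputer) model relationships between columns, but no
# imputation is as good as having the real measurements.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```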

Data has to be comprehensive, meaning it must cover the questions being asked of it. If we want to know what causes a particular type of cancer, for example, and we don’t have all the oncology data, we only have a subset, we don’t have blood tests, we lack other environmental variables, we’re not going to come up with the right answer. We’re going to come up with, at best, a partial answer and, at worst, a very misleading answer. So data has to be comprehensive.

Data has to be chosen well, with few irrelevant or confusing variables. This is a lot of selection, and it’s where data scientists spend a good chunk of their time; these six steps take up something like 80% of a data scientist’s time, just getting the data into working condition. Choosing variables requires things like principal component analysis and dimensionality reduction in order to figure out, okay, out of the 400 variables we have, which ones actually matter.
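
As an illustrative sketch of that variable-selection step (my addition, not from the episode), here is one way principal component analysis can hint at which of those 400 variables carry most of the signal; the data here is randomly generated purely as a placeholder:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: 1,000 observations of 400 candidate variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 400))

# PCA works best on standardized data.
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to explain 90% of the variance.
pca = PCA(n_components=0.90)
pca.fit(X_scaled)
print("Components needed for 90% of variance:", pca.n_components_)

# Variables with the largest loadings on the first component are
# candidates for "the ones that actually matter".
first_component = np.abs(pca.components_[0])
top_variables = np.argsort(first_component)[::-1][:10]
print("Top 10 variable indices on the first component:", top_variables)
```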

Data has to be credible, which means it must be collected in a valid way. This is an area where machine learning is not much help, because this deals with things like biases in our data. How biased is our data? You may be using machine learning to do natural language processing on social media data, let’s say on tweets. But if you don’t understand that Twitter itself has a bias in the network, that there are demographics at play, that there are socioeconomic factors at play that give a very specific weight toward certain opinions, you won’t know that unless you have experience with understanding bias. This is a critical part of artificial intelligence and of companies working with data: you’ve got to know that your data is credible, you’ve got to know how it was collected, and you’ve got to know the biases of the collection process.

Imagine at a corporation you send out a survey to customers and ask, you know, “How awesome was your customer service?” That question, by its very definition, is a leading question, instead of asking “How was your customer service?”, right? So the credibility of collection is one of those important factors in building good data, especially for AI.

And finally, data has to be calculable. This is less of a problem for AI than it is for humans; humans are really bad at dealing with massive data sets, and machines are much better at it. But a machine still has to be able to work with the data: it still has to be compatible with, you know, whatever formats are needed. So you may have a bunch of structured data and then a whole bunch of unstructured data. You’ve got to have machine learning processes that transform the unstructured data into something structured in order to be able to do things like prediction.
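
As a final illustrative sketch (my addition, not from the episode), here is one common way to turn unstructured text into a structured, calculable table using TF-IDF in scikit-learn; the example comments are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical unstructured data: free-text customer comments.
comments = [
    "Support was fast and friendly",
    "The product arrived late and damaged",
    "Fast shipping, great product",
]

# TF-IDF turns each comment into a structured row of numeric features,
# which downstream models can then use for prediction.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)

print(X.shape)  # (3 documents, N vocabulary terms)
print(vectorizer.get_feature_names_out())
```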

So this framework, the 6C framework, is the foundation for what companies need to do to prepare for AI. If you don’t have data that has been through this process, and you haven’t addressed all six of these issues, all six of these features, then your AI efforts are, at best, going to be limited and, at worst, going to be highly misleading. So we need to fix this first within your company. Now, this is just the data portion; data comes from people, process, and platform, the good old framework.

If you don’t have people who are committed to great data, if you don’t have processes that enable the 6Cs, and if you don’t have technology that allows you to store it, transform it, and work with it, then AI is just not for you. The people part is especially hard: if you have people who are committed to opposing the use of data, AI is definitely not going to work for you. And if you think about it,

who might that be? Well, there are folks at some companies who don’t really want a light shone on their work, because their work may not be very good, or their work may have ethical issues and things like that. So navigating great data and setting the stage for AI requires more than just technology. It requires a great platform, and obviously the Watson Studio ecosystem is a great platform for that, but it also requires great processes internally and a commitment to the AI initiatives from the people who work at the company. So that’s the very long answer to a great question about how to prepare for AI and what we need to do with our data. As always, please subscribe to the YouTube channel and the newsletter. I’ll talk to you soon. Take care.

If you want help with your company’s data and analytics, visit TrustInsights.com today and let us know how we can help you.




Want to read more like this from Christopher Penn? Get updates here:

Subscribe to my newsletter here.

