You Ask, I Answer: Managing Rectangular Data with Generative AI?

In today’s episode, I tackle how to use AI with structured, tabular data. While generative AI struggles with graphs and images, it can write custom Python code to process databases or spreadsheets. By providing prompts about your goals, these tools create tailored data extraction and analysis scripts. Join me as I demo generating code for statistical techniques like lasso regression to unlock insights from rectangular datasets.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Today’s episode of You Ask, I Answer was recorded in front of a live studio audience at the Digital Now conference in Denver, Colorado, in November 2023.

The session title was, appropriately, You Ask, I Answer Live: Generative AI Q&A. Enjoy.

And is there anything that we, is that, are trying to be advocates for properties of it as is, that identifying every article that has any inspiration that was from AI, is that only create exclusive concepts and you’re using AI for idea generation, given philosophies on how to prevent it on?

Well, so I think the transparency is important, particularly for associations.

Transparency is important to say here’s what we are and are not doing with this software.

Here’s what we’ve published, you know, here’s how it was made.

It’s kind of like you look at this thing here.

Right on here is a nutrition label that says here’s what’s in this bag. Now, it may or may not be true.

Like it says vegetable oil, that’s a bit vague.

There’s a lot of vegetables.

But at least you know what’s in the bag more or less and you know what’s harmful.

And we’re starting to see some of that in AI with people labeling data sets like hey, here’s what’s in this data set.

When you publish information, you might want to think about like what is the nutrition label for this document even look like? Can we prove where it came from? Can we show an ingredients list? And if AI is part of that, that’s fine, right? Like no one dings a company for saying, hey, you used a blender for making this instead of you know, mixing it by hand.

But we understand that there are these tools in the supply chain.

So I’d say that’s an important part.

And then what we were talking about earlier about certification saying, as an association, you are in a unique position to say, when we publish this, we’ve certified that it’s true.

If there’s research or data coming from members, we are putting our stamp of approval saying we have vetted this, we have peer reviewed it, and this is true.

And other things you may see out there on the interwebs that may contradict that.

We haven’t vetted it, we haven’t processed it.

So we can’t say that this is true, especially around stuff like health.

There’s so much misinformation about health, like in food and nutrition, that if you’re an association in that space, you have the unique opportunity to say like, we will tell you what is and is not true.

In the back there.

Wait for the mic.

Sorry.

That’s an opportunity for us.

Yes.

Yes.

We’re reliable.

And we’re gonna be able to trust them.

Exactly.

And that’s why that disclosure and transparency about AI is so important so that your members continue to trust you.

When you publish AI generated content, you say this is generated by AI, but it’s been reviewed by one of our team.

And we can certify that this, even though machine-generated, is still true.

Other questions? Dad jokes.

Here.

You talked this morning, you talked about extraction as a process that AI can assist with.

And I think that’s mostly, like, language, you know, like text: extracting key points, action items from text.

We’re an organization that has lots of data, like structured data.

It seems like AI isn’t really the tool to use to understand like data that’s in the tabular format.

But are there other tools being developed that are more geared towards that? You know, we’re interested in, say, extracting data from, like, graph images, things like that.

And I just don’t know what the state of the art is in terms of those controls.

So for tabular data, if you want to work with that data, your best bet, assuming you don’t already have the tooling, is actually working with GPT-4, particularly the Advanced Data Analysis module, because what the tools can do is write you code, right? So they can write you Python code that can process data for specific things.

So if I go in here, let’s go here and let’s start ourselves a new prompt.

You are a Python programming expert.

You know NumPy, Pandas, data science, data extraction, data cleansing.

Your first task is to ingest data from a SQLite database, named Bob.

And the table name is members.

Write the appropriate code to extract the data, identify numeric columns, and produce a lasso regression of the churn column based on the numeric columns.

Now, this is completely fictitious, but what it’s going to start doing is essentially start writing you the code that you need to programmatically access that using Python in this case.
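To make the first two steps of that prompt concrete, here is a minimal sketch of the kind of code such a tool might produce, using only Python’s built-in sqlite3 module. The table name members and the churn column come from the prompt above; the other column names, the sample rows, and the in-memory database standing in for the real file are all invented for illustration.

```python
import sqlite3


def load_table(conn, table):
    """Return (column_names, rows) for every row in the given table."""
    # Table names cannot be passed as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def numeric_columns(columns, rows):
    """Return the names of columns whose non-null values are all numbers."""
    numeric = []
    for index, name in enumerate(columns):
        values = [row[index] for row in rows if row[index] is not None]
        if values and all(isinstance(v, (int, float)) for v in values):
            numeric.append(name)
    return numeric


# Stand-in for the real database file; the columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members ("
    "name TEXT, tenure_months INTEGER, events_attended INTEGER, "
    "dues_paid REAL, churn INTEGER)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?, ?)",
    [
        ("Ana", 36, 5, 250.0, 0),
        ("Ben", 4, 0, 125.0, 1),
        ("Cy", 18, 2, None, 0),
    ],
)

columns, rows = load_table(conn, "members")
numeric = numeric_columns(columns, rows)
# The outcome column is kept out of the predictors that would feed a regression.
predictors = [name for name in numeric if name != "churn"]
print(predictors)  # ['tenure_months', 'events_attended', 'dues_paid']
conn.close()
```

Pointing sqlite3.connect at an actual database file instead of ":memory:" would be the only change needed to run this against real data.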

So if you have rectangular data, tabular data, and you want to extract insights from it, you may not necessarily be able to load it into one of these tools, but you can have them write you the tooling you need to then do those things, particularly if you know what you want, but you don’t know how to do it.

Like lasso regression and ridge regression, for example, are two classical methods for figuring out, hey, I’ve got a bunch of numbers and an outcome.

Of all these numbers I have, which one relates to the outcome best and gets rid of noise that we don’t need? Lasso regression is one of those techniques.
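As a rough illustration of how lasso regression gets rid of noise, here is a small sketch in plain Python. It is not the code from the demo: it uses coordinate descent with soft thresholding, one standard way to fit a lasso, and the data (a signal column, a noise column, and an outcome) is invented. In practice you would more likely reach for a library such as scikit-learn’s Lasso class.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1 (must not be constant)."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]


def lasso(feature_columns, outcome, alpha, sweeps=100):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1."""
    n = len(outcome)
    columns = [standardize(col) for col in feature_columns]
    outcome_mean = sum(outcome) / n
    residual = [y - outcome_mean for y in outcome]
    weights = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Covariance of feature j with the residual, with j's own effect added back.
            rho = sum(x * (r + x * weights[j]) for x, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, alpha)
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                weights[j] = new_weight
    return weights


# Hypothetical data: the outcome depends strongly on signal and barely on noise.
signal = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [1, -1, -1, 1, 1, -1, -1, 1]
outcome = [10 + 2 * s + 0.1 * z for s, z in zip(signal, noise)]

weights = lasso([signal, noise], outcome, alpha=0.5)
print(weights)  # the weight for noise is exactly 0.0; the weight for signal is not
```

The weights are on the standardized scale, so they rank columns by how strongly they relate to the outcome rather than giving per-unit effects. Raising alpha pushes more weights to exactly zero, which is how the method discards columns that do not matter.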

So you might say, I’ve got a lot of data and I’ve got an outcome I care about, but I don’t know how to figure out what’s real and what’s not.

The tool will eventually, when you chat with it, say, you know, these are some of your choices for regression that you can then take and try out on your data.

That’s how I tackle structured data.

For vision data, right now they all kind of suck.

They’re OK, but they have a very hard time, particularly with poorly made graphs, of extracting data out of those graphs because it’s the same problem you and I have.

You look at a graph that’s badly done, you’re like, I don’t know what that says, other than there’s a line that’s going up and to the right.

If you look at the graph and you can’t figure out what the data is, there’s a good chance the machine can’t either.

Wow, it’s really slow.

Other questions? I really like this: if you have people who can write code, who can inspect the work and help get it running, this is a phenomenal way to build tooling within your organization for those efficiencies, because there are things you do every month or every week or every day that are just repetitive.

You get a spreadsheet full of data and you’re like, I’ve got to copy and paste out this and this and this to make this PowerPoint.

You give that to the machine, you say, here’s what I need to get out, write me the code to access the spreadsheet and pull out these relevant data points and it will do that.

And then if your computer has Python installed on it, or you’ve got a server somewhere in your organization that has it on it, then you run that code against that spreadsheet every month and now you’re not spending an hour and a half copying and pasting anymore.

Now you just run the code and you get on with your day.
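A script for that kind of monthly copy-and-paste job can be very short. The sketch below assumes the spreadsheet has been exported as CSV with one metric per row; the column names, metric names, and numbers are invented, and an in-memory string stands in for the exported file.

```python
import csv
import io


def pull_data_points(csv_file, label_column, value_column, wanted_labels):
    """Return {label: value} for the rows whose label is in wanted_labels."""
    wanted = set(wanted_labels)
    found = {}
    for row in csv.DictReader(csv_file):
        label = row[label_column]
        if label in wanted:
            found[label] = float(row[value_column])
    return found


# Stand-in for a monthly export; with a real file you would pass
# open("some_export.csv", newline="") instead (file name hypothetical).
sample = io.StringIO(
    "metric,value\n"
    "new_members,42\n"
    "renewals,310\n"
    "event_signups,95\n"
    "lapsed,17\n"
)

points = pull_data_points(sample, "metric", "value", ["new_members", "lapsed"])
print(points)  # {'new_members': 42.0, 'lapsed': 17.0}
```

Once something like this works on one month’s file, the same script can be rerun on every later export, as long as the column layout stays the same.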

There are lots and lots of those little wins throughout everyone’s workday. The challenge is not the technology; the challenge is knowing to even ask the question: Hey, can I get a machine to do this? Like, this seems like an easy thing.

Can I get a machine to do this? The answer usually is yes.

If you enjoyed this video, please hit the like button, subscribe to my channel if you haven’t already.

And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.

[MUSIC PLAYING]


Want to read more like this from Christopher Penn? Get updates here:

subscribe to my newsletter here

