Mind Readings: AI and Government Data

In today’s episode, we explore the transformative potential of AI in making complex government data accessible and useful. You’ll learn about the challenges of working with government-published data and how generative AI, like large language models, can revolutionize this process. Discover how AI can convert poorly formatted governmental records into valuable, analyzable data, opening up new possibilities for political engagement and advocacy. Tune in to unlock the secrets of utilizing AI for impactful social change.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, let’s talk about uses for AI that people maybe are not thinking about that could be very, very helpful and useful.

One of the most challenging data sources to work with is anything published by a government. Governments in general have varying degrees of transparency.

But the formats they publish data in very often are not super helpful.

For example, in the city that I live in, the police department publishes daily logs.

These daily logs are incident reports: what happened, where, when, how many officers responded, and things like that. Useful data.

And they’re doing so as part of a transparency initiative to help citizens feel like they know what law enforcement is doing.

And this is a good thing.

They’re doing the right thing.

But their logs are in a really, really annoying format.

The logs come every day as PDF files, anywhere from one to 10 pages each.

And they’re formatted, well, I struggle to explain what the format is.

It’s like sort of a spreadsheet dumped onto a PDF, but not really.

I suspect very strongly that the format is made by some probably fairly old, unique vendor in the law enforcement space who, frankly, has no real incentive to make its software easy to use for the average citizen.

Not in any conspiracy theory kind of way. It’s just that they dump the records out onto a sheet of paper, and then presumably somebody reads through that paper.

In fact, it wouldn’t surprise me if these formats were derived from paper formats, the paper reports that people used to make in the times before the internet and stuff like that.

If you wanted to make use of this police data for mapping or for statistical analysis, prior to the advent of language models, you would have to sit there and manually key in all those logs, or use some kind of OCR software to process them.

And that would be both expensive and really, really boring.

With the advent of generative AI, and large language models in particular, you can now take those logs and give the model a moderately sophisticated prompt saying here’s what to look for, here’s how you’re going to interpret this information.

And it’ll read them, and it’ll extract the data.

And then you can say to the language model, I want this data in CSV format or direct to a SQL database.

And it’ll do that.
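As a rough illustration of that workflow, here is a minimal Python sketch. It assumes you already have the text of a log and some function that sends a prompt to a language model and returns its reply; `call_model` is a hypothetical stand-in for whichever model you use, and the column names are invented for illustration rather than taken from any real police log.

```python
import csv
import io
import sqlite3

# Hypothetical column names; a real police log will have its own fields.
FIELDS = ["date", "time", "location", "incident_type", "officers_responding"]


def build_prompt(log_text):
    """Assemble an extraction prompt around the raw text of one daily log."""
    return (
        "You are reading a police department daily log.\n"
        "Extract one row per incident with these columns: "
        + ", ".join(FIELDS) + ".\n"
        "Reply with CSV only, including a header row, and no commentary.\n\n"
        "LOG TEXT:\n" + log_text
    )


def parse_csv_reply(reply):
    """Parse the model's CSV reply, keeping only rows with every expected field."""
    reader = csv.DictReader(io.StringIO(reply.strip()), skipinitialspace=True)
    rows = []
    for row in reader:
        if all(row.get(f) not in (None, "") for f in FIELDS):
            rows.append({f: row[f].strip() for f in FIELDS})
    return rows


def load_into_sqlite(conn, rows):
    """Create the incidents table if needed and insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incidents ("
        "date TEXT, time TEXT, location TEXT, "
        "incident_type TEXT, officers_responding INTEGER)"
    )
    conn.executemany(
        "INSERT INTO incidents VALUES "
        "(:date, :time, :location, :incident_type, :officers_responding)",
        rows,
    )
    conn.commit()


def extract_log(log_text, call_model, conn):
    """Run one log through the model and store the result; returns the row count.

    call_model is a hypothetical callable: prompt string in, reply string out.
    """
    reply = call_model(build_prompt(log_text))
    rows = parse_csv_reply(reply)
    load_into_sqlite(conn, rows)
    return len(rows)
```

Rows that come back incomplete are dropped rather than guessed at, since a model can misread a messy page; in practice you would probably want to log those rows and check them by hand.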

How much information is locked away in arcane governmental formats that were written in the days before the internet was really a thing?

Another one: in the United States, we have a federal agency called the Federal Election Commission.

One of the things they do is publish funding logs.

So they tell you who has donated to which campaigns.

These are in a really bizarre, kind of dumb, space-delimited format with fixed-character-width columns, which is just about the worst way you can possibly publish data, because it’s very difficult to interpret and very difficult to ingest.

Something like a comma separated value table is much easier to ingest.
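To make the difference concrete, here is a small Python sketch of turning fixed-width records into CSV. The column names and character offsets are made up for illustration; they are not the actual layout of any published file, and a real layout would have to come from the publisher’s documentation.

```python
import csv
import io

# Hypothetical layout: (column name, start offset, end offset).
# A real file's offsets would come from the publisher's documentation.
LAYOUT = [
    ("donor", 0, 20),
    ("campaign", 20, 40),
    ("amount", 40, 50),
]


def parse_fixed_width_line(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict, trimming the padding."""
    return {name: line[start:end].strip() for name, start, end in layout}


def fixed_width_to_csv(text, layout=LAYOUT):
    """Convert fixed-width text to a CSV string with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[name for name, _, _ in layout],
        lineterminator="\n",
    )
    writer.writeheader()
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(parse_fixed_width_line(line, layout))
    return out.getvalue()
```

The catch with fixed-width data is that every column boundary has to be known in advance, and one shifted character corrupts the whole row. A CSV carries its own delimiters, so most tools can read it without a separate layout description.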

This is a result of their software essentially not really changing much since the early mainframes it was written for.

And so when they publish the information, which they’re doing correctly, either you have to process it manually as is, or you can pay vendors exorbitant sums of money every month to work with that information.

There are in fact, a number of vendors in the election space that can process that data and provide it to you in a CSV format.

Well, that was then; this is now. Generative AI can do that. Generative AI can take those logs, those databases of very, very poorly formatted data, and transform them into useful data, into data that you can analyze and feed to other pieces of software.

The point of all this is that if you have an idea, if you have something that you want government data for, and up until now that government data has been inaccessible, not because the government’s keeping it from you but just because it’s in a poor format, that’s less of an obstacle today.

Using tools like ChatGPT, for example, or Mistral’s Mixtral model, or any of the generative AI products that are out there, you can now use language models to interpret the data, extract the data, and make it useful to you.

So if there are particular causes that you care about, particular political positions, or elections and races that you care about, where there’s data available but not in a useful format, partner up with generative AI, unlock the value of that data, and start making the changes that you want to see in the world.

That’s gonna do it for this episode.

Talk to you next time.

If you enjoy this video, please hit the like button.

Subscribe to my channel if you haven’t already.

And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.


Want to read more like this from Christopher Penn? Get updates here:

subscribe to my newsletter here

