Mind Readings: How Large Language Models Really Work

In today’s episode, we’ll dive into how AI language models actually work. You’ll gain a deeper understanding of the decision-making process behind these powerful tools. You’ll learn how to improve your prompts to get the results you want. And you’ll discover why these models sometimes deliver unexpected outputs.

Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn: In today’s episode, let’s talk about how language models work, using a different explanation.

Now, the way that I’ve typically explained this in the past, and I do this in my keynotes, is to think about a prompt, when you’re prompting a tool like ChatGPT or Gemini or Claude or any of the tools that are out there, as word clouds.

And as you type words into your prompts, word clouds are appearing behind the scenes.

And the intersection of those word clouds is what the machine knows to spit out. Conceptually, that’s more or less how they work.

Mathematically, that’s not completely wrong.

But I’ve been looking for a better explanation, one that is more aligned with the mathematics of how these things work.

And here’s what I’ve come up with.

Have you ever read, as a kid or maybe as an adult, the choose your own adventure books? You open the book, and it’s got a starting page of story.

And the bottom of each page says, turn to page 41 if you choose the red button, or turn to page 43 if you choose the blue pill.

That is a really good example of how generative AI models, language models, work.

You keep reading, you make a decision, you choose the next page, you make another decision, you choose the next page, and you’re hopping around this book.

And eventually, you get the story you want told.

Except that instead of reading a few paragraphs, then turning to the appropriate page to continue the story, a language model is choosing how the story continues after every single word.

And the book is massive.

The book is as big as the English language, right? It’s terabytes of data.

And every word has a choice at the end for what the next word is going to be.

Why is this explanation better? Because, like a choose your own adventure book, a language model keeps track of the story that’s already been told, right? It doesn’t go backwards and make different choices.

It says, Okay, well, you chose this word.

So the next set of probabilities is this.

When you’re reading a choose your own adventure story, you keep reading and you keep following these threads throughout the book. There isn’t an infinite number of choices at the bottom of every page.

When you read a choose your own adventure book, there’s a handful, right? In the same way, when a language model is picking the next word, there’s also not an infinite number of choices.

At the bottom of every page, if you will, as it predicts, as it reads, there’s a handful of words that are most probable based on the story so far.

That’s the critical point.

Because a language model can keep track of what’s been written so far, it uses everything that’s been written so far to predict the next word.
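
If you want to see that loop in rough code terms, here’s a minimal sketch. It’s purely an illustration: the lookup table and its numbers are invented stand-ins for the neural network a real model uses, but the shape of the loop is the point. Given the story so far, get a handful of probable next words, pick one, append it, and repeat. Notice the story only ever grows; nothing already chosen gets revised.

```python
import random

# Toy stand-in for a trained model: given the story so far, return a small set
# of plausible next words with probabilities. A real model computes this with a
# neural network over tens of thousands of possible tokens; these numbers are invented.
def next_word_probabilities(story_so_far: str) -> dict[str, float]:
    if story_so_far.endswith("Once upon a"):
        return {"time": 0.95, "midnight": 0.04, "hill": 0.01}
    return {"and": 0.4, "the": 0.35, "of": 0.25}  # fallback for this toy example

def generate(prompt: str, num_words: int = 5) -> str:
    story = prompt
    for _ in range(num_words):
        choices = next_word_probabilities(story)
        words, probs = zip(*choices.items())
        # Pick the next word in proportion to its probability and append it.
        # Earlier choices are never revisited; they only shape what comes next.
        story += " " + random.choices(words, weights=probs)[0]
    return story

print(generate("Once upon a", num_words=1))  # almost always "Once upon a time"
```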

Right? Suppose the story an AI is processing contains the following words.

You know, if you’re American: “I pledge allegiance to the,” right, what’s the next most likely word it will choose as it pursues its word-by-word choose your own adventure? Probably the word “flag,” right? Because in American English, it’s very common to hear people say “I pledge allegiance to the flag.”

If you’re English, you’ll say “God save the,” and whatever the next word is, it could be king or queen, depending on how old you are and what’s going on.

But it’s probably not rutabaga.

In either example, right, the next word is probably not rutabaga.

Statistically, it’s unlikely to be that.

And so a language model makes its choice based on probabilities, based on the previous things it’s read in its training data, where “flag” is probably going to be the next word.
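
As a very crude sketch of where those probabilities come from (real models learn them with neural networks rather than raw counts, so treat this strictly as an illustration with an invented mini-corpus), you can count which words follow a phrase in a pile of training text:

```python
from collections import Counter

# A tiny, invented "training corpus." Real training data is terabytes of text.
corpus = (
    "i pledge allegiance to the flag of the united states "
    "i pledge allegiance to the flag and to the republic "
    "we pledge allegiance to the flag every morning "
    "god save the king god save the queen"
).split()

def next_word_counts(context: list[str]) -> Counter:
    """Count every word that follows the given context phrase in the corpus."""
    n = len(context)
    followers = Counter()
    for i in range(len(corpus) - n):
        if corpus[i:i + n] == context:
            followers[corpus[i + n]] += 1
    return followers

counts = next_word_counts("allegiance to the".split())
total = sum(counts.values())
for word, count in counts.most_common():
    print(f"{word!r}: {count / total:.2f}")  # 'flag': 1.00
print(counts["rutabaga"] / total)            # 0.0 -- never seen it, so never predicted
```

A real model smooths and generalizes far beyond literal counts, which is why it can still predict sensible words in sentences it has never seen verbatim, but the underlying idea is the same: what it has read determines what it expects next.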

That’s a really important thing to understand.

Because when we prompt these tools, we are giving them some words to start with; we’re giving them the first page of the story.

And then, from the words that we provided, they have to read that and guess the next word.

And it does a bunch of guesses, and if we like what it says, you know, it wrote us a blog post or whatever, then it will continue to predict based on those choices.

And it never goes back and changes things in the past, but it uses all of the past to help decide what the next word is going to be, what page it’s going to turn to next.

This is why models go off the rails sometimes, right? When you’re using a tool like ChatGPT and it starts spitting out nonsense, or it writes really badly all of a sudden, it has gone awry because it has not read enough story to choose the next word sensibly.

Imagine you were reading a choose your own adventure book, and the first page of the book has one word on it: “today.” There’s a bunch of choices at the bottom, you know, “turn to page 82 if you want this.”

But if all it says on the page is “today,” how am I supposed to know what to choose for the next page? You’d have nearly limitless choices.

Even if you knew that you wanted a romance story or a thriller story, it’s still just too vague.

That’s what happens when a model runs off the rails: it doesn’t have enough words to make a decision, or it’s got conflicting words.

It’s like I don’t know what to choose next.

I’m just going to pick a random word, or a word that matches what I know statistically, even if it doesn’t make coherent sense.

This is why prompt engineering with detailed prompts is so important.

Because what you want to do is give the model enough of the story so far, so that the next part of the story, as it chooses the next page, will be much more sensible, right? If you give it a prompt like “write a blog post about B2B marketing,” and then you’re really unhappy with the generic swill that it comes up with, it’s because you didn’t give it enough story.

So it’s like, okay, I’ll just pick something that seems sensible.

If you give it a three or four paragraph prompt about the story so far: B2B marketing is this, these are the things we care about, don’t mention this because we already know it, and so on and so forth.

You will have it create better content, because there are fewer choices behind the scenes for what page it’s going to turn to next.
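
One way to put a rough number on that intuition is to compare how spread out the next-word choices are. The two distributions below are invented for illustration, not measured from any real model, but they show the idea: a vague prompt leaves the probability spread thin across many options, while a detailed prompt concentrates it on a few.

```python
import math

def entropy(distribution: dict[str, float]) -> float:
    """Shannon entropy in bits: higher means more uncertainty about the next word."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# Invented next-word distributions for illustration only.
vague = {"The": 0.2, "In": 0.2, "Marketing": 0.2, "Every": 0.2, "B2B": 0.2}
detailed = {"Procurement": 0.7, "Buying": 0.2, "RFP": 0.1}

print(f"vague prompt:    {entropy(vague):.2f} bits of uncertainty")     # about 2.32
print(f"detailed prompt: {entropy(detailed):.2f} bits of uncertainty")  # about 1.16
```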

That’s how these things work.

And if you understand this, you will get better results, I promise you. The more relevant words you use, the better these tools will perform for you.

So that’s going to do it for today’s episode.

Thanks for tuning in.

I’ll talk to you soon.

If you enjoyed this video, please hit the like button, subscribe to my channel if you haven’t already.

And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.


