Mind Readings: AI Model Scale, Scope, and Capability

In today’s episode, we’re diving into the relationship between AI model size and its ability to tackle complex challenges. You’ll learn how the latest advancements in AI technology could change what tasks machines can perform. If you work with AI or are curious about its growing capabilities, this is a must-watch!


Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn: In today’s episode, let’s talk about an academic paper that a friend sent me.

And this is a fascinating paper.

The paper is titled “Do Efficient Transformers Really Save Computation?” by Kai Yang, Jan Ackermann, et al.

It’s from February 2024, and it’s a preprint, so it has not been peer reviewed yet.

But one of the things this paper looks at is efficient transformers and the trade-off between efficiency and problem-solving capability.

The net of the paper is this: model size and problem-solving capability are essentially correlated; they go hand in hand. The bigger the model, the more complex the problems it can solve. The smaller the model, the fewer capabilities it can bring to the table.

You would think this is not something that needs to be studied.

But it does, because there are always weird twists and turns in computer science, particularly in artificial intelligence.

So it’s good to have this confirmation of how model size affects its problem solving capabilities.

Where this gets interesting is something that’s not in the paper itself.

One of the proxies for understanding model size and model capability is something called the context window.

The context window is essentially the short-term memory of a generative model, such as a large language model.

Early models like GPT-2, which came out in 2019, had a context window of 1,024 tokens.

That’s approximately 700 words. You could have it do some writing, but its short-term memory was the last 700 words of the interaction with the model, and everything earlier than that just got forgotten.
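To make the “short-term memory” idea concrete, here is a minimal sketch of a fixed-size context window that keeps only the most recent tokens and silently drops the oldest. It assumes whitespace-separated words stand in for tokens (real models use subword tokenizers, so a word is often more than one token), and the function names are hypothetical, not any model’s actual API.

```python
from collections import deque


def make_context_window(max_tokens: int) -> deque:
    """Return a buffer that holds at most max_tokens tokens.

    A deque with maxlen discards items from the left when full,
    mirroring how early text falls out of a model's context window.
    """
    return deque(maxlen=max_tokens)


def feed(window: deque, text: str) -> None:
    # Assumption: whitespace splitting as a stand-in for a real tokenizer.
    for token in text.split():
        window.append(token)


if __name__ == "__main__":
    window = make_context_window(max_tokens=5)
    feed(window, "the quick brown fox jumps over the lazy dog")
    # Only the last five tokens survive; the first four were forgotten.
    print(" ".join(window))  # prints "jumps over the lazy dog"
```

Scale max_tokens up to 1,024 and this is roughly the GPT-2 situation described above: anything older than the window is simply gone.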

The free version of ChatGPT today can handle several thousand words at a time.

But you’ve probably noticed, if you use the free versions of many models, that they get forgetful really fast because their context windows are so small.

That’s obviously because they want you to pay for the bigger models.

The bigger models today, the paid ones, have context windows in the 25,000-word range, where now you’re talking lengthy conversations. Think about this book here, Ann Handley’s Everybody Writes: it’s 75,000 words.

So a third of it can fit pretty comfortably in today’s models, at least things like Llama 2.

The paid versions of ChatGPT, Claude, and Google’s Gemini can handle substantially more. GPT-4 Turbo, the current version, can handle about 90,000 words.

So the entire book can now go into working memory. Claude 3 Opus, which came out not too long ago, Google Gemini 1.5, which is around the corner, and presumably GPT-4.5 or GPT-5 will have context windows in the million-token range, or about 700,000 words.
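The word counts in this episode follow a common rule of thumb of roughly 0.7 English words per token (1,024 tokens is about 700 words; 1,000,000 tokens is about 700,000 words). The real ratio varies by tokenizer and by text, so treat this as an approximation. The sketch below applies that assumed ratio to a few window sizes, including GPT-4 Turbo’s 128,000-token window, and checks whether a 75,000-word book would fit. The constant and function names are illustrative only.

```python
# Assumption: ~0.7 English words per token; real ratios vary by tokenizer.
WORDS_PER_TOKEN = 0.7


def tokens_to_words(tokens: int) -> int:
    """Approximate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """True if a text of word_count words should fit in the window."""
    return word_count <= tokens_to_words(context_tokens)


if __name__ == "__main__":
    windows = {
        "GPT-2": 1_024,
        "GPT-4 Turbo": 128_000,
        "million-token model": 1_000_000,
    }
    book_words = 75_000  # Everybody Writes, per the episode
    for name, tokens in windows.items():
        print(f"{name}: ~{tokens_to_words(tokens):,} words, "
              f"book fits: {fits_in_context(book_words, tokens)}")
    # Growth from ~90,000 to ~700,000 words is a bit under 8x.
    growth = tokens_to_words(1_000_000) / tokens_to_words(128_000)
    print(f"growth: {growth:.1f}x")  # prints "growth: 7.8x"
```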

What this means is that the bigger their short-term memory, the more complex the problems they can solve. Complex tasks require loading much more information, and the model needs the time and space to think without losing track of what it was thinking about, without losing its train of thought, just like you and me.

If we have a particularly complex problem, we have to break it into pieces, try to solve the individual pieces, and glue them back together.

Except for some savants, we cannot, for the most part, do it all in our heads at once.

If someone gives you a fluid dynamics equation, you have to write it down and work through it step by step if you want the answer to be even remotely correct.

So we need that space mentally.

AI models do, too. AI models need that space to be able to process, to be able to think.

And the more space they have in their short term memory, the better they perform, and the more complex tasks they can perform.
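The break-it-into-pieces strategy is also a common workaround when a text is too long for a model’s context window: split it into chunks that fit, work on each chunk separately, then combine the partial results. Here is a minimal sketch, assuming whitespace-delimited words as the unit of length. The helper summarize_chunk is a hypothetical stand-in for a real model call; it just keeps each chunk’s first few words so the example runs on its own.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_chunk(chunk: str, keep: int = 2) -> str:
    # Hypothetical placeholder for a model call: keeps the first `keep` words.
    return " ".join(chunk.split()[:keep])


def divide_and_combine(text: str, max_words: int,
                       solve: Callable[[str], str] = summarize_chunk) -> str:
    """Solve each piece independently, then glue the partial answers together."""
    partials = [solve(chunk) for chunk in chunk_words(text, max_words)]
    return " | ".join(partials)


if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten"
    print(divide_and_combine(text, max_words=4))
    # prints "one two | five six | nine ten"
```

The catch, and the reason bigger windows matter, is that each chunk is processed without seeing the others, so connections that span chunks can be lost.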

What this means is that within this calendar year (it’s 2024 as I record this), we’re probably going to see models that have that million-token memory.

So we’re going to go from 90,000 words of working memory to 700,000 words within the year.

Adding nearly a zero to their memory means adding nearly a zero to the kinds of problems they can address, because they can remember more.

And they gain the ability to deal with more complex problems, to take more time and more space to solve them.

What does that mean? If we talk about AI and the future of work, that means that more tasks are on the table.

Combine that with agent networks, which are essentially models working together and fact-checking each other to solve more complex problems than any one model can solve by itself. Agent networks with very large context windows will mean that more tasks traditionally done by humans today can be done by machines by the end of this year.
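A rough sketch of the agent-network idea: one agent drafts an answer, a second agent checks it, and the draft is revised until the checker approves or a round limit is reached. Both agents here are hypothetical toy functions, not calls to any real model API; in practice each would be a separate model invocation, and the names run_agent_network, toy_drafter, and toy_checker are illustrative only.

```python
from typing import Callable, List, Optional

# A drafter takes the task plus prior critiques and returns a draft.
Drafter = Callable[[str, List[str]], str]
# A checker returns None if the draft passes, or a critique otherwise.
Checker = Callable[[str, str], Optional[str]]


def run_agent_network(task: str, draft: Drafter, check: Checker,
                      max_rounds: int = 3) -> str:
    feedback: List[str] = []
    answer = draft(task, feedback)
    for _ in range(max_rounds):
        critique = check(task, answer)
        if critique is None:
            return answer          # checker approved the draft
        feedback.append(critique)  # pass the critique back to the drafter
        answer = draft(task, feedback)
    return answer                  # best effort once max_rounds is used up


# Hypothetical toy agents so the example runs without any model.
def toy_drafter(task: str, feedback: List[str]) -> str:
    return f"{task}: draft v{len(feedback) + 1}"


def toy_checker(task: str, answer: str) -> Optional[str]:
    return None if answer.endswith("v3") else "needs another pass"


if __name__ == "__main__":
    print(run_agent_network("summarize case law", toy_drafter, toy_checker))
    # prints "summarize case law: draft v3"
```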

Think about the challenge of, say, arguing a court case. How much case law would you need in working memory to argue it successfully? Today’s 90,000 words is a decent-sized book.

How much case law fits in 700,000 words, and can a machine process that effectively? Machines soon will be able to, so more tasks in, say, the legal domain will be available for machines to help solve.

I think this paper is worth reading if you enjoy the mathematics and want a sense of what the researchers were testing.

But the key takeaway is that model size correlates with problem-solving ability, and we are about to see a very large jump in problem-solving ability very soon.

So we need to be prepared for the implications: how it will affect our use of generative AI, how generative AI will affect our interactions with it, and what it can do that we won’t need to do anymore.

So that’s the show for today.

Thanks for tuning in.

Talk to you soon.

If you enjoyed this video, please hit the like button.

Subscribe to my channel if you haven’t already.

And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.

♪ ♪
