You Ask, I Answer: Retrieval Augmented Generation vs Fine-Tuning?

In today’s episode, Jay seeks clarity on the differences between retrieval-augmented generation and fine-tuning in language models. You’ll learn how these techniques compare and contrast, each playing a unique role in enhancing AI’s capabilities. Discover the metaphor of ‘recipes versus ingredients’ to understand how fine-tuning and retrieval-augmented generation can improve your AI’s performance. Tune in for this technical yet accessible breakdown to elevate your understanding of AI model optimization.


Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, Jay asks, I’m a little bit confused.

You’ve mentioned different ways of manipulating language models to work better, like retrieval-augmented generation and fine-tuning.

What is the difference? Okay, this is a really good question because you’ll hear these terms a lot in language models, but it’s not clear to the end user what they actually do.

So let’s start with language models in general.

A language model comes in three flavors.

There’s a foundation model, a supervised fine-tuned model, also called an instruct model, and then a reinforcement learning with human feedback model, typically called a chat model.

So you will see, if you go on to Hugging Face, for example, foundation model, instruct model, and chat model as the variants of different language models.

Each model gets progressively more complex and sophisticated.

So a foundation model really is not all that useful.

It has a lot of the data in it, but it’s not ready for use.

It’s not ready to answer questions.

All it does is predictions, and not necessarily very well.

An instruct model that can take an instruction and execute on it is where most of us would start to see some value.

And the way you make an instruct model is you give a model a gazillion instructions and appropriate responses.

And you have the model learn from that library of, hey, if this, then that: if someone asks you this, do this.

If someone asks this, this is the correct answer.

Who was president of the United States in 1789? George Washington, et cetera.

The supervised fine-tuned instruct models are the first models that are very capable of doing specific tasks.
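To make the instruction-and-response idea concrete, here’s a minimal Python sketch of what supervised fine-tuning pairs look like as data. The field names and the prompt template are assumptions for illustration; actual formats vary by framework and publisher.

```python
# Hypothetical instruction-tuning pairs: each example maps an
# instruction to the ideal response the model should learn to give.
instruct_examples = [
    {
        "instruction": "Who was the first president of the United States?",
        "response": "George Washington.",
    },
    {
        "instruction": "Summarize this sentence: The cat sat on the mat.",
        "response": "A cat sat on a mat.",
    },
]

def to_training_text(example):
    """Flatten one pair into the kind of prompt/completion text a
    trainer would tokenize. This template is an assumption."""
    return (
        f"### Instruction:\n{example['instruction']}\n"
        f"### Response:\n{example['response']}"
    )

for ex in instruct_examples:
    print(to_training_text(ex))
```

Feeding a model many thousands of examples shaped like this is what turns a raw foundation model into an instruct model.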

And then you have reinforcement learning with human feedback.

This is where models can have conversations.

That conversational data becomes part of the model, and the model becomes more sophisticated.

It can anticipate and have natural language conversations while still being able to carry out instructions.

So that’s how these models work. Now, when you’re doing fine-tuning, what you are essentially doing is giving the model new instructions through plenty of examples and saying, you’re going to behave more like this.

So, for example, if you have a model that maybe spits out obscenities every so often, you would give it tens of thousands of questions and answers, none of which contain obscenities.

And what the model will learn from those examples is to deprioritize obscenities and say, hey, that’s weird.

I’ve been given all these new examples and none of them are swearing, so maybe I should swear less too.

Now, it doesn’t actually say that; it’s not conscious. But that’s what’s going on underneath the hood.

So fine-tuning is all about giving models new instructions, or changing the nature of the instructions that they can interpret and what the ideal outputs are.
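The deprioritizing described above can be illustrated with a toy probability model; this is not a real language model, just a sketch of how mixing in new examples shifts the weights toward some outputs and away from others. The corpora and the unwanted word are invented for the example.

```python
from collections import Counter

# Toy illustration of fine-tuning: estimate word probabilities from an
# original corpus, then re-estimate after mixing in new fine-tuning
# examples that never use the unwanted word ("darn" stands in for an
# obscenity). The data here is entirely hypothetical.
original_corpus = ["good", "fine", "darn", "good", "darn", "fine"]
fine_tune_examples = ["good", "great", "fine", "good", "great", "good"]

def word_probs(corpus):
    """Turn raw word counts into probabilities."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

before = word_probs(original_corpus)
# Repeat the fine-tuning examples to weight the new behavior heavily.
after = word_probs(original_corpus + fine_tune_examples * 3)

# The unwanted word is deprioritized, not deleted: its probability drops.
print(before["darn"], after["darn"])
```

Note that "darn" never disappears from the model; its probability just falls, which matches the idea that fine-tuning changes internal weights rather than removing knowledge.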

When we build models, when companies build models, they are built using enormous text corpora like Common Crawl, arXiv, Stack Exchange, or Reddit.

Or the CC Books archive and Project Gutenberg.

All of these are data sources that go into the model and get turned into statistical representations of the relationships among words.

It’s critical to say that in a foundation model, or any language model, the actual works it was trained on are not in there.

What is in there is a statistical set of relationships: what are the words that are most closely related to this word? So if I say the word tuna, what are the other words that would be associated with it? This is a technique called embeddings, and we’re not going to get into the vector space and all that stuff.

But think of it conceptually like a word cloud, a really big word cloud.

What are all the words that would be related to the word tuna, so that when you prompt a model, it can answer? These models are trained on a lot of generic data, right? All across the Internet.
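The word-cloud intuition can be sketched with toy embedding vectors and cosine similarity. Real models use hundreds or thousands of learned dimensions; the three-number vectors below are invented purely to show what "closeness" means in vector space.

```python
import math

# Hand-made, hypothetical "embeddings": each word is a short vector.
# Words used in similar contexts get nearby vectors.
toy_embeddings = {
    "tuna":    [0.90, 0.80, 0.10],
    "salmon":  [0.85, 0.75, 0.15],
    "ocean":   [0.70, 0.60, 0.20],
    "bicycle": [0.05, 0.10, 0.90],
}

def cosine_similarity(a, b):
    """Standard cosine similarity: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(word):
    """Return the other word whose vector is closest to `word`."""
    return max(
        (w for w in toy_embeddings if w != word),
        key=lambda w: cosine_similarity(toy_embeddings[word], toy_embeddings[w]),
    )

print(nearest("tuna"))  # salmon scores far higher than bicycle
```

That nearest-neighbor lookup is, conceptually, the word cloud around "tuna": fish-related words cluster close, unrelated words sit far away.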

That’s why a tool like ChatGPT can be so good at what it does, because it’s been trained on examples from virtually every domain of knowledge to some degree.

There’s some things that are highly specialized that it doesn’t know because there’s just not enough examples, but it’s seen most things.

Most of the big language models today, even the open weights models like the Llama family and the Mistral family, have still seen at least some representation of most subjects, even if it’s not a lot.

However, if you have access to data that is not public, that was not part of the training data or data that’s new and fresh, you might want to add that context, that extra information to a model, and that’s called retrieval augmented generation.

You provide a database of new statistical relationships, of things that the model hasn’t seen before, and it knows to go to that database first and check what’s in there; if it doesn’t find an answer, it can fall back on its existing knowledge.
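The check-the-database-first, fall-back-otherwise flow can be sketched in a few lines. The scoring here is naive word overlap; real retrieval-augmented generation systems use embedding (vector) search, and all the names and documents below are hypothetical.

```python
# A minimal sketch of the retrieval step in retrieval-augmented
# generation: look the question up in a small private document store
# first, then build a prompt that hands any matches to the model.
private_docs = [
    "Q3 revenue for Acme Corp was $4.2M, up 8% year over year.",
    "The Acme employee handbook allows 20 days of paid vacation.",
]

def retrieve(question, docs, min_overlap=2):
    """Return docs sharing at least `min_overlap` words with the question."""
    q_words = set(question.lower().split())
    return [d for d in docs if len(q_words & set(d.lower().split())) >= min_overlap]

def build_prompt(question):
    hits = retrieve(question, private_docs)
    if hits:  # augment the prompt with the retrieved context
        context = "\n".join(hits)
        return f"Using this context:\n{context}\n\nAnswer: {question}"
    # Nothing retrieved: fall back on the model's built-in knowledge.
    return question

print(build_prompt("What was Q3 revenue for Acme?"))
```

The model itself is unchanged; only the prompt it receives gets richer, which is exactly the "new ingredients, same chef" idea in the metaphor that follows.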

The difference between fine tuning and retrieval augmented generation is the difference between recipes and ingredients.

When you fine-tune a model, you are saying, hey, the recipes you have are not great, they’re not focused enough.

Let’s rip out that section of the cookbook and put a new section in.

Let’s add more recipes for how to cook Vietnamese cuisine.

Fine-tuning a model doesn’t add new data to it.

It doesn’t add new information.

What it does is it helps the model answer certain types of questions better by giving it many more examples of those questions and changing the internal weights of the model.

The internal probability that it will respond in a certain way.

So it’s like giving a model better recipes.

Let’s give it clearer directions.

Let’s give more recipes of a certain type.

You’re not changing the ingredients that a model has access to.

You’re just giving it better recipes.

Retrieval augmented generation is when you’re saying, hey, model, you’re very capable of a lot of things, but there’s some stuff you just don’t have.

So let me give you that stuff.

It’s like giving a kitchen and a chef a bigger pantry with more and different ingredients like, hey, here’s some new ingredients for you to work with.

The chef doesn’t necessarily change how they cook, but they do have access to more ingredients or better ingredients, better quality ingredients than what they’ve got.

And so you’ll see these two techniques mentioned a lot in language models.

However, they serve different purposes.

If you’ve got a language model that is not cooperating, that’s not doing what it’s told.

It needs more fine tuning.

It needs better recipes.

If you’ve got a language model that follows directions well, but it just doesn’t know some things, you need retrieval-augmented generation: better ingredients or more ingredients so that it can carry out the tasks you’ve asked it to do.

Sometimes models need both.

Sometimes models need to be told what to do better and to get access to a new store of data.

Or if you’re trying to make a model perform a new set of specific tasks, you might have to, like you would in the kitchen, give it a new recipe and new ingredients at the same time for it to succeed, even though the chef may be very capable in other areas.

So that’s the difference between these two techniques.

And it’s important to know this difference so that if you’re faced with a situation where you’re not sure why a model is not behaving, or the software is not doing what it’s told, you know what to ask for.

You can say, I need better recipes.

This model is not following directions or we need new ingredients.

This model just doesn’t have enough to work with to answer the questions with the level of specificity that we want.

So really good question.

It’s kind of a technical answer, but conceptually it should make sense.

Recipes versus ingredients, fine-tuning versus retrieval-augmented generation.

Thanks for tuning in.

Talk to you next time.

If you enjoyed this video, please hit the like button.

Subscribe to my channel if you haven’t already.

And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.





Want to read more like this from Christopher Penn? Get updates here:

subscribe to my newsletter here


AI for Marketers Book
Get your copy of AI For Marketers

Analytics for Marketers Discussion Group
Join my Analytics for Marketers Slack Group!