
Almost Timely News: Improve ChatGPT Performance By Understanding How It Works (2023-02-26)

Almost Timely News

👉 Take my new free course on how to improve your LinkedIn profile and make yourself more appealing to hiring companies ➡️

Watch This Newsletter On YouTube 📺

Almost Timely News: Improve ChatGPT Performance By Understanding How It Works (2023-02-26)

Click here for the video 📺 version of this newsletter on YouTube »

Click here for an MP3 audio 🎧 only version »

What’s On My Mind: Improve ChatGPT Performance By Understanding How It Works

Let’s take some time to deconstruct the architecture of a large language model like InstructGPT/GPT-3. These models, which power useful tools like GoCharlie and ChatGPT, at first seem like magic to the end user. However, understanding how they work will help you be more effective in their use. In 1957, linguist John Rupert Firth said the following in a paper titled “A Synopsis of Linguistic Theory 1930–1955”:

“You shall know a word by the company it keeps.”

This single sentence summarizes the entirety of how large language models work. Every natural language processing model in artificial intelligence is built on this axiom, mainly because language itself is built on this axiom. We understand a word based on the context we use it in.

For example, if I talk about brewing some tea, I’m talking about a literal beverage made from the camellia plant. If I talk about spilling some tea, I’m no longer talking about the beverage; I’m talking about gossip. The word’s meaning changes in relation to the context around it.

But it’s not just the words immediately adjacent to the word in question; it’s all the words in relation to each other. Every functional language has some kind of word order, a structure that helps us understand words.

I’m brewing the tea.

There’s a clear subject, me. There’s a verb, to brew. And there’s an object, the tea.

The tea I’m brewing.

This word order changes the focus. It’s still intelligible, but conversationally, the focus is now on the tea instead of me.

Brewing I’m the tea.

Now we’re so out of order that in English this doesn’t make much sense – verb, subject, object. Yet this sentence would be perfectly appropriate in Arabic, Gaelic, and a few other languages.

The structure of a language is a matter of probabilities.

I’m brewing the { } could be tea, coffee, beer, or some other object, but if you widen the window of words around it, the context becomes clearer. If the immediately preceding sentence talks about a coffee shop, then probabilistically, beer is unlikely to be the next word.
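To make that concrete, here’s a toy sketch in Python. The probabilities are invented purely for illustration – no real model assigns these exact numbers – but they show how widening the context shifts which next word is most likely:

# Toy illustration only: these probabilities are invented for the
# example, not taken from any real model.
next_word_probs = {
    "I'm brewing the": {
        "tea": 0.40, "coffee": 0.35, "beer": 0.20, "potion": 0.05,
    },
    "We met at the coffee shop. I'm brewing the": {
        "coffee": 0.80, "tea": 0.15, "beer": 0.04, "potion": 0.01,
    },
}

for context, candidates in next_word_probs.items():
    # Pick the highest-probability next word for each context.
    best = max(candidates, key=candidates.get)
    print(f"{context!r} -> {best!r} ({candidates[best]:.0%})")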

What does this have to do with ChatGPT? The underlying model, InstructGPT (itself a sibling of GPT-3), is built by taking massive amounts of text and converting that text into mathematical probabilities. The seminal paper “Attention Is All You Need” by Ashish Vaswani et al. explains exactly how the transformer architecture – the architecture ChatGPT is built on – operates.

First, you start with a huge amount of text.

Next, you convert every word and word fragment into, essentially, a very large table, with the probability of each word appearing next to another word assigned to each cell in the table. Imagine taking a sentence and putting each word in a column in a spreadsheet. Then take the same sentence and put each word in a row in the same spreadsheet. Then count the number of times one word appears next to another word. Now do this over and over again for every sentence in your sample of text.
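If you like seeing ideas as code, here’s a minimal Python sketch of that spreadsheet intuition. Real models learn dense embedding vectors rather than storing raw counts, but counting co-occurrences within a window captures the idea:

from collections import Counter

sentences = [
    "i'm brewing the tea",
    "i'm brewing the coffee",
    "spilling the tea means sharing gossip",
]

window = 2  # count neighbors up to 2 words away on each side
counts = Counter()
for sentence in sentences:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                counts[(word, words[j])] += 1

# "tea" appears near "the" more often than "beer" does in this tiny
# corpus, so "tea" is the more probable neighbor.
print(counts[("the", "tea")], counts[("the", "beer")])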

That’s the embedding part. After that, there’s a step called positional encoding. This is where word order is taken into account – the position of a word in relation to other words is given mathematical weight, so that in English, “I’m brewing the tea” has one value and “Brewing I’m the tea” has another value. Because you’ll see “I’m brewing the tea” far more times than “Brewing I’m the tea”, the former phrase and its positions will have more weight in the model, meaning that when it’s time to generate text, the probability that ChatGPT will spit out “Brewing I’m the tea” is fairly low, while “I’m brewing the tea” will be fairly high.
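For the curious, here’s a minimal sketch of the sinusoidal positional encoding described in “Attention Is All You Need”. The dimensions are kept tiny for readability; real models use far larger ones:

import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding; assumes d_model is even."""
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # (1, d_model / 2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

# Four positions, one per word of "I'm brewing the tea"; each row is
# unique, so reordering the words changes the model's input.
print(positional_encoding(seq_len=4, d_model=8).round(2))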

Here’s what makes transformer-based models like GPT-3 and InstructGPT different from, and better than, their predecessors: these positional encodings don’t just take into account the words immediately around a word. They take into account a LOT of text around each word – up to several thousand words at a time. That’s how they know to generate “I’m brewing the tea” and not “I’m brewing the beer” in the context of whatever we’re prompting them to do.
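The mechanism that does this wide-context weighing is called attention. Here’s a minimal sketch of the scaled dot-product attention from the Vaswani paper, using random vectors as stand-ins for learned word representations:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, per the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # every word scored against every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax across the context
    return weights @ V  # context-weighted blend of the values

# Four "words", each a random 8-dimensional vector standing in for a
# real learned representation.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # -> (4, 8)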

Now that we understand the underlying technology, what do we do with this information? It should be clear that words matter in our usage of these tools. Large language models can only make probability-based guesses from what they’re provided – the prompts we give them.

Which means that in general, the more relevant words in your prompt, the better the model will do. Let’s look at an example. I gave ChatGPT this prompt:

Complete this sentence:

“I’m brewing the

It responded:

perfect cup of coffee to start my day.”

That’s a reasonable response to a terribly vague prompt. What does it tell us about the underlying model? Its word choice tells us that, in the absence of any other context, coffee is the word most strongly associated with brewing, so that’s the response it generated.

Now, let’s add some relevant, contextual words to the prompt.

Complete this sentence:

“The tea house was packed with customers. I’m brewing the

It responded:

last pot of tea for the day, and soon we’ll have to close up shop.”

Just by giving it a starter sentence containing a more relevant word – tea – I narrowed the window of probability and gave more weight to tea, and it produced a sentence that matched the context I intended.

What happens if I add even more context?

Complete this sentence:

“The tea house was packed with customers. We’d been busy all day, and the golden rooibos was almost sold out. We still had plenty of yerba mate, but the oolong was starting to run low. As Karina retrieved the last box out of storage, she turned to me. I said, “I’m brewing the

It responded:

oolong now, so we can offer it while it lasts. Could you please start serving the yerba mate to the customers?” Karina nodded and quickly went to the counter, deftly pouring the hot yerba mate into gourds and adding just the right amount of sugar.

I added very specific types of tea (yes, yerba mate is a tisane, I know) and got even better results because the inclusion of tea names – a type of jargon – is highly specific and changes the probabilities of what words should come next.
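If you’d like to watch the probabilities shift yourself, here’s a sketch that assumes you have PyTorch and the Hugging Face transformers library installed; it uses the open GPT-2 model as a stand-in for ChatGPT’s closed underlying model, so the exact numbers will differ:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def next_word_probability(prompt: str, word: str) -> float:
    """Probability the model assigns to `word` as the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # scores for the next token
    probs = torch.softmax(logits, dim=-1)
    # GPT-2 tokens include a leading space; take the word's first token.
    token_id = tokenizer.encode(" " + word)[0]
    return probs[token_id].item()

prompts = [
    "I'm brewing the",
    "The tea house was packed with customers. I'm brewing the",
]
for prompt in prompts:
    print(prompt)
    print("  tea:   ", round(next_word_probability(prompt, "tea"), 4))
    print("  coffee:", round(next_word_probability(prompt, "coffee"), 4))

If the same narrowing holds in GPT-2 as it did in ChatGPT, the probability of “tea” should jump once the tea house sentence is added.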

The more relevant words in your prompt, the better the model will do.

Working with prompts – prompt engineering – is a lot like working with keywords in SEO. The more relevant words you use, the better the results you get. Think of it like prompt optimization instead of search engine optimization.

This is why I tell interested folks that these models are good at generation but GREAT at transformation. They rewrite like pros because they don’t have to guess what the words are, only what they should be, using known probabilities.

If you want them to perform better, write longer prompts with relevant words that help the model quickly understand the context of your ask. How long? My best-performing prompts are over a page of text long. They’re highly specific, they contain a ton of detail and a fair amount of jargon when appropriate, and they give specific instructions that yield repeatable, reliable results.

In software development, this is requirements gathering. In creative work, this is the creative brief. In cooking, this is the recipe. You would never hand someone a two-sentence recipe for baking a loaf of bread. You would never hand a creative team a two-sentence brief, not if you want the result to match a vision you already have in mind.

Not coincidentally, humans work the same way, too. In general, you’ll get better results with overcommunication than insufficient communication, for both machines and humans.

Got a Question? Hit Reply

I do actually read the replies.

Share With a Friend or Colleague

If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:

https://www.christopherspenn.com/newsletter

ICYMI: In Case You Missed it

Besides the newly refreshed Google Analytics 4 course I’m relentlessly promoting (sorry not sorry), I definitely recommend the podcast episode on social media ROI.

Skill Up With Classes

These are just a few of the classes I have available over at the Trust Insights website that you can take.

Premium

Free

Get Back to Work

Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you’re looking for work, check out these five most recent open positions, and check out the Slack group for the comprehensive list.

Advertisement: LinkedIn For Job Seekers & Personal Branding

It’s kind of rough out there with new headlines every day announcing tens of thousands of layoffs. To help a little, I put together a new edition of the Trust Insights Power Up Your LinkedIn course, totally for free.

👉 Click/tap here to take the free course at Trust Insights Academy

What makes this course different? Here’s the thing about LinkedIn. Unlike other social networks, LinkedIn has engineers who regularly publish very technical papers about exactly how LinkedIn works. I read the papers, put all the clues together about the different algorithms that make LinkedIn work, and then create advice based on those technical clues. Because of that firsthand information, I’m a lot more confident in my suggestions about what works on LinkedIn than I could be for any other social network.

If you find it valuable, please share it with anyone who might need help tuning up their LinkedIn efforts for things like job hunting.

What I’m Reading: Your Stuff

Let’s look at the most interesting content from around the web on topics you care about, some of which you might have even written.

Social Media Marketing

Media and Content

SEO, Google, and Paid Media

Advertisement: Google Analytics 4 for Marketers (UPDATED)

I heard you loud and clear. On Slack, in surveys, at events, you’ve said you want one thing more than anything else: Google Analytics 4 training. I heard you, and I’ve got you covered. The new Trust Insights Google Analytics 4 For Marketers Course is the comprehensive training solution that will get you up to speed thoroughly in Google Analytics 4.

What makes this different than other training courses?

  • You’ll learn how Google Tag Manager and Google Data Studio form the essential companion pieces to Google Analytics 4, and how to use them all together
  • You’ll learn how marketers specifically should use Google Analytics 4, including the new Explore Hub with real world applications and use cases
  • You’ll learn how to determine if a migration was done correctly, and especially what things are likely to go wrong
  • You’ll even learn how to hire (or be hired) for Google Analytics 4 talent specifically, not just general Google Analytics
  • And finally, you’ll learn how to rearrange Google Analytics 4’s menus to be a lot more sensible because that bothers everyone

With more than 5 hours of content across 17 lessons, plus templates, spreadsheets, transcripts, and certificates of completion, you’ll master Google Analytics 4 in ways no other course can teach you.

If you already signed up for this course in the past, Chapter 8 on Google Analytics 4 configuration was JUST refreshed, so be sure to sign back in and take Chapter 8 again!

👉 Click/tap here to enroll today »

Tools, Machine Learning, and AI

Analytics, Stats, and Data Science

Dealer’s Choice: Random Stuff

Advertisement: Ukraine 🇺🇦 Humanitarian Fund

If you’d like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia’s illegal invasion needs our ongoing support.

👉 Donate today to the Ukraine Humanitarian Relief Fund »

How to Stay in Touch

Let’s make sure we’re connected in the places it suits you best. Here’s where you can find different content:

Events I’ll Be At

Here’s where I’m speaking and attending. Say hi if you’re at an event also:

  • PodCamp Philly, Philadelphia, March 2023
  • Martechopia, London, March 2023. Use MARSPEAKER20 for 20% off the ticket price.
  • B2B Ignite, Chicago, May 2023

Events marked with a physical location may become virtual if conditions and safety warrant it.

If you’re an event organizer, let me help your event shine. Visit my speaking page for more details.

Can’t be at an event? Stop by my private Slack group instead, Analytics for Marketers.

Required Disclosures

Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them.

Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them.

My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.

Thank You

Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness.

See you next week,

Christopher S. Penn

