# Almost Timely News, October 1, 2023: How Large Language Models Work

Almost Timely News: How Large Language Models Work (2023-10-01) :: View in Browser

## Content Authenticity Statement

95% of this newsletter was generated by me, the human. There are two outputs from ChatGPT featured in the main article. Learn why this kind of disclosure is important.

Almost Timely News: How Large Language Models Work (2023-10-01)

## What’s On My Mind: How Large Language Models Work – A New Explanation

I’ve been working on a more thorough way of explaining how large language models do what they do. Previously, I had explained how large amounts of text were digested down into statistical representations, and while this is accurate and true, it’s hard for people to visualize. So let’s tackle this in a new way, with word clouds. Now, to be clear, this is a vast oversimplification of the mathematics behind language models. If you enjoy calculus and linear algebra and want to dig into the actual mechanics and mathematics of large language models, I recommend reading the academic paper that started it all, “Attention is All You Need” by Vaswani et al.

Take any word, and there are words associated with it. For example, if I give you the word marketing, what other words related to it come to mind? Digital marketing, content marketing, email marketing, marketing strategy, marketing plans, marketing template, sales and marketing – the list goes on and on, but there are plenty of words that are associated with the word marketing. Imagine that word, marketing, and the words associated with it as a word cloud. The words that occur the most around marketing are bigger in the cloud. Got it?

Next, let’s take a different word, a word like B2B. When you think of words associated with B2B, what words come to mind? B2B marketing, sure. B2B sales, B2B commerce, B2B strategy, and so on and so forth. Again, picture that word and all its associated words as a word cloud and again, the words that occur the most around B2B are bigger in the word cloud.

Now, imagine those two clouds next to each other. What words do they have in common? How much do they overlap and intersect? B2B and marketing share common words in each other’s clouds like sales, commerce, strategy, etc. Those words have an increased probability when you mash the two clouds together, so you could imagine those words would get even bigger.

And that’s the start of how large language models do what they do. Large language models essentially are composed of massive numbers of word clouds for every word they’ve seen, and the words associated with those words. Unlike the toy example we just showed, the way these models are made, each individual word’s cloud is composed of tens or hundreds of thousands of additional words. In the largest models, like GPT-4, there might even be millions of associations for any given word, and those associations also occur among words, phrases, sentences, and even entire documents.

For example, there will be multiple associations for a word – apple could refer to a fruit or a computer company, and the words around apple determine which association will be used. Each of these clusters of association exist inside a large language model as well, which is how it knows to mention Steve Jobs if your prompt contains both apple and computer along with other related words, even if you don’t mention Steve Jobs by name.

When you use a tool like LM Studio or ChatGPT or Google Bard, and you give it a prompt, it goes into its library of word clouds and takes each word from your prompt, extracts the relevant word cloud associations, mashes them all together, and the intersections of all those words are essentially what it spits out as its answer, formatted in the language of your choice. This is why these tools are so effective and so powerful – they have a knowledge of language based on how a word relates to every other word that’s nearby it in millions of pages of text.

This is also what makes the difference between good prompts and bad prompts, between non-creative and creative responses. Think about it for a second. If you write a short, boring prompt, it’s going to create a mash of word clouds that is relatively small, and that means only the most frequent (and therefore boring and non-creative) words will be returned. “Write a blog post about the benefits of email marketing” is going to generate some really mediocre, boring content because it’s a mediocre, boring prompt that will return high-level word cloud mashups only. True, there will still be hundreds of thousands of words in the combined cloud of a prompt that small, but because we’re thinking about the INTERSECTIONS of those clouds, where they overlap, you’re not going to get much variety or creativity:

If you used a prompt like “You are a MarketingProfs B2B Forum award-winning blogger who writes about B2B marketing and email marketing for the industrial concrete industry. Your first task is to draft a blog post about the benefits of a high-frequency email marketing program for an industrial concrete company that sells to state and local governments; focus on unique aspects of marketing the concrete industry and heavy construction. You know CASL, CAN-SPAM, and GDPR. You know email marketing best practices, especially for nurture campaigns in marketing automation systems. Write in a warm, professional tone of voice. Avoid tropes, jargon, and business language. Avoid adverbs.” How many of these word clouds will be created with a prompt this large? Many, many word clouds, and each cloud of associations will have overlaps with the others. The net result is you’ll get a much more tailored, unique, and creative result.

When you understand conceptually what’s going on under the hood of large language models, it becomes easier to understand how to use them to the best of their capabilities – and why non-language tasks simply fail most of the time. For example, math is really hard for many models to get right because they fundamentally don’t do computation. They’re predicting the likelihood of characters – numbers – and the numbers that should be nearby. That’s why earlier models had no trouble with expressions like 2 + 2 = 4 but could not do 22 + 7 = 29. The former equation occurs much more frequently in written text, while the latter is fairly rare by comparison. The model isn’t performing any calculations, and thus tends to get the answer wrong.

This is also why censorship is so damaging to the structure of these models. Take any common profane word, like the venerable F word. How often do we use it? How many other words are associated with it? If you were to try ripping it out of a combination of word clouds, how many other words might get ripped out too – and are they useful words otherwise?

That’s also why models behave less or more creatively. They’re not intrinsically creative; they’re simply clouds of probabilities being mashed together. When you give an non-creative prompt, you invoke only the most broad probabilities, and you get a non-creative result. When you give a highly creative, relatively rare prompt that has many combinations of many specific words, you invoke very specific probabilities and get more creative results.

Large language models are libraries of probability, and every time we use them, we are invoking probabilities based on the words in our prompts. If we aren’t getting the results we want, we should examine the words, phrases, and sentences in our prompts and adjust them to add more detail until we get what we want. There’s no magic formula or secret guide to prompt engineering, no “Instant Success with ChatGPT” that has any serious credibility. If you have conversations with these models that use the appropriate language to get all the word clouds to overlap well, you’ll get what you want from a large language model.

Rate this week’s newsletter issue with a single click. Your feedback over time helps me figure out what content to create for you.

## Share With a Friend or Colleague

If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:

## ICYMI: In Case You Missed it

Besides the newly-refreshed Google Analytics 4 course I’m relentlessly promoting (sorry not sorry), I recommend the episode I did with Katie on how to vet an analytics agency.

## Skill Up With Classes

These are just a few of the classes I have available over at the Trust Insights website that you can take.

### Free

I’ve been lecturing a lot on large language models and generative AI (think ChatGPT) lately, and inevitably, there’s far more material than time permits at a regular conference keynote. There’s a lot more value to be unlocked – and that value can be unlocked by bringing me in to speak at your company. In a customized version of my AI keynote talk, delivered either in-person or virtually, we’ll cover all the high points of the talk, but specific to your industry, and critically, offer a ton of time to answer your specific questions that you might not feel comfortable asking in a public forum.

Here’s what one participant said after a working session at one of the world’s biggest consulting firms:

“No kidding, this was the best hour of learning or knowledge-sharing I’ve had in my years at the Firm. Chris’ expertise and context-setting was super-thought provoking and perfectly delivered. I was side-slacking teammates throughout the session to share insights and ideas. Very energizing and highly practical! Thanks so much for putting it together!”

Pricing begins at US\$7,500 and will vary significantly based on whether it’s in person or not, and how much time you need to get the most value from the experience.

## Get Back to Work

Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you’re looking for work, check out these recent open positions, and check out the Slack group for the comprehensive list.

Let’s look at the most interesting content from around the web on topics you care about, some of which you might have even written.

### SEO, Google, and Paid Media

If you’re familiar with the Cameo system – where people hire well-known folks for short video clips – then you’ll totally get Thinkers One. Created by my friend Mitch Joel, Thinkers One lets you connect with the biggest thinkers for short videos on topics you care about. I’ve got a whole slew of Thinkers One Cameo-style topics for video clips you can use at internal company meetings, events, or even just for yourself. Want me to tell your boss that you need to be paying attention to generative AI right now?

📺 Pop on by my Thinkers One page today and grab a video now.

## How to Stay in Touch

Let’s make sure we’re connected in the places it suits you best. Here’s where you can find different content:

The war to free Ukraine continues. If you’d like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia’s illegal invasion needs our ongoing support.

## Events I’ll Be At

Here’s where I’m speaking and attending. Say hi if you’re at an event also:

• MarketingProfs B2B Forum, Boston, October 2023
• Content Jam, Chicago, October 2023
• SMPS AEC AI, DC, October 2023
• Humanize Your Brand, Online, October 2023
• AI and the End of SEO with SiteImprove, Online, October 2023
• DigitalNow, Denver, November 2023
• AImpact, Online, November 2023
• Social Media Marketing World, San Diego, February 2024

Events marked with a physical location may become virtual if conditions and safety warrant it.

If you’re an event organizer, let me help your event shine. Visit my speaking page for more details.

Can’t be at an event? Stop by my private Slack group instead, Analytics for Marketers.

## Required Disclosures

My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.

## Thank You

See you next week,

Christopher S. Penn

You might also enjoy:

Want to read more like this from Christopher Penn? Get updates here: