Almost Timely News: How to Improve Your AI Prompts (2023-04-02) :: View in Browser

Almost Timely News

👉 Watch my brand new keynote, The Marketing Singularity, all about how generative AI is the end of marketing as we know it »

Content Authenticity Statement

100% of this newsletter was written by me, the human, with no contribution from AI except in the displayed outputs.

Watch This Newsletter On YouTube 📺

Almost Timely News: How to Improve Your AI Prompts (2023-04-02)

Click here for the video 📺 version of this newsletter on YouTube »

Click here for an MP3 audio 🎧 only version »

What’s On My Mind: How to Improve Your AI Prompts

Yes, it’s another week of AI-related content. If you’ve got something you’d rather hear about instead, let me know. This week, I had the pleasure and privilege to be the opening keynote at the Martechopia conference in London, where I talked through the basics of large language models like GPT-4, PaLM, etc. and the interfaces like ChatGPT, Bing, Bard, etc. Feedback from folks was generally good, but the same question kept coming up in comments afterwards, online, and in my inbox:

How do we write better prompts?

So today, that’s what we’re going to tackle, how to write better prompts. The point of view I’m taking should be unsurprising: we’re going to rely on how the technology works to inform our protocols, our processes for writing better prompts. For the most part, I’ll be using the models released by OpenAI – InstructGPT, GPT-3.5-Turbo (the default for ChatGPT), and GPT-4.

First, let’s discuss what these models are capable of, what specific tasks they were trained to do. In the research paper for InstructGPT, which was the immediate precursor to GPT-3.5 that ChatGPT started out with last November, OpenAI specified a collection of six core types of tasks the model performed well on:

  • Generation & brainstorming
  • Knowledge seeking (open and closed QA)
  • Conversation
  • Rewriting
  • Summarization/extraction
  • Classification

What are these tasks? Based on the documentation, they break out like this:

Generation and brainstorming should be fairly obvious. Write me a blog post, write me an outline, give me some ideas for a staycation – these are content creation tasks that either result in completed content (like a first draft) or outlines of content. This category is what the majority of users do with large language models. Amusingly, this is also the category they’re least good at, but we’ll come back to that later.

The second category is knowledge seeking, through open or closed Q&A. This is using the language model like a search engine. What are the best places to visit in London on a shoestring budget, how do you poach an egg, what’s the fastest land animal, and so forth. Here, we’re not assessing a model on its generation skill so much as using it as a faster search engine or a search engine that deals with complex queries more skillfully. Closed Q&A is giving the models questions with provided answers, like a multiple choice test. This, which you’ll see in the GPT-4 technical publication, is how the models do things like pass the bar exam.

The third category is conversation, actual chat. People have real conversations with the models and just talk to them.

The fourth category is rewriting. Given a piece of text, rewrite the text in some different way. One of my favorite utilities is to take a transcript of a voice recording and have models like GPT-4 rewrite it so that it gets rid of umms, uhhs, and filler text. It’s not creating anything net new, just changing the language. This is one of the tasks these models are best at.

The fifth category is summarization and extraction. This is feeding a model a pile of text and having it condense or extract the text. Examples would be summarizing a long article or a paper into a paragraph, turning a blog post into a tweet, or extracting meeting notes and action items from a transcript. Again, this is one of the tasks that large language models excel at.

The sixth category is classification, in which we give a model a lot of text and have it perform classifying tasks on it. For example, we could give it a pile of tweets and have it assign sentiment scores to the tweets, or give it a letter written by someone and have it create a psychological profile from it.

Are there emergent tasks that don’t fall into these categories? Sure, or tasks which are a combination of one or more categories. For example, in the talk I gave, one of the tasks I had ChatGPT tackle was to read an NDA and tell me what wasn’t in it that is common in other NDAs. That falls under knowledge seeking as well as summarization, plus some reasoning that doesn’t fit neatly in either category.

Now, I mentioned a few times that some tasks are better suited for language models than others. Somewhat ironically, the task people seem to use these models for most – generation – is the task that these models tend to do least well. That’s not to say they do it badly, but it’s the most complex and difficult task with the highest likelihood of unsatisfactory results. Why? Because the underlying architecture of the models is designed for transformation – hence the name of OpenAI’s models, GPT, for generative pre-trained transformer.

Transformers, without getting bogged down into the heavy mathematics, are really good at understanding the relationship among words. Unlike older machine learning algorithms, they are very good at remembering things, which is why they can create such realistic text. They remember things like word order, and context in the sense of probability. The probability that the next word in a sentence like “I pledge allegiance to the” is nearly 100% that it’s going to be “flag”, and very, very low chance of it being “rutabaga”. When companies like OpenAI make these models, they train them on billions of pages of text to create a massive probability matrix. Thus, when we work with them, we are using these pre-trained probabilities.

So how does this relate to the six categories and writing better prompts? Consider how much guessing of probabilities the machine has to do with generation. If you say, “Write a blog post about the importance of seat belts in cars” as a prompt, it has to go dig into its table of probabilities to understand cars, what seat belts are, why they’re important, what a blog is, what a blog post is, etc. and then come up with patterns of probabilities to answer that question. That’s why, when you write a short prompt for a generation task, you tend to get lackluster outputs, outputs that are filled with bland language. The machine is having to guess a LOT of probabilities to fulfill the request.

Contrast that with a prompt like “Rewrite this text, fixing grammar, spelling, punctuation, and formatting (followed by the text)”. What does the mechanism need to do? It needs to scan in the original text, look at the probabilities of words in its model, look at the actual relationships in the inputted text, and basically just fix up the text based on its probabilities. That’s why these tools are so, so good at tasks like rewriting. They don’t have to do any creation, just editing.

Think about that in your own life. Is it easier for you to write or edit? Chances are, the majority of people find it easier to edit something they’ve written than to try conquering the blank page.

So, let’s revisit the task list. Which tasks use existing information versus which tasks are asking the machine to create something net new? Which is a writing task versus an editing task?

  • Generation & brainstorming – writing
  • Knowledge seeking (open and closed QA) – writing
  • Conversation – writing
  • Rewriting – editing
  • Summarization/extraction – editing
  • Classification – mostly editing

What does this mean when it comes to prompts? The more writing the machines have to do, the longer and more complex your prompts have to be to give it the raw materials to work with. “Write a blog post about birds” is a terribly short prompt that is going to yield terrible results. A page long prompt about the specific birds you care about along with their characteristics, data, etc. is going to yield a much more satisfying result for a generation task, for a writing task.

Again, we see this in the real world. If you hire a freelance writer, how long does your creative brief need to be to help them generate a good result? If you hire an editor, how detailed do your instructions need to be to help them generate a good result? I’d wager that the instructions you give the editor will be shorter than the instructions you give the writer.

The same is true for large language models. For an editing task, a prompt like “Fix grammar, spelling, punctuation, and formatting” along with the provided text is going to yield a very satisfactory outcome despite the shortness of the prompt because it’s an editing task.

That’s part one of understanding how to write better prompts. Let’s tackle part two – the formatting. What should the format of a prompt be? It depends on the system and the model. For OpenAI’s ChatGPT and the GPT family of models, they’re very clear about how they want developers to interface with their models:

OpenAI Playground

What we see in the developers’ version of ChatGPT is three components: system, user, and assistant. The system part of the prompt intake is what we call a role. Here, we define what role the model will be. For example, we might say, “You will act as a B2B marketer. You have expertise in B2B marketing, especially lead generation and lead nurturing. You specialize in email marketing and email newsletters as key parts of an audience retention and engagement strategy.” This role statement is essential for the model to understand what it’s supposed to be doing because the words used here help set guardrails, help refine the context of what we’re talking about.

The second part of the prompt is the user statement. This is where we give the model specific directions. “Your first task is to write a blog post about the importance of a weekly email newsletter in an overall marketing strategy.” These instructions are what the model carries out.

The third part is the assistant part, where the model returns information.

For writing tasks, having a robust system statement and an equally robust user statement is essential to getting the model to perform well. The more words, the more text we provide, the better the model is going to perform because it basically means the model has to generate fewer wild guesses. It has more to latch onto.

For editing tasks, you may not even need a system statement, because you’re providing all the text for the model to work with. It’s just processing it. Let’s look at an example. Suppose for a writing task, I only provide the user statement, either in the developer edition or in the ChatGPT edition. What are the results?

Developer edition:

Developer edition

ChatGPT consumer edition:

ChatGPT edition

In both examples, they’re pretty… well, generic. There wasn’t a ton to latch onto. Now, these aren’t BAD. They’re just… nothing special. Also, even though these use the same model, look at how much variance is in the text. Again, we didn’t give the model much to latch onto in terms of keywords, important terms that should be the focus.

Now, let’s add a detailed system statement to see how things change.

Developer edition:

Developer edition

ChatGPT consumer edition:

ChatGPT edition

See how much more specific the content is with the addition of the system statement? Both the consumer edition and the developer edition create much more similar content, and that content is more detailed, more focused because we’re giving the transformer architecture, the generative pre-trained transformer more to work with.

The art and science of writing prompts is a discipline called prompt engineering. It’s a form of software development – except instead of writing in a language like C, Java, Python, etc. we’re writing in plain, natural language. But we’re still giving directions to a machine for a repeatable output, and that means we’re programming the machine.

For your prompts to do better with these machines, adhere to the way the system is architected and designed. Adhere to the way the models work best. Understand the different classes of tasks and what you’re asking of the machine – then provide appropriate prompts for the kind of task you’re performing. Here’s the bottom line: always include a detailed system statement in writing tasks. Include them optionally in editing tasks. And don’t be afraid to be very, very detailed in either.

Why is this method of prompt engineering different than the “top 50 ChatGPT prompts” webinar being advertised on your social media feed? It’s simple: this method aligns with how the technology actually works, how it was built, and how companies like OpenAI are telling traditional software developers to talk to their models for optimum performance. When you know how something works, you can generally make it work better – and that’s why this method will work for you.

Got a Question? Hit Reply

I do actually read the replies.

Share With a Friend or Colleague

If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:

https://www.christopherspenn.com/newsletter

ICYMI: In Case You Missed it

Besides the newly-refreshed Google Analytics 4 course I’m relentlessly promoting (sorry not sorry), I recommend the rant I did on why the “6 month pause on AI” open letter is a bunch of bull.

Skill Up With Classes

These are just a few of the classes I have available over at the Trust Insights website that you can take.

Premium

Free

Get Back to Work

Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you’re looking for work, check out these five most recent open positions, and check out the Slack group for the comprehensive list.

Advertisement: LinkedIn For Job Seekers & Personal Branding

It’s kind of rough out there with new headlines every day announcing tens of thousands of layoffs. To help a little, I put together a new edition of the Trust Insights Power Up Your LinkedIn course, totally for free.

👉 Click/tap here to take the free course at Trust Insights Academy

What makes this course different? Here’s the thing about LinkedIn. Unlike other social networks, LinkedIn’s engineers regularly publish very technical papers about exactly how LinkedIn works. I read the papers, put all the clues together about the different algorithms that make LinkedIn work, and then create advice based on those technical clues. So I’m a lot more confident in suggestions about what works on LinkedIn because of that firsthand information than other social networks.

If you find it valuable, please share it with anyone who might need help tuning up their LinkedIn efforts for things like job hunting.

What I’m Reading: Your Stuff

Let’s look at the most interesting content from around the web on topics you care about, some of which you might have even written.

Social Media Marketing

Media and Content

SEO, Google, and Paid Media

Advertisement: Google Analytics 4 for Marketers (UPDATED)

I heard you loud and clear. On Slack, in surveys, at events, you’ve said you want one thing more than anything else: Google Analytics 4 training. I heard you, and I’ve got you covered. The new Trust Insights Google Analytics 4 For Marketers Course is the comprehensive training solution that will get you up to speed thoroughly in Google Analytics 4.

What makes this different than other training courses?

  • You’ll learn how Google Tag Manager and Google Data Studio form the essential companion pieces to Google Analytics 4, and how to use them all together
  • You’ll learn how marketers specifically should use Google Analytics 4, including the new Explore Hub with real world applications and use cases
  • You’ll learn how to determine if a migration was done correctly, and especially what things are likely to go wrong
  • You’ll even learn how to hire (or be hired) for Google Analytics 4 talent specifically, not just general Google Analytics
  • And finally, you’ll learn how to rearrange Google Analytics 4’s menus to be a lot more sensible because that bothers everyone

With more than 5 hours of content across 17 lessons, plus templates, spreadsheets, transcripts, and certificates of completion, you’ll master Google Analytics 4 in ways no other course can teach you.

If you already signed up for this course in the past, Chapter 8 on Google Analytics 4 configuration was JUST refreshed, so be sure to sign back in and take Chapter 8 again!

👉 Click/tap here to enroll today »

Tools, Machine Learning, and AI

Analytics, Stats, and Data Science

Dealer’s Choice : Random Stuff

Advertisement: Ukraine 🇺🇦 Humanitarian Fund

If you’d like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia’s illegal invasion needs our ongoing support.

👉 Donate today to the Ukraine Humanitarian Relief Fund »

How to Stay in Touch

Let’s make sure we’re connected in the places it suits you best. Here’s where you can find different content:

Events I’ll Be At

Here’s where I’m speaking and attending. Say hi if you’re at an event also:

  • Onalytica B2B Influencer Summit, San Francisco, April 2023
  • B2B Ignite, Chicago, May 2023
  • ISBM, Chicago, September 2023
  • MarketingProfs B2B Forum, Boston, October 2023

Events marked with a physical location may become virtual if conditions and safety warrant it.

If you’re an event organizer, let me help your event shine. Visit my speaking page for more details.

Can’t be at an event? Stop by my private Slack group instead, Analytics for Marketers.

Required Disclosures

Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them.

Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them.

My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.

Thank You

Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness.

See you next week,

Christopher S. Penn


You might also enjoy:


Want to read more like this from Christopher Penn? Get updates here:

subscribe to my newsletter here


AI for Marketers Book
Get your copy of AI For Marketers

Analytics for Marketers Discussion Group
Join my Analytics for Marketers Slack Group!