Mind Readings: Large Language Model Censorship Reduces Performance

Warning: this content is older than 365 days. It may be out of date and no longer relevant.


In today’s episode, we delve into the uncomfortable reality of large language models. The safer we make them, the less creative and useful they become. By censoring these models to exclude profanity and sensitive topics, we inadvertently hinder their ability to generate contextually accurate content. Although it’s important to censor racism and hate speech, doing so affects the overall quality of the model’s output. While technological advancements and adversarial models may offer some solutions, the trade-off between creativity and professionalism remains. Join me as we explore the challenges and potential solutions in managing language models. Don’t miss out—hit that subscribe button if you found this topic intriguing.


Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Today let’s talk about an uncomfortable reality with large language models.

That uncomfortable reality is this.

The safer you make them, the less creative and useful they are.


Here’s why. And a trigger warning for profanity: there will be profanity in this video.

We use language as humans in all sorts of ways.

But things like profanity, for example, are part of our language.

The more you have to censor a model, the more you have to tell a piece of software not to do certain things, the more you constrain what it can do, and in the process, it loses context.

Here’s what I mean.

Remember that these models are nothing more than prediction engines, even the most sophisticated ones, the biggest models like GPT-4 from OpenAI, or PaLM 2 from Google with, what, something like 570 billion parameters.

They’re just prediction engines.

If I say “I pledge allegiance to the,” the prediction engine is going to come up with a list of anywhere between five and 40 different alternatives for the next word and score them by probability.

Almost certainly, the probability is going to be near 100% for the word “flag,” because “I pledge allegiance to the flag” is a very common sentence.
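To make the pledge example concrete, here is a minimal Python sketch of scoring candidate next words. The candidate list and the raw scores are made up for illustration and do not come from any real model; the only real technique here is softmax, the usual way of turning raw scores into probabilities.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical raw scores for the word after "I pledge allegiance to the"
scores = {"flag": 12.0, "republic": 4.0, "king": 3.5, "cause": 3.0, "team": 2.0}

probs = softmax(scores)
ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

for word, p in ranked:
    print(f"{word:10s} {p:.4f}")
```

With these invented scores, "flag" ends up with nearly all of the probability, which is the behavior described above for a very common sentence.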

When these models are trained on data that has been scraped from around the web, among other sources, they build statistical relationships from one word to the next.

So for example, if I say “I’m brewing the,” then depending on the context, the words the model was trained on, and the words that I’ve mentioned in my prompt, it’ll choose a word like coffee, or tea, or beer, or the fall of capitalism.

And in doing so, it’s relying on the patterns in language that it learned on input.
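As a rough sketch of what "statistical relationships from one word to the next" means, the snippet below counts word pairs in a tiny, invented corpus and turns the counts into next-word probabilities. Real language models are far more sophisticated than this bigram counter; the corpus and the function names `train_bigrams` and `next_word_probs` are assumptions made up for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def next_word_probs(counts, word):
    """Convert follower counts for one word into probabilities."""
    followers = counts[word]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}

# Invented training text: the model only knows what it was fed
corpus = [
    "I'm brewing the coffee",
    "I'm brewing the coffee",
    "I'm brewing the tea",
    "I'm brewing the beer",
]

model = train_bigrams(corpus)
brew_probs = next_word_probs(model, "the")
print(brew_probs)
```

Change the corpus and the probabilities change with it, which is the point: the predictions are only a reflection of the patterns in the input.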

If you look at a lot of the training libraries, explicitly or implicitly declared, from big companies: for example, in an interview with Lex Fridman, Mark Zuckerberg said that the Facebook LLaMA model was trained on data from Facebook’s many services, right, Facebook, WhatsApp, Instagram, etc.

What’s embedded in most people’s language? Yeah, profanity, racism, bias, you name it.

Particularly if you’re drawing from Facebook, I mean, there’s a whole bunch of people there who think that the world is still flat, which is mind blowing.

And because it’s ingesting those patterns in language, if you then have to go back and say, don’t say X, or Y, or Z, you’re essentially handicapping the model.

And it’s not just going to stop using words you don’t want it to use, but it’s also going to have to adapt and figure out how to use words in less creative ways that don’t evoke those topics.

So if you say, understandably and correctly, “Hey, don’t be racist,” and there’s a good chunk of racist text that was fed into the model, suppressing that not only suppresses racist language, as you would normally want, but it also impacts all the other words that are used in that context.

And it impacts their overall probabilities.

If I say, “Go fuck yourself,” all right, and then we say, you know what, let’s not use the word “fuck.”

There is a statistical relationship in that sentence between the word “go,” the word “fuck,” and the word “yourself.”

And you see that a lot; it’s a very common phrase, right? GFY. These words are associated with each other.

Now, if I suppress it, or try to remove or censor the word “fuck,” you’re left with “Go yourself,” which makes no sense, right? And it breaks the probabilities around those words.

So those words go and yourself are actually going to be negatively impacted by the suppression of the profanity.

You get the idea of how these words are related to each other.

And the more you censor words, the more you have to come up with alternatives that may not be as good.
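Here is a small sketch of one simple way suppression can work: remove the banned candidate and rescale what is left so the probabilities still sum to 1. This is only an illustration, and actual vendors may suppress output in other ways. The probabilities, the `[profanity]` placeholder token, and the function name `ban_and_renormalize` are all invented for the example.

```python
def ban_and_renormalize(probs, banned):
    """Drop banned candidates, then rescale the rest to sum to 1."""
    kept = {w: p for w, p in probs.items() if w not in banned}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("every candidate was banned")
    return {w: p / total for w, p in kept.items()}

# Hypothetical probabilities for the word that follows "go"
after_go = {"[profanity]": 0.60, "home": 0.20, "away": 0.15, "yourself": 0.05}

censored = ban_and_renormalize(after_go, banned={"[profanity]"})

for word in ("home", "away", "yourself"):
    print(f"{word:10s} before={after_go[word]:.3f} after={censored[word]:.3f}")
```

In this toy setup, "yourself" directly after "go" rises from 0.05 to about 0.125. The censored model is now more than twice as likely to produce "go yourself," a phrase it almost never saw, which is one way to picture how the words around a suppressed word get distorted.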

Now, clearly, and let’s be very, very clear.

It’s a good idea if you’re going to be using any of these models, particularly in any kind of professional context, to censor things like racism, and bigotry and hate speech and substantial profanity.

But you also have to understand that it will cause computational problems in these models.

How do you get around this? The short answer is if it wasn’t in the training data to begin with, it wouldn’t be a problem on the output side, but we don’t have control over how these models are trained.

And there are very few companies that can actually build these things and that have enough data to do the training, like Google or Facebook or OpenAI.

And so we have to essentially handicap the models on their outputs.

Now, I believe there are probably some technological solutions to do this better that the industry isn’t talking about enough yet. I believe there are some interesting things being done with adversarial models, which basically say, you know, here’s what I’m looking for you to not do, and sort of get into arguments, semantically and metaphorically, with the language model, to help it not do those things.

But if you want maximum creativity, you would have to use a model that also has problematic concepts and text in it, right?

If you want to maximize what a model can do, you will probably have to accept that you’ll use a model that has a higher potential to say things you don’t want it to say, right? So you’ll have to build some gatekeeping on its outputs to inspect them, so that the model can be as creative as it wants to be and then get smacked down later on in the pipeline.
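A gatekeeping step like that could be sketched roughly as follows: let the model generate freely, inspect each draft, and ask for another one if the draft fails. Everything here is a stand-in. The blocklist uses mild placeholder words, `make_fake_model` fakes a model by replaying canned drafts, and a production gate would likely use something smarter than word matching, such as a second classifier model.

```python
import re

# Hypothetical stand-ins for the terms a real deployment would block
BLOCKLIST = {"darn", "heck"}

def passes_gate(text, blocklist=BLOCKLIST):
    """Return True if none of the blocked words appear in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words.isdisjoint(blocklist)

def generate_with_gate(generate, prompt, max_attempts=3):
    """Generate freely, inspect the output, and retry if it fails the gate."""
    for attempt in range(1, max_attempts + 1):
        draft = generate(prompt)
        if passes_gate(draft):
            return draft, attempt
    return None, max_attempts  # give up rather than release a bad draft

def make_fake_model(drafts):
    """Stand-in for a real model call: replays canned drafts in order."""
    queue = iter(drafts)
    return lambda prompt: next(queue)

fake_model = make_fake_model(["Well, darn it all.", "Well, that is unfortunate."])
result, attempts = generate_with_gate(fake_model, "React to bad news")
print(result, attempts)
```

The design choice is that the model itself is left uncensored and the inspection happens afterward, so the cost of censorship is paid in extra generation attempts rather than in distorted probabilities.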

So, yeah, no, we’re not going to use that generated sentence; send it back for regeneration.

Over time, I suspect companies will tackle this at the source. And if I had to guess which company, my guess would be IBM, because they’re not known for being first to market.

But they’re typically known for being best to market, particularly on the enterprise stuff.

I would expect companies like IBM to say, hey, we’re going to build a custom model that doesn’t include profanity, that doesn’t include racism and bigotry and homophobia. We’re going to exclude those things from the source training data to begin with, so that it can’t be there on the output side, because it didn’t exist on the input side.

And that’s what we’ll have to do if we want models that have not had their creativity handicapped, but also have not taken problematic text and concepts with them.
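Filtering on the input side might look something like the sketch below, which drops documents containing excluded terms before any training happens and then checks that those terms are absent from the resulting vocabulary. The excluded terms are mild placeholders, the corpus is invented, and a real data-cleaning pipeline would need much more than simple word matching; this only illustrates the idea that what never goes in cannot come out.

```python
import re

# Hypothetical stand-ins for the content a company would exclude
EXCLUDED_TERMS = {"darn", "heck"}

def tokenize(document):
    """Lowercase a document and split it into simple word tokens."""
    return re.findall(r"[a-z']+", document.lower())

def filter_corpus(documents, excluded=EXCLUDED_TERMS):
    """Keep only documents with no excluded terms; report how many were dropped."""
    kept = [doc for doc in documents if excluded.isdisjoint(tokenize(doc))]
    return kept, len(documents) - len(kept)

# Invented raw training data
raw_corpus = [
    "The coffee is brewing nicely.",
    "Darn, the coffee is cold.",
    "Tea is also a fine choice.",
    "What the heck is in this beer?",
]

clean_corpus, dropped = filter_corpus(raw_corpus)

# Vocabulary a model trained on the clean corpus could ever learn
vocabulary = {token for doc in clean_corpus for token in tokenize(doc)}
print(dropped, sorted(vocabulary))
```

Note the side effect in this toy corpus: the word "beer" disappears from the vocabulary too, because it only appeared in a dropped document. Filtering the source data avoids output suppression, but it still changes what the model gets to learn.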

So the bottom line is if you want creativity, you also have to accept a model that has problematic text.

If you want a model to behave professionally, you’re going to have to handicap it significantly, and the outputs may be lower quality as a result. That’s the current trade-off as of the time I’m recording this, mid-year 2023.

It’s entirely possible this could all change tomorrow, by the way, so it’s worth keeping your ear to the ground to see what other things are likely going to happen to help these models become smarter and more professional.

Thanks for tuning in.

We’ll talk to you next time.

If you liked this video, go ahead and hit that subscribe button.
