You Ask, I Answer: How to Make AI More Energy Efficient?


In today’s episode, Ashley raises an important question about balancing emerging technologies like AI, blockchain, and crypto with sustainability goals. I dive into the energy requirements of these technologies and discuss the concept of efficiency in AI models. Open-source communities are optimizing models for low-compute environments, making them more efficient and scalable. One technique, quantization, simplifies predictions by rounding numbers, resulting in significant energy savings without compromising accuracy. I also touch upon offsetting energy usage through renewable sources and upgrading legacy hardware. Join me for an insightful exploration of how companies can increase sustainability through efficient computing. Don’t forget to hit that subscribe button if you enjoyed this video!

Summary generated by AI.


Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn 0:00

In today’s episode, Ashley asks: with the rise of AI, blockchain, crypto, and other technologies emerging in the enterprise,

one thing has become clear: all of these technologies have increasingly large energy requirements.

How can companies balance new emerging technologies with their sustainability goals? Okay, there’s a lot to unpack here.

But let’s talk about AI.

And efficiency.

Energy consumption when it comes to artificial intelligence is all about efficiency, which means not trying to make the biggest thing possible just because it’s big, because, you know, there are reasons to make very large models, one of which is emergent properties.

When a model gets sufficiently sophisticated, emergent properties begin to show up: things like reasoning, for example, and mathematical abilities. Smaller models, particularly among large language models, don’t have that.

So there are some ideas around starting with those large models and then making those models more efficient.

And that means a couple of things. It means thoughtful and careful requirements gathering. In the open source space especially, with open source models, there is a lot of work being done now to take existing models and optimize them for efficiency in low-compute environments.

This is where, instead of running these models on a huge server farm with a gazillion Nvidia A100 GPUs, you’re running them on your laptop, maybe you’re running them on your phone, you might even be running them on those little Raspberry Pi devices.

That’s how small the open source community is looking to make some of these models, so that they fit in every possible kind of compute environment.

The more efficient they are, the more likely it is they can scale down to smaller hardware, and that also means their energy consumption goes down.

Up until the release of the LLaMA model, model makers like Google, Meta, and OpenAI were focused on bigger and more powerful models.

And those models, as they get more powerful, consume more energy, right? But when you have an open source model,

the open source community is like, how do we make this thing smaller? How do we make it run on tiny little devices?

And there are techniques, techniques like low-rank adapters, or LoRA, which I believe is a Microsoft innovation, and a big one is quantization.

Open source developers can now take these models and shrink them down in terms of computing power, size, memory requirements, and so on.

So that they can run on your desktop, on your laptop, etc.
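
To give a rough sense of why low-rank adapters save so much work, here’s a minimal sketch of the LoRA idea in Python. The layer size and rank below are illustrative assumptions, not the numbers any particular model uses.

```python
import numpy as np

# Hypothetical layer dimensions and adapter rank, chosen for illustration.
d, k, r = 4096, 4096, 8

W = np.random.randn(d, k).astype(np.float32)          # frozen pretrained weight
A = np.random.randn(r, k).astype(np.float32) * 0.01   # small trainable matrix
B = np.zeros((d, r), dtype=np.float32)                # starts at zero: no change initially

# Instead of fine-tuning all d*k weights, only A and B are trained.
# The effective weight is the frozen W plus the low-rank update B @ A.
W_effective = W + B @ A

full_params = d * k         # ~16.8 million values to fine-tune the layer directly
lora_params = r * (d + k)   # ~65 thousand values with the adapter
print(f"trainable values: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```

The design point is that the giant weight matrix never changes; only the two skinny matrices get trained, which is why the technique fits fine-tuning onto much smaller hardware.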

And the trade-offs are efficiency and accuracy, but not by much.

According to Meta’s CEO Mark Zuckerberg, in an interview he did with Lex Fridman,

it’s really only a percentage point or two of accuracy being sacrificed to make these models super efficient. So much so that, you know, part of the reason Meta open sourced their models was so that they could have the rest of the world basically be their unpaid developers.

But in releasing their model as open source, the open source community was like, great, we’re going to make this work for us.

And that means small hardware, cheap hardware, not the most modern hardware, and it’s working.

So let’s talk about one of those techniques, because I think it’s an important thing.

It’s important to illustrate how this works.

One of the techniques is called quantization.

Now, I am going to intentionally simplify the process because what I’m going to describe here is not exactly what happens.

But it’s close enough to understand it.

Every time a large language model makes a prediction (because they are just prediction engines), it comes up with a certain number of candidates.

For example, I might provide the prompt, “I pledge allegiance to the,” and if you are an American or know American culture, you know pretty well what the last word is going to be, right? When large language models work behind the scenes, all they’re doing is predicting the next word.

And this is usually a table of probabilities.

It will say something like: flag, 99.7523%; table, 57.14235%; cat, 43.1289%; and Supreme Overlord, 9.1276%.

It comes up with these floating point numbers.

Numbers with lots of decimals, for the accuracy and specificity of their predictions.
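
To make that concrete, here’s a tiny sketch of that candidate table as a model effectively stores it, in high-precision floating point. The words and numbers are just the toy examples from above.

```python
import numpy as np

# Toy next-word candidate table, stored as 32-bit floats (4 bytes each).
words = ["flag", "table", "cat", "Supreme Overlord"]
scores = np.array([99.7523, 57.14235, 43.1289, 9.1276], dtype=np.float32)

print(scores.nbytes, "bytes for", len(words), "candidates")   # 16 bytes
print(words[int(np.argmax(scores))])                          # -> flag
```

Sixteen bytes is nothing here, but a real model holds billions of values like these, and every one of them has to be stored, moved, and multiplied.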

Quantization rounds the numbers, right? So instead of it being flag, 99.7523%, it’s flag, 100%.

Right, just an integer.

What happens when you do that? The amount of space and computation needed to manage floating point numbers, aka numbers with lots of decimals, is much greater than what you need to manage integers, whole numbers.

So if that table becomes, you know, flag 100%, table 57%, cat 43%, Supreme Overlord 9%, there’s not a substantial loss of accuracy.

And in this case, flag is going to be the number one pick.

Now, if the numbers for two words are very, very close, when you round them, you’re gonna get some inaccuracy.

But that doesn’t happen often enough that the trade-off isn’t worth it, right? The model will still return flag as the next word in the sequence.

And because it’s using integers, it’s going to be a lot more energy efficient.
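
Here’s a minimal sketch of that rounding step on the same toy table. Real quantization schemes (8-bit, 4-bit, and so on) are more involved than this, but the float-to-integer trade is the core idea.

```python
import numpy as np

# The same toy table, as 32-bit floats (4 bytes per value).
words = ["flag", "table", "cat", "Supreme Overlord"]
scores = np.array([99.7523, 57.14235, 43.1289, 9.1276], dtype=np.float32)

# "Quantize" by rounding to whole numbers stored as 8-bit integers
# (1 byte per value): [100, 57, 43, 9].
quantized = np.round(scores).astype(np.int8)

print("float32:", scores.nbytes, "bytes; int8:", quantized.nbytes, "bytes")

# The winning word is unchanged: the model still picks "flag".
assert words[int(np.argmax(scores))] == words[int(np.argmax(quantized))]
print(words[int(np.argmax(quantized))])   # -> flag
```

A quarter of the memory, and integer math is cheaper per operation than floating point math, which is where the energy savings come from.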

Now, this was a really nerdy, deep dive into the blood and guts and mechanics of this thing.

But it illustrates how open sourcing your models, open sourcing your technology, paid huge dividends to Meta in getting the community to take their models and do cool stuff with them.

And that in turn means that they found massive energy savings.

By using a more efficient model, it’s less effective, it’s less accurate, but not enough to want to go back to using the less efficient, floating point based predictions.

It’s a very cool innovation.

It works most of the time pretty well.

And it allows you to scale these models down really, really, really far.

There are other things, of course, companies can do to offset energy usage. One of which is: if you have a compute facility and you’ve bought up a whole bunch of land, put as much solar and wind renewable generation on that property as you can; even if you don’t make enough power to be a net producer, you’re still going to be reducing the amount of power you consume.

And obviously, you know, one of the big things that bogs everyone down is legacy technology.

Every new generation of computer, of chip, of power supply, and so on tends to get more energy efficient.

So if you’ve got a lot of legacy hardware lying around from 2009,

it’s probably consuming a lot more power than it has to, and one of the things to look at is: is it worth the cost to change out that hardware in exchange for energy savings? So there are a lot of different ways that companies can increase their sustainability simply by making their compute much, much more efficient.
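
As a back-of-the-envelope way to frame that hardware question, here’s a simple payback calculation. Every number in it (power draw, electricity price, hardware cost) is a made-up assumption you’d swap for your own figures.

```python
# Hypothetical break-even check for replacing legacy hardware.
# All inputs below are illustrative assumptions, not real benchmarks.
old_watts = 400          # average draw of the legacy server
new_watts = 150          # average draw of a modern replacement
price_per_kwh = 0.15     # electricity cost, dollars per kilowatt-hour
replacement_cost = 2500  # dollars for the new hardware

hours_per_year = 24 * 365
kwh_saved = (old_watts - new_watts) * hours_per_year / 1000
dollars_saved = kwh_saved * price_per_kwh

print(f"energy saved: {kwh_saved:,.0f} kWh/year")
print(f"cost saved:   ${dollars_saved:,.2f}/year")
print(f"payback:      {replacement_cost / dollars_saved:.1f} years")
```

If the payback period comes out shorter than the hardware’s expected service life, the swap pays for itself in energy alone.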

So really good question.

Very interesting question, and I will provide the disclaimer that I am not an engineer.

I am not an energy specialist.

I am not someone who has formal training in this stuff.

I do have solar panels on my house.

But when it comes to AI models, I do know those pretty well, and these techniques, like low-rank adapters and quantization, can make models dramatically more efficient without sacrificing a whole lot in effectiveness.

Thanks for the question.

I’ll see you next time.

If you liked this video, go ahead and hit that subscribe button.




Want to read more like this from Christopher Penn? Get updates here:

subscribe to my newsletter here


AI for Marketers Book
Take my Generative AI for Marketers course!

Analytics for Marketers Discussion Group
Join my Analytics for Marketers Slack Group!


