Almost Timely News: The Importance of Open Source in AI

Almost Timely News: The Importance of Open Source in AI (2023-06-18) :: View in Browser

Almost Timely News

👉 Watch my brand new keynote, The Intelligence Revolution, all about how generative AI is the end of marketing as we know it »

Content Authenticity Statement

100% of this newsletter was generated by me, the human. When I do use AI-generated content, I will prominently disclose it.

Watch This Newsletter On YouTube 📺

Almost Timely News: The Importance of Open Source in AI (2023-06-18)

Click here for the video 📺 version of this newsletter on YouTube »

Click here for an MP3 audio 🎧 only version »

What’s On My Mind: The Importance of Open Source in AI

Today, let’s talk about Open Source and artificial intelligence, because this is a topic that’s super important to me personally as well as to the growth of AI overall. First, let’s make clear our definitions. Open source software is software published as source code that others are free to use. What is source code? Most of the software you use on a daily basis isn’t source code. It’s a compiled product, in the same way that a loaf of bread is a compiled product of wheat, water, and yeast. The source code is the ingredients that make the finished product.

Source code by itself usually isn’t usable, not for the most part (there are exceptions like scripting languages such as JavaScript and Python, but that’s another topic). When the source code is compiled, that’s when we get a usable product like Microsoft Word or Angry Birds.

Most of the time, source code is closed, meaning that the company which wrote the code is keeping it secret. They don’t want anyone else to be able to take their ingredients and their recipe and compile them to make a competing product that’s just as good. KFC, for example, has never officially disclosed the 11 herbs and spices. Coca-Cola’s exact formula is a guarded secret. Those are examples of source code – recipes and ingredients – that are kept secret for competitive advantage.

The challenge with closed source code, or closed source for short, is that the onus of responsibility to make sure the software is bug-free is on the code maintainer, in the same way that a company’s secret recipes have to be harm-free. There are, for example, trace amounts of cocaine in Coca-Cola (though the amounts are so minuscule, you’d probably get far more usable cocaine by randomly licking dollar bills). The Coca-Cola company has to be sure their recipe doesn’t accidentally have harmful amounts of cocaine in it.

Open source code, or open source, is code published with ingredients and recipes. Everyone and anyone can inspect the code, the recipe, the ingredients to make sure it does what it says it does. This also allows other people to contribute to it, to write more into the code or edit it, as well as make a copy and customize it. But it does cause a significant problem to for-profit ventures: there’s really no way to market it. Open source by definition is free and open, which means anyone can take the code – the ingredients and recipe – and make their own product with it that is identical to the original.

That’s why you see many open source software products run by nonprofit foundations like the Apache Foundation, the Linux Foundation, and others. They earn revenue through donations and some commercial licensing, but the core product is open and available. (This is also why open source products dominate the tech landscape: it’s hard to beat free) Not all open source products are run by foundations; for example, the Android operating system made by Google is open source. And critically, and relevant to our discussion, there is a burgeoning open source community in AI.

Some products in the AI ecosystem are very closed, like the GPT 3 and 4 models created by OpenAI (which power ChatGPT) or the PaLM model that powers Google Bars. However, there are entire families of open source models that have sprung up everywhere, from OpenAI’s Whisper speech recognition software to large language models like Meta’s LLaMa and its derivatives.

Open source software isn’t for everyone; what you trade in cost is time, effort, and knowledge. It’s trivial to open up a web browser and use ChatGPT, which is why it is so popular. It is far less trivial to go to GitHub, pull a repository, compile the code on your computer (turning source code into the finished product) and then running the code yourself.

Indeed, if even reading the last sentence makes you slightly queasy, then you might be wondering why anyone would go through that effort when the easy alternatives exist.

The answer, in a word, is choice.

Open source software gives us far more choices and more flexibility than closed source software, especially in the world of AI. Let’s take voice transcription as an example. Go online, and you’ll find dozens, if not hundreds of SaaS voice transcription services. All of them promise the same thing: good quality transcription at varying prices. But almost none of them offer you substantial choices.

There is a linear relationship between processing costs – time and computing power – and accuracy. The better the results, the more computing power and the better the model has to be to achieve those results. Closed source software and SaaS offerings are like pre-packaged meals in the freezer section. They’re stupid easy to make – just unwrap it, put it in the microwave, and your food is done. But you don’t get much choice in how it’s prepared; you have to hunt for the vendor whose product requires you to compromise the least. Suppose you need food that is gluten free, low fat, and low sodium; finding a vendor that offers all three is almost impossible.

The same is true in AI. Suppose you want specific language offerings, plus accurate transcription, and a reasonable turnaround time. And suppose your needs are variable. Sometimes you want fast transcription, like the time you were about to run into a client meeting and you just need a quick summary of the most recent client call. Other times you want highly accurate transcription, like when you interviewed your favorite influencer and you want to get their words exactly right.

Closed source software vendors and SaaS vendors don’t give you those options, partly because they can be bewildering, and partly because some offerings would impose much higher costs on them. So you get what’s in the prepackaged offering, and you get what you get.

Open source software doesn’t have this problem. An open source package like Whisper runs on your computer and comes with five different models of varying size and computational requirements. You decide, when you build and run the software on your computer, what compromises you want to make – and they’re on a per job basis. You can decide how long you want a transcription to take, and choose how much accuracy to sacrifice.

If closed source software is a pre-packaged meal, open source software is a book of recipes and a pile of ingredients. You have to decide how you want to make the finished product – but in the process, you have a lot more choice over it. You could make it gluten free, low fat, AND low sodium.

Open source means more choice, and part of choice is censorship, or the lack thereof. I talked this past week about censorship in AI and how it negatively impacts models. There are plenty of tasks where you most definitely want censorship. You don’t want a chatbot that’s doing your customer service to start busting out racist, sexist, or bigoted language. That should be fairly obvious. But the tradeoff for that is AI that’s less creative. Again, with a customer service chatbot, that’s probably a more than acceptable tradeoff. For a chatbot that you want to use for brainstorming? Maybe you want the safeties off for that in a controlled environment, in a research or creative context where inappropriate responses aren’t going to cause material harm to others – and that’s another area where open source models can play a key role. You can use them without those safeguards and get more creative responses.

There’s one other aspect to many of these open source AI projects that should be on everyone’s mind: privacy. When you run an AI model locally on your computer, and it’s self-contained (meaning you’re not using a third party service like ChatGPT), what you do with that model is completely private. Your data doesn’t go anywhere. It doesn’t even leave your laptop, much less go to a third party in the cloud. I talk about this in my Intelligence Revolution presentation – if you want to use large language models with ANY kind of sensitive information, you MUST use open source models locally. They are the only models right now where you have an ironclad guarantee that sensitive information isn’t leaking out of the contained environment you operate in.

So let’s finish off today with the easiest, simplest, lowest-tech implementation of an open source model that you can get – totally free, runs on your computer if you have enough memory – and totally private. This is a piece of software called GPT4ALL, which you can download at this link. Find the version that applies for your operating system, download it, and run the installer.

Once the application is on your computer, open it up. You’ll go through a couple of basic configuration questions, and then you’ll find your way to the models management menu on the left-hand side of the screen, labeled Downloads. From here, based on your needs and requirements, you’ll choose a model to download:

GPT4ALL

After that, you’ll select the model of your choice and use it just as you would use ChatGPT and other basic AI tools. Now, you’ll note that it might be slower on your computer depending on what kind of computer you have. Again, tradeoffs; when you use a service like ChatGPT, you’re using someone else’s hardware and software, with the limitations on your choices that they impose. When you use open source models and technology, you tend to have more choices.

Open source software isn’t a drop in replacement for everything, nor should it be. But it’s an essential part of keeping the big monolithic companies honest, by ensuring that there’s more democratized access to AI technologies for everyone, not just the people who can afford it.

Got a Question? Hit Reply

I do actually read the replies.

Share With a Friend or Colleague

If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:

https://www.christopherspenn.com/newsletter

ICYMI: In Case You Missed it

Besides the newly-refreshed Google Analytics 4 course I’m relentlessly promoting (sorry not sorry), I recommend that piece on censorship and how it impacts language models. Content warning, it does use some profanity to illustrate the point.

Skill Up With Classes

These are just a few of the classes I have available over at the Trust Insights website that you can take.

Premium

Free

Get Back to Work

Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you’re looking for work, check out these recent open positions, and check out the Slack group for the comprehensive list.

Advertisement: Bring My AI Talk To Your Company

📺 Click here to watch this ad in video format on YouTube

I’ve been lecturing a lot on large language models and generative AI (think ChatGPT) lately, and inevitably, there’s far more material than time permits at a regular conference keynote. There’s a lot more value to be unlocked – and that value can be unlocked by bringing me in to speak at your company. In a customized version of my AI keynote talk, delivered either in-person or virtually, we’ll cover all the high points of the talk, but specific to your industry, and critically, offer a ton of time to answer your specific questions that you might not feel comfortable asking in a public forum.

Here’s what one participant said after a working session at one of the world’s biggest consulting firms:

“No kidding, this was the best hour of learning or knowledge-sharing I’ve had in my years at the Firm. Chris’ expertise and context-setting was super-thought provoking and perfectly delivered. I was side-slacking teammates throughout the session to share insights and ideas. Very energizing and highly practical! Thanks so much for putting it together!”

Pricing begins at US$7,500 and will vary significantly based on whether it’s in person or not, and how much time you need to get the most value from the experience.

👉 To book a session, click here! 👈

What I’m Reading: Your Stuff

Let’s look at the most interesting content from around the web on topics you care about, some of which you might have even written.

Social Media Marketing

Media and Content

SEO, Google, and Paid Media

Advertisement: Google Analytics 4

Believe it or not, July 1st, 2023 – and Google’s shutdown of Universal Analytics in favor of Google Analytics 4 – is in less than THIRTEEN calendar days. This means that in THIRTEEN days, you will no longer be able to capture data in Universal Analytics – it will just stop collecting data. If you haven’t already switched over, it’s urgent you do so right now. So, let’s get you moving.

👉 Trust Insights can do it for you. Reach out to us if you want support setting up your Google Analytics 4 instance.

👉 You can do it yourself. Take our course, Google Analytics 4 for Marketers, to learn the ins and outs of the new system.

Tools, Machine Learning, and AI

Analytics, Stats, and Data Science

Dealer’s Choice : Random Stuff

Advertisement: Ukraine 🇺🇦 Humanitarian Fund

The war to free Ukraine continues. If you’d like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia’s illegal invasion needs our ongoing support.

👉 Donate today to the Ukraine Humanitarian Relief Fund »

How to Stay in Touch

Let’s make sure we’re connected in the places it suits you best. Here’s where you can find different content:

Events I’ll Be At

Here’s where I’m speaking and attending. Say hi if you’re at an event also:

  • MAICON, Cleveland, July 2023
  • Content Jam, Chicago, September 2023
  • ISBM, Chicago, September 2023
  • Content Marketing World, DC, September 2023
  • Marketing Analytics and Data Science, DC, September 2023
  • MarketingProfs B2B Forum, Boston, October 2023

Events marked with a physical location may become virtual if conditions and safety warrant it.

If you’re an event organizer, let me help your event shine. Visit my speaking page for more details.

Can’t be at an event? Stop by my private Slack group instead, Analytics for Marketers.

Required Disclosures

Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them.

Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them.

My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.

Thank You

Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness.

See you next week,

Christopher S. Penn


You might also enjoy:


Want to read more like this from Christopher Penn? Get updates here:

subscribe to my newsletter here


AI for Marketers Book
Take my Generative AI for Marketers course!

Analytics for Marketers Discussion Group
Join my Analytics for Marketers Slack Group!



Comments

One response to “Almost Timely News: The Importance of Open Source in AI”

Leave a Reply

Your email address will not be published. Required fields are marked *

Pin It on Pinterest

Shares
Share This