Almost Timely News: The Importance of Open Source in AI (2023-06-18)
Content Authenticity Statement
100% of this newsletter was generated by me, the human. When I do use AI-generated content, I will prominently disclose it.
Watch This Newsletter On YouTube 📺
What’s On My Mind: The Importance of Open Source in AI
Today, let’s talk about open source and artificial intelligence, because this is a topic that’s super important to me personally as well as to the growth of AI overall. First, let’s define our terms. Open source software is software published as source code that others are free to use. What is source code? Most of the software you use on a daily basis isn’t source code. It’s a compiled product, in the same way that a loaf of bread is a compiled product of wheat, water, and yeast. The source code is the set of ingredients and the recipe that make the finished product.
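To make the bread analogy a little more concrete, here is a toy illustration in Python, whose built-in `compile()` turns human-readable source text into a code object (Python bytecode) that the computer can run. Python bytecode is not the same as the native machine code most desktop apps ship as, so treat this as an analogy rather than a description of how commercial software is built; the `bake` function is made up for the example.

```python
# Source code: the human-readable "ingredients and recipe," stored as plain text.
SOURCE = '''
def bake(flour_g, water_g, yeast_g):
    """Return the total dough weight in grams."""
    return flour_g + water_g + yeast_g
'''

# Compile the text into a code object (Python bytecode): the "finished product."
compiled = compile(SOURCE, "<recipe>", "exec")

# Running the compiled product defines bake() without anyone reading SOURCE.
namespace = {}
exec(compiled, namespace)
bake = namespace["bake"]

print(bake(500, 350, 7))        # → 857
print(type(compiled).__name__)  # → code
```

With closed source software, you typically receive only the equivalent of `compiled`; with open source, you also get `SOURCE`.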
Most of the time, source code is closed, meaning that the company which wrote the code is keeping it secret. They don’t want anyone else to be able to take their ingredients and their recipe and compile them to make a competing product that’s just as good. KFC, for example, has never officially disclosed the 11 herbs and spices. Coca-Cola’s exact formula is a guarded secret. Those are examples of source code – recipes and ingredients – that are kept secret for competitive advantage.
The challenge with closed source code, or closed source for short, is that the responsibility for making sure the software is bug-free rests entirely on the code’s maintainer, in the same way that a company with a secret recipe is solely responsible for making sure it’s harmless. There are, for example, trace amounts of cocaine in Coca-Cola (though the amounts are so minuscule, you’d probably get far more usable cocaine by randomly licking dollar bills). The Coca-Cola Company has to make sure its recipe doesn’t accidentally contain harmful amounts of cocaine.
Open source code, or open source, is code published with its ingredients and recipe. Anyone can inspect the code, the recipe, and the ingredients to make sure it does what it says it does. Other people can also contribute to it, writing new code or editing what’s there, or make a copy and customize it. But it poses a real problem for companies that want to make money from it: there’s no easy way to sell it. Open source by definition is free and open, which means anyone can take the code – the ingredients and recipe – and make their own product with it that is identical to the original.
That’s why you see many open source software products run by nonprofit foundations like the Apache Software Foundation, the Linux Foundation, and others. They earn revenue through donations and some commercial licensing, but the core product is open and available. (This is also why open source products dominate the tech landscape: it’s hard to beat free.) Not all open source products are run by foundations; for example, Google’s Android operating system is open source. And, critically for our discussion, there is a burgeoning open source community in AI.
Some products in the AI ecosystem are very closed, like the GPT-3 and GPT-4 models created by OpenAI (which power ChatGPT) or the PaLM model that powers Google Bard. However, entire families of open source models have sprung up everywhere, from OpenAI’s Whisper speech recognition software to large language models like Meta’s LLaMA and its derivatives.
Open source software isn’t for everyone; what you save in cost, you pay for in time, effort, and knowledge. It’s trivial to open up a web browser and use ChatGPT, which is why it’s so popular. It’s far less trivial to go to GitHub, pull down a repository, compile the code on your computer (turning the source code into the finished product), and then run it yourself.
Indeed, if even reading the last sentence makes you slightly queasy, then you might be wondering why anyone would go through that effort when the easy alternatives exist.
The answer, in a word, is choice.
Open source software gives us far more choices and more flexibility than closed source software, especially in the world of AI. Let’s take voice transcription as an example. Go online, and you’ll find dozens, if not hundreds, of SaaS voice transcription services. All of them promise the same thing: good quality transcription at varying prices. But almost none of them offer you substantial choices.
There is a direct tradeoff between processing costs – time and computing power – and accuracy: the better the results you want, the better the model has to be and the more computing power it takes. Closed source software and SaaS offerings are like pre-packaged meals in the freezer section. They’re stupid easy to make – just unwrap it, put it in the microwave, and your food is done. But you don’t get much choice in how it’s prepared; you have to hunt for the vendor whose product requires you to compromise the least. Suppose you need food that is gluten free, low fat, and low sodium; finding a vendor that offers all three is almost impossible.
The same is true in AI. Suppose you want support for specific languages, plus accurate transcription, and a reasonable turnaround time. And suppose your needs vary. Sometimes you want fast transcription, like when you’re about to run into a client meeting and just need a quick summary of the most recent client call. Other times you want highly accurate transcription, like when you’ve interviewed your favorite influencer and want to get their words exactly right.
Closed source software vendors and SaaS vendors don’t give you those options, partly because so many options can be bewildering, and partly because some of them would impose much higher costs on the vendor. So you get what’s in the prepackaged offering, and you get what you get.
Open source software doesn’t have this problem. An open source package like Whisper runs on your computer and comes with five models of varying size and computational requirements. You decide, when you set up and run the software on your computer, what compromises you want to make – and you make them on a per-job basis. You can decide how long you want a transcription to take and how much accuracy you’re willing to sacrifice.
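Here is roughly what that per-job choice can look like in code. This is a minimal sketch assuming the `openai-whisper` Python package, whose `whisper.load_model()` accepts a model name such as `"tiny"`, `"base"`, `"small"`, `"medium"`, or `"large"`, and whose `transcribe()` returns a dictionary with the transcript under `"text"`. The `fast`, `balanced`, and `accurate` presets, and the sizes they map to, are hypothetical choices for illustration, not anything Whisper itself defines.

```python
# The five multilingual Whisper model sizes, from fastest and least accurate
# to slowest and most accurate (names as used by the openai-whisper package).
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")

# Hypothetical per-job presets; adjust them to your hardware and patience.
PRESETS = {
    "fast": "base",       # e.g. a quick recap right before a client meeting
    "balanced": "small",
    "accurate": "large",  # e.g. quoting an interviewee word for word
}


def pick_model(priority: str) -> str:
    """Map a job's speed-versus-accuracy priority to a Whisper model name."""
    if priority not in PRESETS:
        raise ValueError(f"unknown priority {priority!r}; choose from {sorted(PRESETS)}")
    name = PRESETS[priority]
    if name not in WHISPER_SIZES:
        raise ValueError(f"preset {priority!r} points to unknown model {name!r}")
    return name


def transcribe(audio_path: str, priority: str = "balanced") -> str:
    """Transcribe one audio file locally, choosing the model size for this job."""
    # Imported here so pick_model() works even without openai-whisper installed
    # (pip install openai-whisper; it also needs ffmpeg on your PATH).
    import whisper

    model = whisper.load_model(pick_model(priority))
    result = model.transcribe(audio_path)
    return result["text"]
```

For example, `transcribe("client_call.mp3", priority="fast")` would use the `base` model for a quick recap, while `priority="accurate"` would load `large` for an interview you plan to quote (the file name is a placeholder). The first time you use each size, Whisper downloads that model’s weights; after that, transcription runs on your own machine.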
If closed source software is a pre-packaged meal, open source software is a book of recipes and a pile of ingredients. You have to decide how you want to make the finished product – but in the process, you have a lot more choice over it. You could make it gluten free, low fat, AND low sodium.
Open source means more choice, and part of choice is censorship, or the lack thereof. I talked this past week about censorship in AI and how it negatively impacts models. There are plenty of tasks where you most definitely want censorship. You don’t want a chatbot that’s doing your customer service to start busting out racist, sexist, or bigoted language. That should be fairly obvious. But the tradeoff for that is AI that’s less creative. Again, with a customer service chatbot, that’s probably a more than acceptable tradeoff. For a chatbot that you want to use for brainstorming? Maybe you want the safeties off for that in a controlled environment, in a research or creative context where inappropriate responses aren’t going to cause material harm to others – and that’s another area where open source models can play a key role. You can use them without those safeguards and get more creative responses.
There’s one other aspect to many of these open source AI projects that should be on everyone’s mind: privacy. When you run an AI model locally on your computer, and it’s self-contained (meaning you’re not using a third party service like ChatGPT), what you do with that model is completely private. Your data doesn’t go anywhere. It doesn’t even leave your laptop, much less go to a third party in the cloud. I talk about this in my Intelligence Revolution presentation – if you want to use large language models with ANY kind of sensitive information, you MUST use open source models locally. They are the only models right now where you have an ironclad guarantee that sensitive information isn’t leaking out of the contained environment you operate in.
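If you want to check for yourself that a local model isn’t phoning home, one rough approach in Python is to make network sockets fail while the model runs. This is a sketch under stated assumptions, not a security tool: it only catches network calls made through Python’s standard `socket` module inside the same process, so native extensions or separate programs can bypass it. The `no_network` and `NetworkAccessAttempted` names are made up for this example.

```python
import socket
from contextlib import contextmanager


class NetworkAccessAttempted(RuntimeError):
    """Raised when code inside no_network() tries to open a socket."""


@contextmanager
def no_network():
    """Temporarily make socket creation fail inside this Python process.

    A rough sanity check, not a guarantee: it only affects code that creates
    sockets through the socket module in this process.
    """
    original_socket = socket.socket

    def refuse(*args, **kwargs):
        raise NetworkAccessAttempted("something tried to open a network socket")

    socket.socket = refuse
    try:
        yield
    finally:
        socket.socket = original_socket  # always restore normal networking
```

In use, you would wrap whatever call runs your local model, for example `with no_network(): answer = run_local_model(prompt)`, where `run_local_model` is a hypothetical stand-in for your own code. If anything in that call tried to reach a server through Python’s networking, it would fail loudly instead of quietly sending your data.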
So let’s finish off today with the easiest, simplest, lowest-tech implementation of an open source model that you can get – totally free, runs on your computer if you have enough memory, and totally private. It’s a piece of software called GPT4All, which you can download from the GPT4All website. Find the version for your operating system, download it, and run the installer.
Once the application is on your computer, open it up. You’ll go through a couple of basic configuration questions, and then you’ll find your way to the model management menu on the left-hand side of the screen, labeled Downloads. From there, based on your needs and requirements, choose a model to download.
After that, select the model of your choice and use it just as you would use ChatGPT and other basic AI tools. You’ll notice that it may be slower, depending on what kind of computer you have. Again, tradeoffs: when you use a service like ChatGPT, you’re using someone else’s hardware and software, with the limits on your choices that they impose. When you use open source models and technology, you tend to have more choices.
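Whether a given model runs well on your computer comes down largely to memory. As a back-of-the-envelope rule, a model’s weights take about (number of parameters × bits per weight ÷ 8) bytes, so a 7-billion-parameter model quantized to 4 bits needs roughly 3.5 GB just for its weights. The Python sketch below applies that rule to pick the largest model that fits a machine; the model labels and sizes are hypothetical, not entries from GPT4All’s catalog, and the 2 GB of headroom for the operating system and the app is an assumption, since real requirements vary by model and software version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str               # hypothetical label, not a real GPT4All catalog entry
    params_billions: float  # model size in billions of parameters
    bits_per_weight: int    # e.g. 4 for a 4-bit quantized download


def weights_gb(option: ModelOption) -> float:
    """Rough size of the weights alone: billions of params x bits / 8 bits per byte."""
    return option.params_billions * option.bits_per_weight / 8


def largest_that_fits(options, ram_gb: float, headroom_gb: float = 2.0):
    """Return the biggest option whose weights fit in RAM after reserving headroom.

    headroom_gb is an assumed allowance for the OS, the app, and working memory.
    Returns None if nothing fits.
    """
    budget = ram_gb - headroom_gb
    fitting = [o for o in options if weights_gb(o) <= budget]
    return max(fitting, key=weights_gb, default=None)


OPTIONS = [
    ModelOption("small (3B, 4-bit)", 3, 4),    # about 1.5 GB of weights
    ModelOption("medium (7B, 4-bit)", 7, 4),   # about 3.5 GB of weights
    ModelOption("large (13B, 4-bit)", 13, 4),  # about 6.5 GB of weights
]
```

On a laptop with 8 GB of RAM, `largest_that_fits(OPTIONS, ram_gb=8)` leaves a 6 GB budget, so it picks the 7B option; with 16 GB it picks the 13B option. Going with the largest model that fits is a choice for accuracy; if speed matters more for a particular job, a smaller model is the better trade.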
Open source software isn’t a drop-in replacement for everything, nor should it be. But it’s an essential part of keeping the big monolithic companies honest, by ensuring more democratized access to AI technologies for everyone, not just the people who can afford it.
Got a Question? Hit Reply
I do actually read the replies.
Share With a Friend or Colleague
If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:
ICYMI: In Case You Missed it
Besides the newly refreshed Google Analytics 4 course I’m relentlessly promoting (sorry not sorry), I recommend the piece on censorship and how it impacts language models. Content warning: it uses some profanity to illustrate the point.
- Mind Readings: Large Language Model Censorship Reduces Performance
- Mind Readings: What Will Make Or Break Spatial Computing
- Mind Readings: What I’m Paying Attention To In AI
- Almost Timely News, June 11, 2023: AI, Creative Writing, and Fan Fiction
- You Ask, I Answer: Google Analytics 4 Up and Running?
- You Ask, I Answer: Jobs Lost to AI?
- You Ask, I Answer: Starting AI Tools for Content Marketers?
- Can You Spare $3,500?
- Launching a podcast – podcast marketing strategy
Skill Up With Classes
These are just a few of the classes I have available over at the Trust Insights website that you can take.
- ⭐️ The Marketing Singularity: How Generative AI Means the End of Marketing As We Knew It
- Powering Up Your LinkedIn Profile (For Job Hunters) 2023 Edition
- Measurement Strategies for Agencies
- Empower Your Marketing With Private Social Media Communities
- Exploratory Data Analysis: The Missing Ingredient for AI
- How AI is Changing Marketing, 2022 Edition
- How to Prove Social Media ROI
- Proving Social Media ROI
- Paradise by the Analytics Dashboard Light: How to Create Impactful Dashboards and Reports
Get Back to Work
Folks who post jobs in the free Analytics for Marketers Slack community may have those jobs shared here, too. If you’re looking for work, check out these recent open positions, and check out the Slack group for the comprehensive list.
- Adobe Analytics Implementation Developer at ShyftLabs
- Analytics Architect at Search Discovery
- Data Engineer at Foundation Direct LLC
- Digital Marketing Manager at Boston Scientific
- Digital Technical Analyst at Grainger
- Growth Market Account Executive at Paylocity
- Manager Data Privacy at Paylocity
- Marketing Analytics Lead at Foundation Direct LLC
- Regional Marketing Analytics Manager at Colgate
Advertisement: Bring My AI Talk To Your Company
I’ve been lecturing a lot on large language models and generative AI (think ChatGPT) lately, and inevitably, there’s far more material than time permits at a regular conference keynote. There’s a lot more value to be unlocked – and that value can be unlocked by bringing me in to speak at your company. In a customized version of my AI keynote talk, delivered either in-person or virtually, we’ll cover all the high points of the talk, but specific to your industry, and critically, offer a ton of time to answer your specific questions that you might not feel comfortable asking in a public forum.
Here’s what one participant said after a working session at one of the world’s biggest consulting firms:
“No kidding, this was the best hour of learning or knowledge-sharing I’ve had in my years at the Firm. Chris’ expertise and context-setting was super-thought provoking and perfectly delivered. I was side-slacking teammates throughout the session to share insights and ideas. Very energizing and highly practical! Thanks so much for putting it together!”
Pricing begins at US$7,500 and will vary significantly based on whether it’s in person or not, and how much time you need to get the most value from the experience.
What I’m Reading: Your Stuff
Let’s look at the most interesting content from around the web on topics you care about, some of which you might have even written.
Social Media Marketing
- Twitter Ad Buyers Want Answers on UGC and Content Moderation
- Snapchat Shares New Stats on My AI Usage, Outlines How its Utilizing Chat Data via Social Media Today
- TikTok Outlines Key Creative Considerations for Effective TikTok Content [Infographic] via Social Media Today
Media and Content
- 10 tips for PR pros and marketers for writing powerful survey press releases via Agility PR Solutions
- A 9-Step Guide to Effective Content Marketing for Law Firms
- How Can You Best Prepare for a Crisis? via Institute for Public Relations
SEO, Google, and Paid Media
- Largest Contentful Paint (LCP) Defined, Measured, & How to Fix
- Your Ultimate Guide To A Successful Technical SEO Audit
- Improving prominence reporting in the Search Console video indexing report via Google Search Central Blog via Google for Developers
Advertisement: Google Analytics 4
Believe it or not, July 1st, 2023 – and Google’s shutdown of Universal Analytics in favor of Google Analytics 4 – is in less than THIRTEEN calendar days. This means that in THIRTEEN days, you will no longer be able to capture data in Universal Analytics – it will just stop collecting data. If you haven’t already switched over, it’s urgent you do so right now. So, let’s get you moving.
Tools, Machine Learning, and AI
- Generative AI Can Contribute $4.4 Trillion Annually: McKinsey
- OpenAI Updates API, GPT-3.5, And GPT-4 For Generative AI Applications
- What is Automated Machine Learning (AutoML): How it Works and Best Practices via insideBIGDATA
Analytics, Stats, and Data Science
- Big Data Analytics: Why Is It So Crucial For Business Intelligence? via KDnuggets
- Serving Up Data: Restaurant Industry Statistics
- YouTube Analytics Basics: How To Analyze Your Data (2023) via Shopify Canada
Dealer’s Choice: Random Stuff
- 2206.04615 Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Beyond the Imitation Game Benchmark
- Warrior Nun Promotion, Attention & Valuation — WARRIOR NUN Global
Advertisement: Ukraine 🇺🇦 Humanitarian Fund
The war to free Ukraine continues. If you’d like to support humanitarian efforts in Ukraine, the Ukrainian government has set up a special portal, United24, to help make contributing easy. The effort to free Ukraine from Russia’s illegal invasion needs our ongoing support.
How to Stay in Touch
Let’s make sure we’re connected in the places it suits you best. Here’s where you can find different content:
- My blog – daily videos, blog posts, and podcast episodes
- My YouTube channel – daily videos, conference talks, and all things video
- My company, Trust Insights – marketing analytics help
- My podcast, Marketing over Coffee – weekly episodes of what’s worth noting in marketing
- My second podcast, In-Ear Insights – the Trust Insights weekly podcast focused on data and analytics
- On Twitter – multiple daily updates of marketing news
- On LinkedIn – daily videos and news
- On Instagram – personal photos and travels
- My free Slack discussion forum, Analytics for Marketers – open conversations about marketing and analytics
Events I’ll Be At
Here’s where I’m speaking and attending. Say hi if you’re at an event also:
- MAICON, Cleveland, July 2023
- Content Jam, Chicago, September 2023
- ISBM, Chicago, September 2023
- Content Marketing World, DC, September 2023
- Marketing Analytics and Data Science, DC, September 2023
- MarketingProfs B2B Forum, Boston, October 2023
Events marked with a physical location may become virtual if conditions and safety warrant it.
If you’re an event organizer, let me help your event shine. Visit my speaking page for more details.
Can’t be at an event? Stop by my private Slack group instead, Analytics for Marketers.
Events with links have purchased sponsorships in this newsletter and as a result, I receive direct financial compensation for promoting them.
Advertisements in this newsletter have paid to be promoted, and as a result, I receive direct financial compensation for promoting them.
My company, Trust Insights, maintains business partnerships with companies including, but not limited to, IBM, Cisco Systems, Amazon, Talkwalker, MarketingProfs, MarketMuse, Agorapulse, Hubspot, Informa, Demandbase, The Marketing AI Institute, and others. While links shared from partners are not explicit endorsements, nor do they directly financially benefit Trust Insights, a commercial relationship exists for which Trust Insights may receive indirect financial benefit, and thus I may receive indirect financial benefit from them as well.
Thanks for subscribing and reading this far. I appreciate it. As always, thank you for your support, your attention, and your kindness.
See you next week,
Christopher S. Penn