To understand the bigger picture of where we are in the AI revolution, let’s look at two technological developments from the Internet’s past:
The Browser Wars and the Search Engine Wars.
Here’s what happened.
- In 1995, Netscape Navigator was the most popular web browser. The most popular search engine? Lycos. (Extra credit if you remember that one).
- By 1998, Microsoft’s Internet Explorer was the top browser and AltaVista was the reigning search engine.
- In 2012, Google was the reigning all-around champ with its Chrome browser and namesake search engine.
My point here isn’t that large language models (LLMs) are comparable to browsers and search engines. Rather, it’s that although OpenAI’s ChatGPT is currently dominant, it’s unlikely to stay that way forever as competition heats up. And it might not be another tech giant’s LLM—like Google’s Bard or Meta’s LLaMA—that takes the crown, either.
Instead, for a few reasons—cost, customization, privacy, and censorship among them—the leading LLMs of the future may be open source, and the market may become more fragmented than the winner-take-all world of browsers and search engines.
Just ask Google.
“We have no moat”
To understand why open source LLMs are worth paying attention to, look no further than the leaked Google memo heard ‘round the world.
In the now-infamous document, published May 4, 2023, an anonymous source at Google pointed out how much progress open source LLMs had made in just the first few months of 2023. According to Google, the gap in quality between LLMs is rapidly closing.
For Google, OpenAI, and others, this is worrying. Their initial moats were training data, model weights, and the cost of training—but the burden of recouping that investment through higher API fees leaves room for open source to swoop in as a lower-cost alternative, particularly as open source quality improves.
The good news is, we won’t have to wait long to find out where this all goes. AI development is proceeding at breakneck speed, with dozens of experimental open source models already released.
Open source varies widely in quality—but it’s booming
Let me set your expectations up front:
Open source LLMs aren’t nearly as good as ChatGPT-4 yet. In fact, most still don’t approach ChatGPT-3.5 in quality. HuggingChat, an open source competitor to ChatGPT, has gotten some press lately for its ease of use. But in its current state, it feels like stepping back in time to GPT-3 in 2021.
Vicuna-13B, which is based on Meta’s open source LLaMA model, is probably somewhere between GPT-3 and GPT-3.5 in quality. Koala-13B—my favorite of those I tested—felt a lot like GPT-3.5.
LMSYS has a fascinating “Chatbot Arena” that ranks LLMs based on user-sourced data that compares the quality of responses between each model.
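Chatbot Arena scores those head-to-head comparisons with an Elo-style rating system, the same family of math used to rank chess players. As a rough illustration (the ratings and K-factor below are made-up numbers for the sketch, not LMSYS’s actual parameters), here’s how a single pairwise vote shifts two models’ ratings:

```python
def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison."""
    # Expected score for model A under the Elo model: a function of the
    # current rating gap, on a 400-point logistic scale.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    # The winner gains exactly what the loser gives up.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Hypothetical ratings: the 1000-rated underdog beats the 1100-rated favorite.
new_a, new_b = elo_update(1000.0, 1100.0, a_wins=True)
```

An upset like this moves both ratings more than an expected win would, which is why a handful of surprising votes can reshuffle the leaderboard quickly.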
Here are the leading LLMs as of May 10, 2023:
Fair warning: most of these home-brewed LLMs require technical knowledge.
You’re not just typing something into a web browser. Instead, you’re looking through forums and piecing together instructions that go like this:
- Step 1: Get access to LLaMA weights.
- Step 2: Once you have the weights, convert them into the Hugging Face Transformers format.
Let’s be honest—most of us will stop right there.
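For the determined, step 2 roughly takes this shape. This is a sketch, not a guaranteed recipe: the conversion script comes from the Hugging Face Transformers library and its exact invocation can change between versions, and the paths here are placeholders. The snippet only assembles the command rather than running it, since the weights themselves aren’t included.

```shell
# Placeholder paths -- replace with wherever step 1 left the weights.
LLAMA_DIR="$HOME/llama-raw"   # directory holding the original LLaMA shards
HF_DIR="$HOME/llama-hf"       # destination for the converted checkpoint

# Step 2: convert the raw weights into the Hugging Face Transformers format
# using the conversion script that ships with the transformers library.
# In real use you'd run this command after downloading the weights; here we
# just build and print it so nothing executes without the actual files.
CONVERT_CMD="python -m transformers.models.llama.convert_llama_weights_to_hf \
  --input_dir $LLAMA_DIR --model_size 7B --output_dir $HF_DIR"
echo "$CONVERT_CMD"
```

And that’s before you get to serving the converted model, which is its own adventure.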
Fortunately, a few open source LLMs have web-based interfaces that make it easier to converse immediately in the ChatGPT-style format we’re all used to.
In a minute, I’ll introduce you to a few open source options and show you examples of their output.
But first, let me tell you why I think you should care.
Why Marketers Should Care
If you’re like most marketers, ChatGPT-4 is working great for you.
I’m right there with you.
ChatGPT’s power and sophistication still blows me away on a daily basis.
But as new LLMs continue to develop, the case for venturing outside of ChatGPT and other proprietary LLMs will get stronger and stronger.
Here’s what to look out for.
Sure, ChatGPT is free for users, or (worst case) $20/month for anyone who hates waiting for capacity to clear up. But the costs add up when using OpenAI’s API at scale.
If you’re using ChatGPT-4 to generate the equivalent of ten thousand 3,000-word essays per day, you’ll be paying $1,800 per day or $657,000 per year. (Plenty of SaaS companies process this kind of volume, which is why most of them push users toward the much cheaper GPT-3.5 API.)
Meanwhile, an open source LLM might cost 10% of that—for one million requests per day instead of ten thousand. Once you have your own LLM set up on a cloud server like AWS, the additional cost to make more requests can be as low as $10 per million requests, making this an extremely scalable option.
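The arithmetic behind those numbers is simple enough to check yourself. A quick sketch, using the article’s own estimates (the $10-per-million-requests marginal rate is hypothetical, and fixed hosting costs are ignored for simplicity):

```python
# The scenario above: 10,000 essays per day at roughly 3,000 words each,
# costing about $1,800/day on the GPT-4 API (the article's estimate).
gpt4_daily_cost = 1_800
gpt4_annual_cost = gpt4_daily_cost * 365   # dollars per year

# Hypothetical open source alternative: ~$10 per one million requests in
# marginal cost once the model runs on your own server (fixed hosting
# costs excluded for simplicity).
requests_per_day = 1_000_000
open_source_daily_cost = requests_per_day / 1_000_000 * 10   # dollars per day

print(gpt4_annual_cost)          # 657000
print(open_source_daily_cost)    # 10.0
```

That’s $657,000 a year versus $10 a day in marginal cost, at a hundred times the request volume, which is the whole argument in two lines.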
So far, the quality of open source options is lacking. But as the quality gap closes, the economics may become irresistible.
BloombergGPT is a 50-billion-parameter LLM designed to serve the finance industry. Bloomberg’s team trained the model on a 363-billion-token dataset of financial documents to give it an edge over general-purpose models like ChatGPT on financial tasks.
Meanwhile, StarCoder is an open source LLM that generates code in 80+ programming languages.
Given the high cost of training LLMs, we’re moving toward a world where domain-specific LLMs can carve out niches for themselves, especially in sensitive or complex areas like finance and health.
Open source LLMs allow companies and individuals to train bespoke LLMs on unique datasets and tweak the model’s weights to serve highly-specific purposes better than generalist LLMs can.
For anyone dealing with sensitive information—health or financial applications, for example—putting client data into closed LLMs like ChatGPT is probably not an option.
This can limit the productivity gains those organizations could have from AI.
Using customized LLMs—and hosting them on secure cloud servers or even local machines—is a likely path forward for any applications where data privacy is paramount.
Censorship and Bias
Like it or not, governments and corporations are testy about AI.
Italy temporarily banned ChatGPT for a few weeks over privacy concerns. ChatGPT isn’t available in countries like North Korea, Russia, China, and Cuba for political reasons.
Then there are the bias and centralization issues. LLMs from both OpenAI and Google have shown a propensity for favoring certain groups over others due to the datasets they were trained on. Users can lobby the tech giants to adjust the models, but ultimately those decisions are made by a handful of engineers and executives.
As LLMs become more and more influential, the question of who controls them will become increasingly important. Open source LLMs—which are highly customizable and can be made available anywhere in the world—are one solution.
In April 2022, OpenAI’s DALL-E 2 was released to great fanfare as the best text-to-image solution to date. But with the release of competing text-to-image products from Midjourney and Stable Diffusion—along with the open source release of Stable Diffusion’s source code—DALL-E 2 lost its early lead.
The same could happen with up-and-coming LLMs. Open source communities have the potential to iterate and innovate even faster than large, well-resourced companies.
Putting Open Source LLMs to the Test
To understand the open source LLM landscape, I put a few of the more prominent LLMs to the test with a simple prompt—similar to something I might ask ChatGPT:
“In 150 words or less, explain the best way for marketers to use AI. Be personable and engaging.”
HuggingChat has an extremely ChatGPT-like interface. That’s where the similarities end, unfortunately.
If you’re used to GPT-4, the output feels unsophisticated—closer to GPT-3 level. It’s wordy and redundant, and it tends to go pretty far off-track. (“AI empowers marketers to enhance… essentially everything except love, which remains irreplaceably unique to human nature.”)
Vicuna-13B is an LLM that normally takes quite a bit of hacking to set up on a virtual machine. Fortunately, you can also test it here right in your browser, thanks to the Large Model Systems Organization.
Vicuna did better—at least it respected my 150-word limit rather than rambling—although the text feels flat and boilerplate. In subsequent tests, I also got a lot of the “As an AI language model, I believe…” output that was common with GPT-3.
Koala-13B is an LLM developed by the University of California, Berkeley. It’s based on Meta’s LLaMA and fine-tuned with dialogue gathered from the web.
And honestly—it’s not bad at all.
I’d put this output firmly at GPT-3.5 parity. It incorporated the tone of voice I asked for, kept it brief, wasn’t too repetitive, and used metaphors. Not amazing, but a good start. (You can test it here—make sure to change the model to Koala).
Benchmarking against ChatGPT-4
So, how far off are these open source models from ChatGPT-4? It’s hard to say—especially because writing a paragraph about marketing is far from the most complicated thing we can ask these models to do.
Here’s ChatGPT-4 answering the same question posed to the open source LLMs:
Using this decidedly unscientific comparison, there’s no question that GPT-4’s output is more human-like, engaging, and insightful.
But now that we’ve seen the output of a few open source models, here’s the question:
Do we really think it’s impossible for them to catch up?
Where do Open Source LLMs go from here?
This isn’t one of those articles that ends with an appeal to change the tools you’re using. Far from it. In fact, you should keep using ChatGPT, and maybe test Bard wherever useful. Open source LLMs aren’t ready for prime time yet.
But as marketers, it’s good to know what’s on the horizon.
As the quality of open source LLMs catches up to that of proprietary LLMs, you may find yourself with compelling reasons to experiment: perhaps cost, customization, or data privacy.
Or—who knows—a year from now, open source LLMs may be producing consistently better output than OpenAI’s and Google’s models, making it a no-brainer to experiment.
Whatever happens, stay flexible.
Because things are changing faster than ever.