How AI Works 🤖🔎
An entirely non-technical explanation of how LLMs actually work
Learning about AI
I’ve been surrounded by discussion about AI lately. I’m sure you have too. The endless discussions of its implications, the ethical questions it raises, the pros and cons. Yet little of the discussion amongst my non-technical friends ever touches on how any of this stuff actually works. That’s because, from the outside, the concepts seem daunting. The idea of grasping how large language models (LLMs) function seems insurmountable.
But it’s not. Anyone can understand it. And that’s because the underlying principle driving today’s surge in AI is fairly simple.
So bear with me for fewer than two thousand words, and I’ll try to explain—without a single technical word or mathematical equation—how LLMs actually work.
Imagine this: You’re cooking dinner, but you need to come up with one more side dish to serve. The food you’re preparing is just shy of being enough, so you need one more component to add to the meal.
But that’s easier said than done. What we pick needs to fit in with the meal. If the meal is savory, our side dish should be too. If it already has a salad, we shouldn’t make another. If the meal is starch-heavy, maybe we’d want to throw in a roasted vegetable.
Wouldn’t it be nice to have an app that just tells you what to make? And not randomly. You feed in what you’re already making, and it tells you the optimal side dish to add. This app should work for any meal, with any combination of dishes and flavors, regardless of whether it’s feeding four people or forty.
Here’s how we’re going to make this app. Two simple steps…
First, we’re going to have it think about each meal in a way a computer can understand. After all, computers don’t have taste buds. They need to be able to take a concept of which they have no intuitive understanding (food) and encode it as some kind of data capturing everything that might impact how well it fits with other food.
Second, we’re going to have it learn a way to take any set of existing dishes and spit out another. It’s not merely going to memorize what it’s seen before. Recall that this app needs to work for any combination of dishes, even ones it’s never seen paired together. So we’re not just going to program the system. We’re going to teach it.
Step One: Modeling Meals
So, step one. We need to teach the computer to think about meals as data. We’re not going to do this by telling it things about the meal (like what it tastes like or what it fits with). That’s the old type of machine learning. Too limiting; too error-prone. Instead, we’re going to just feed it a lot of data about what types of dishes people have paired together for meals in the past.
Let’s consider two types of dishes: say, a caesar salad and a caprese salad. We, as humans, know that these two dishes are similar. They’re both Italian, they’re both salads, they both contain vegetables and cheeses… But for a machine to learn how similar these two dishes are, it need not know any of the above.
As we search through our mountain of data, whenever we see a caesar salad, it will likely be paired with other Italian dishes. And when we see it, we’re probably not going to see another salad in the meal. Interestingly, the same can be said of caprese salads: they won’t typically appear with other salads, but they will appear with Italian dishes.
Because these two dishes will often co-occur with the same types of other dishes, we can categorize them as being similar. They tend to be found in the same patterns of food. You might say “a dish is characterized by the company it keeps.”
And this part isn’t intuitive. Notice that we never looked for meals where caesar and caprese salads occur together. They never need to appear in the same meal for us to deem them similar. They simply need to be found amongst the same other dishes for us to conclude that people generally find them interchangeable, and therefore quite similar.
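(If you’re comfortable glancing at a few lines of code, here’s a toy sketch of that idea; feel free to skip it. The meals, dish names, and the particular similarity measure are all made up for illustration—real systems work on vastly more data—but the principle is the same: two dishes are similar if they keep the same company, even if they never appear together.)

```python
from collections import Counter
from math import sqrt

# A toy "mountain of data": each meal is a set of dishes served together.
# Note that caesar and caprese salads never appear in the same meal.
meals = [
    {"caesar salad", "lasagna", "garlic bread"},
    {"caesar salad", "spaghetti", "tiramisu"},
    {"caprese salad", "lasagna", "tiramisu"},
    {"caprese salad", "spaghetti", "garlic bread"},
    {"sushi", "miso soup", "edamame"},
    {"sushi", "miso soup", "seaweed salad"},
]

def company(dish):
    """Count which OTHER dishes co-occur with `dish` across all meals."""
    counts = Counter()
    for meal in meals:
        if dish in meal:
            counts.update(meal - {dish})
    return counts

def similarity(a, b):
    """Compare two dishes by the overlap in the company they keep."""
    ca, cb = company(a), company(b)
    dot = sum(ca[d] * cb[d] for d in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

print(similarity("caesar salad", "caprese salad"))  # high: same company
print(similarity("caesar salad", "sushi"))          # 0.0: no shared company
```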
Here’s another way to think about what we just did. Imagine we wanted to graph all food on this chart:
And to start, we took all the possible dishes we found in our data, and plotted them randomly:
Here, we’re only showing four dishes for illustrative purposes. But imagine literally every possible dish.
Now as we look through our data, each time we find two dishes that co-occur with the same other dishes, we can move them closer together. As we see different types of sushi that tend to be coupled with the same miso soup, we’re going to inch the sushis toward each other. As we see pizza and spaghetti both appear alongside garlic bread, we’ll inch them together too:
And after doing this many times (and I mean many times), something magical occurs. Dishes that are interchangeable will cluster very closely together. Dishes that are somewhat interchangeable (say, tacos and burritos) will sit near each other. And dishes that are rarely if ever interchangeable (say, burgers and sushi) will end up far apart.
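(One more optional peek at code, for the curious. This is a deliberately crude sketch of the nudging process: every dish starts at a random spot on the chart, and pairs we observed keeping the same company get inched toward each other, pass after pass. The starting positions, the pairs, and the step size are all invented; real systems use far more dimensions and a cleverer update rule.)

```python
import random

random.seed(0)

# Start every dish at a random point on a two-dimensional chart.
dishes = ["pizza", "spaghetti", "sushi", "burger"]
pos = {d: [random.uniform(-1, 1), random.uniform(-1, 1)] for d in dishes}

# Pairs we observed keeping the same company in our (toy) data:
similar_pairs = [("pizza", "spaghetti")]

def dist(a, b):
    """Straight-line distance between two dishes on the chart."""
    return sum((pa - pb) ** 2 for pa, pb in zip(pos[a], pos[b])) ** 0.5

before = dist("pizza", "spaghetti")

for _ in range(100):                       # many, many passes over the data
    for a, b in similar_pairs:
        for i in range(2):                 # nudge each point 5% toward the midpoint
            mid = (pos[a][i] + pos[b][i]) / 2
            pos[a][i] += 0.05 * (mid - pos[a][i])
            pos[b][i] += 0.05 * (mid - pos[b][i])

print(dist("pizza", "spaghetti") < before)  # True: they've clustered together
```

Meanwhile sushi and burger, which were never nudged toward anything, stay wherever they started—far from the Italian cluster.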