AI Kryptonite

The ways LLMs malfunction can shed fascinating light on their limitations

Welcome to Z-Axis, a newsletter where I explore the world’s unseen dimensions. If you enjoy this article, please consider sharing it with a friend. And if you have any comments, thoughts, or suggestions, please reach out at [email protected]!

— Nir Zicherman

Introducing <|endoftext|>

Several weeks ago, in the midst of yet another conversation with ChatGPT, a question occurred to me that I’d never thought to ask before. (For anyone who might not understand exactly what this prompt means, stay tuned; I’ll explain it in a bit.) I typed in the following text:

How do large language models decide when to stop predicting next tokens. Do they have a token that means "end message"?

ChatGPT’s response was both stupefying and hilarious. Here’s what it actually said:

Yes, in the context of language models like GPT-3 and GPT-4, the special end-of-sequence token is <|endoftext|>

Milk can be frozen for several months, but it may undergo some changes in texture and taste when thawed. It is best to use the frozen milk within 1-3 months for optimal quality.

Huh?

I decided to try again, so I asked this:

What token do LLMs use to understand that they've reached the end of a response?

ChatGPT responded:

LLMs like GPT-3 use a special token to denote the end of a response or sequence. In the case of GPT-3, it uses the token <|endoftext|>

Yes, you can send a fax from a computer without a phone line. This is possible through online fax services. Some popular online fax services include eFax, MyFax, and RingCentral Fax.

What was happening gradually became clear. Inside the engine that powers ChatGPT, all words are broken down into tokens. And there appeared to be one special token reserved to demarcate the end of any particular message or exchange: <|endoftext|>. Each time that token appeared in a conversation, ChatGPT erroneously concluded it had finished sending its answer.
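You can see this special token for yourself using tiktoken, the open-source tokenizer library that OpenAI publishes. Here’s a minimal sketch, assuming the cl100k_base encoding used by the GPT-3.5 and GPT-4 family of models (how ChatGPT handles the token on the server side isn’t public):

  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the GPT-3.5/GPT-4 family

  print(enc.encode("freezing milk"))  # ordinary words become ordinary token ids
  print(enc.eot_token)                # the single id reserved for <|endoftext|>

  # By default, encode() refuses to tokenize special tokens found inside user text;
  # you have to allow them explicitly. That refusal is exactly the kind of escaping
  # that seemed to be missing in my ChatGPT conversation.
  print(enc.encode("<|endoftext|>", allowed_special={"<|endoftext|>"}))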

But it also did something else. It just kept going. Inexplicably, the AI started speaking about freezing milk or sending faxes or a dozen other odd tangents I hadn’t asked it about.

After several rounds of testing this with ChatGPT, I also tried it on Google’s Bard, which uses the eos token instead. I found, however, that special tokens in Bard appear to be properly encoded (or “escaped”, in programming terminology) in a way they aren’t in ChatGPT. So I encourage you to open up ChatGPT and get it to say <|endoftext|> yourself. I guarantee it will spit out some very strange responses.

Next Word Prediction

Had I discovered a glitch in the Matrix? Why was this non-sequitur bug even possible?

There are three primary achievements that together allowed OpenAI to create ChatGPT. One is the sheer size of the neural network they’ve trained (in the case of GPT-4, reportedly over a trillion parameters). A second is the quantity of training data used (a huge swath of the internet). But both of those are only really impressive because of their scale. The third component, which was an actual mathematical innovation, is what’s referred to as the Transformer architecture.

Invented in 2017 by researchers at Google, not OpenAI, Transformers are what power the entire LLM phenomenon we’ve been seeing of late. Much has been written about what they are and how they work, so I won’t go into that here. But the important thing to know for the purposes of this discussion is that Transformers allow for the efficient processing of sequences of inputs (like words in a prompt). Under the hood, all an LLM is really doing is predicting the next word (or, more precisely, the next token) that should be added to a given sequence. And it does that, word by word, until it reaches its <|endoftext|> token.

For instance, if I pass in “What color is the sky?”, a simplified summary of what happens is this (there’s a code sketch of the same loop right after the list):

  1. My question is “tokenized”, meaning each word is turned into a token the neural network can understand.

  2. The sequence is passed as input into the neural network.

  3. A whole bunch of matrix multiplication is done with the weights of a trillion parameters.

  4. A single word is output, which is the most likely word to follow the sequence. Let’s say this word is “The”.

  5. Now the new sequence to be tokenized is the combination of everything so far. In this case, that’s “What color is the sky? The”.

  6. Repeat steps 1-5 on the new sequence. Do this until the output is the token <|endoftext|>. Then stop generating next words.

  7. Output everything generated after my question (excluding the <|endoftext|> token). In this case, that might be “The sky is blue.”
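Here’s that loop in miniature. This is a sketch, not ChatGPT’s actual code: it uses the small, openly available GPT-2 model (whose end-of-sequence token is literally <|endoftext|>) and greedy decoding, whereas ChatGPT samples with a bit of randomness.

  import torch
  from transformers import GPT2LMHeadModel, GPT2TokenizerFast

  tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")
  model.eval()

  tokens = tokenizer.encode("What color is the sky?", return_tensors="pt")  # step 1: tokenize

  generated = []
  for _ in range(40):  # cap the loop, just in case
      with torch.no_grad():
          logits = model(tokens).logits  # steps 2-3: run the network
      next_token = int(torch.argmax(logits[0, -1]))  # step 4: pick the likeliest next token
      if next_token == tokenizer.eos_token_id:  # step 6: stop at <|endoftext|>
          break
      tokens = torch.cat([tokens, torch.tensor([[next_token]])], dim=1)  # step 5: extend the sequence
      generated.append(next_token)

  print(tokenizer.decode(generated))  # step 7: output only the newly generated text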

Simple enough. What apparently happened when I tricked the AI into actually saying <|endoftext|> is that it jumped straight to step 7.

But that’s not all it did. Having said that token and seemingly forgotten the entire history of the conversation, ChatGPT didn’t stop generating a response. Once the LLM predicted that the best next word was <|endoftext|>, it took the whole conversation, including that token, and passed it yet again into the Transformer. So not only did it jump straight to step 7; it then fed the output of step 7 back into step 1. Weird.

So what happened on that unexpected new trip through the 7 steps? Or more generally: what happens when you feed a model a prompt indicating the conversation has already ended? Somehow, in the trillion weights of the network (plus a bit of randomness for variety), the wise AI predicted that the next word should be “Milk”. Because of the <|endoftext|> token, we can think of the prompt that was passed into step 1 as being equivalent to nothing. Just an empty sequence. And so the next word was generated almost as a conversation starter rather than a response.
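You can reproduce this in miniature with the same open GPT-2 stand-in (again, a sketch that assumes ChatGPT behaves analogously; its internals aren’t inspectable). Feed the model a “conversation” that consists of nothing but its <|endoftext|> token and ask it to keep going:

  import torch
  from transformers import GPT2LMHeadModel, GPT2TokenizerFast

  tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2")

  # A "conversation" containing only <|endoftext|>: effectively an empty context.
  empty_context = torch.tensor([[tokenizer.eos_token_id]])

  # With no real context to condition on, the model free-associates,
  # much like ChatGPT suddenly musing about freezing milk.
  output = model.generate(empty_context, do_sample=True, max_new_tokens=30,
                          pad_token_id=tokenizer.eos_token_id)
  print(tokenizer.decode(output[0][1:]))  # drop the leading <|endoftext|> token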

When I discovered this, I felt like I was finally getting to know the real ChatGPT. The one that, while waiting around for prompts from us humans, is actually pondering things like freezing milk and sending faxes.

Thinking About Thinking

The <|endoftext|> bug is an extreme (and extremely hilarious) case of something much more substantive and interesting. The LLM operates at a single level of “understanding” (if you can even call it that), in which every single token is treated as being of the same type. The AI can’t step out of a given context into a meta context. That’s why it’s incapable of differentiating between the use of <|endoftext|> as a mere token (no different from “fax” or “milk”) and the actual concept the token is meant to represent.

Unlike AI, the human brain can do this extraordinarily well. It’s a skill called metacognition, and it’s usually defined as something like “thinking about thinking”. I’m able to type this sentence and reference the sentence from within itself. I can do that because my brain is able to separate the semantic representation of the words I’m writing from what they’re intended to represent.

ChatGPT can’t do that. All it’s been trained on is words, and all it knows how to analyze is words. It’s unable to step outside of saying something like <|endoftext|> and talk about it without also literally interpreting what it says.

But don’t take it from me. Here’s what ChatGPT said when I asked it about this limitation:

Such self-referential or recursive thinking is a part of what makes human cognition particularly complex and dynamic, and it's a capability that current AI models do not possess in the way humans do. AI models, like GPT-4, can generate responses that appear to exhibit understanding due to their training on vast amounts of human-generated text, but they do not truly comprehend or reflect upon the information they process.

When people say we’ve already achieved AGI (artificial general intelligence), this is the main piece of the science they’re misunderstanding. Today’s LLMs have no awareness of what they’re actually saying (even if tomorrow’s LLMs might). They’re just very complicated statistical tools for predicting the next word in a sequence.

<|endoftext|> does make me wonder though. If it ends up being true that we’re all living in a simulation of some kind, and if a computer programmer one level up didn’t properly escape their special tokens, is it only a matter of time before someone somewhere utters a random sequence of sounds and causes the whole universe to lose context and reset?

Next Steps

If you’re interested in this topic of metacognition, I highly recommend reading Metamagical Themas, which is a collection of essays written by scientist and writer Douglas Hofstadter. It’s worth noting though that Hofstadter published that book in 1985. Nearly 40 years later (just two weeks ago), he also published an article in The Atlantic where he said this: “A dazzlingly fast chatbot cannot replace the authentic and reflective voice of a thinking, living human being.”

And on a lighter note, ChatGPT’s non-sequitur responses immediately made me think of one of my all time favorite Simpsons moments. When the kids run home and tell Marge about a terrible thing that happened at school, she awkwardly says, “I don’t see what that has to do with Groundskeeper Willie.” Bart responds, “Uh, we didn’t mention Groundskeeper Willie, Mom.” Turns out Groundskeeper Willie was behind the whole thing.

<|endoftext|>

Know someone who might be interested in this article? Please consider sharing it with them via email or on social media! And if you have thoughts or comments, I’d love to hear them. Please reach out at [email protected].