You're at a Local Minimum

Who knew human psychology was just calculus in disguise?

Nir Zicherman
November 20, 2023

“It’s Pretty Good”

Countless times, with countless individuals, I’ve asked someone how they’re doing, and the conversation has gone something like this…

“Are you happy where you are?”

“Well, I guess, sort of, maybe.”

“Why don’t you leave and try something new?”

“Because it’s pretty good. And who knows if somewhere else will be worse.”

You’ve also been in these conversations, likely on both sides. It’s the common human tendency to accept the status quo, and the aversion we all have to change.

There’s a way to think about this fear of change scientifically. It requires reframing that conversation above as, of all things, an artificial intelligence (AI) problem.

Those people you’re having that conversation with, who are hesitant to try something new? They’re at a local minimum.

Error Minimization

Most modern machine learning processes (inclusive of all of the AI innovations that have been dominating headlines recently) are predicated on the concept of error minimization.

Imagine you were training a machine to learn something. Under the hood, it has knobs (actually weighted parameters), trillions of them, that can be turned up or down. The machine tries to perform an action (say, playing chess, walking across a room, conversing in natural language, etc.). Inevitably, some of what it does is wrong. This is where a calculus-inspired approach is used to slightly adjust all the knobs to make the output less erroneous.

If our model at any given time is represented by this red ball, our objective is to drive that ball in the direction of the graph that reduces the error rate:

So the machine tries again. Once again, it gets it wrong, but just a bit less wrong. The knobs are adjusted. And on and on, billions of times, until all the knobs are fine-tuned just right. The reason we know it’s time to stop is quite simple: Any further adjustment of any knob will actually increase the error. We’ve reached the minimum possible error. The machine is perfectly trained. The model has reached a local minimum.
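For the curious, here is a minimal sketch of that knob-turning loop with a single knob. The error curve, the starting point, and the names (error, error_slope, gradient_descent, learning_rate) are all made up for illustration; real systems adjust vastly more knobs at once, but the idea is the same: nudge the knob against the slope until nudging stops helping.

```python
def error(knob):
    # Hypothetical error curve: a single bowl whose lowest point is at knob = 3.
    return (knob - 3.0) ** 2 + 1.0


def error_slope(knob):
    # Derivative of the error curve above (this is the "calculus" part).
    return 2.0 * (knob - 3.0)


def gradient_descent(slope, start, learning_rate=0.1, max_steps=10_000, tolerance=1e-8):
    """Repeatedly nudge the knob downhill until the nudges become negligible."""
    knob = start
    for _ in range(max_steps):
        step = learning_rate * slope(knob)
        if abs(step) < tolerance:
            break  # Any further adjustment would barely change the error.
        knob -= step  # Move against the slope, i.e. downhill.
    return knob


trained_knob = gradient_descent(error_slope, start=-4.0)
print(f"knob settled at {trained_knob:.4f} with error {error(trained_knob):.4f}")
```

Starting far from the bottom of the bowl, the knob creeps toward 3 and stops once every further nudge is too small to matter.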

Stress Minimization

Now let’s do something interesting. Let’s replace the vertical axis, which represents the error rate, with the label “stress”. After all, don’t we humans do the same thing ML systems do? If we’re at a stressful peak and there’s a minimum in sight (i.e. a way to reduce our stress), we race toward it:

And once we’re there, we’re happy! We’ve made it! We’ve found a stable job! We’ve found ourselves in a comfortable place, with comfortable relationships, in comfortable situations! We look left and right, but in every direction whereby we can change the status quo, stress levels only rise. And therefore—being the creatures of comfort that we are—we settle happily into our own local minimum:

It’s no wonder people prefer to reach these stable points and hesitate to change anything. Any which way we change, stress will rise.

Except there’s a reason they’re called local minima (both in machine learning and in life). We aren’t looking at the big picture. If we zoom out, we realize there are many such local minima. Moreover, only one of them is truly the global minimum. And we’re not there.
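To make the zoomed-out picture concrete, here is a small sketch with an invented "stress" curve that has two valleys, one shallower than the other. The curve and every name in it (stress, stress_slope, settle) are assumptions chosen for illustration. The same downhill rule lands in a different valley depending only on where you start.

```python
def stress(x):
    # Hypothetical stress curve with two valleys: a shallow one on the right
    # and a deeper one on the left.
    return x**4 - 4 * x**2 + x


def stress_slope(x):
    # Derivative of the stress curve above.
    return 4 * x**3 - 8 * x + 1


def settle(start, learning_rate=0.01, max_steps=10_000, tolerance=1e-10):
    """Walk downhill from `start` until the steps become negligible."""
    x = start
    for _ in range(max_steps):
        step = learning_rate * stress_slope(x)
        if abs(step) < tolerance:
            break
        x -= step
    return x


right_valley = settle(start=2.0)   # starts on the right-hand slope
left_valley = settle(start=-2.0)   # starts on the left-hand slope

print(f"from  2.0: settled at x = {right_valley:.3f}, stress = {stress(right_valley):.3f}")
print(f"from -2.0: settled at x = {left_valley:.3f}, stress = {stress(left_valley):.3f}")
```

Both runs stop because every small step raises the stress, yet only the left-hand run ends in the deeper valley. The right-hand run is comfortable, stable, and stuck.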

Fear of change is a function of vantage points. On a small scale, when surrounded on all sides by humps of stress, it’s quite easy to settle into moderate successes (be they jobs, relationships, habits, etc.).

Yet it’s when we look at life on a large scale that we often realize that better valleys exist on the other side of those exacting hurdles.

Double Descent

And this brings me back to machine learning. Back in the day, researchers believed that once an error rate began rising during optimization, the model had reached the point of “overfitting” (meaning it was essentially learning to just recreate its training data rather than truly learn anything). Put differently, the ideal end-state of machine learning training used to be reaching a local minimum.

But that is no longer the case. It was only several years ago that researchers realized that continuing to push up a model’s parameter count, even if that temporarily caused its error rate to rise, would eventually cause the error rate to fall to an even lower minimum than the original.

This phenomenon, known as Double Descent, is the life lesson cited above reframed in mathematical terms.

There are, as we know, many peaks in the topography of life stressors. And I believe that what we often overlook is that there are just as many valleys. What it’s taken computer scientists a few decades to figure out, and what it’s taken mankind a few hundred millennia to figure out, is that pushing past the hills of stress can often yield more positive outcomes in the long term.

It’s like that classic question I recall being asked as a child: Are there more inclines or declines in the world? You can crunch all the numbers in existence only to realize: It’s all a matter of which direction you’re headed.

P.S. If you enjoyed this article and know someone who might like it, please consider sharing it. Forwarding these emails is the primary way Z-Axis is able to grow.
