How complex is intelligence, though? People who are sure they don’t are drawing from information we don’t actually have.
Yeah, so many people are confidently stating “LLMs can’t think like humans do!” when we’re actually still pretty unclear on how humans think.
Sure, an LLM on its own may not be an AGI. But LLMs are remarkably closer than we would have predicted they could get just a few years ago, and it may well be that we just need to add a bit more “special sauce” (memory, prompting strategies, perhaps a couple of parallel LLMs that specialize in different types of reasoning) to get them over the hump. At this point a lot of the research isn’t going into simply “make it bigger!”; it’s going into “use LLMs smarter.”
Obscenely.
The brain is stacks on stacks of insanely complicated systems. The fact that we know a ridiculous amount about the brain and are barely scratching the surface is exactly the point.
By that measure, we know everything about GPT-2, yet we are again just scratching the surface of how it works. I don’t think you can conclude that LLMs can never be intelligent just from that.
We “know everything about it” because it’s not that complicated.
You don’t need to trace every individual step a search algorithm takes to understand how it works. LLMs are the same. They’re just a big box of weighted probabilities. Complexity is more than just having a really big model.
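As a rough illustration of the “weighted probabilities” framing (a toy sketch, not how any particular LLM is implemented): at each step a language model produces one score, or logit, per vocabulary token; softmax turns those scores into a probability distribution, and the next token is sampled from it. The vocabulary, the logits, and the helper names `softmax` and `sample_next_token` are hypothetical; a real model computes the logits from the context with a large network.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick one token, weighted by softmax(logits / temperature)."""
    probs = softmax([x / temperature for x in logits])
    token = rng.choices(vocab, weights=probs, k=1)[0]
    return token, probs

# Hypothetical vocabulary and scores, made up for illustration.
vocab = ["cat", "dog", "car"]
logits = [2.0, 1.0, 0.1]
token, probs = sample_next_token(vocab, logits, rng=random.Random(0))
print(token, [round(p, 3) for p in probs])
```

Lowering `temperature` sharpens the distribution toward the highest-scoring token; raising it flattens it.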
We have bits and pieces of a lot of parts, but are nowhere near a complete understanding of any of them. We kind of know how neurotransmitters work, we kind of know how hormones work and interact with those neurotransmitters, we mostly know how individual neurons fire, we kind of know what different parts of the brain do, we kind of know how the brain adapts to physical damage…
We don’t know any of the algorithms it follows. What we do know is that it’s a hell of a lot of interconnected parts, and they’re all following very different rules.
It’s not a search algorithm. If it were, that would be an overfitted model, and it would be detected and rejected. What a good foundation model is doing is just about as mysterious as the brain.
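A toy illustration of what “detected and rejected” can look like, assuming the common practice of scoring a model on held-out data it never trained on. The data, the lookup-table “memorizer”, the helper names, and the tolerance of 1.0 are all hypothetical choices; real foundation-model evaluation is far more involved.

```python
def mean_abs_error(model, data):
    """Average absolute error of `model` over (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# Hypothetical data following the rule y = 3x, split into training and held-out validation sets.
data = [(x, 3 * x) for x in range(10)]
train_set, val_set = data[:7], data[7:]

# Overfitted "model": a lookup table of the training pairs, returning 0 for anything unseen.
table = dict(train_set)

def memorizer(x):
    return table.get(x, 0)

# Model that captured the rule: least-squares slope through the origin, fitted on training data only.
slope = sum(x * y for x, y in train_set) / sum(x * x for x, _ in train_set)

def generalizer(x):
    return slope * x

for name, model in [("memorizer", memorizer), ("generalizer", generalizer)]:
    train_err = mean_abs_error(model, train_set)
    val_err = mean_abs_error(model, val_set)
    # Arbitrary rule: reject if validation error is much worse than training error.
    verdict = "rejected (overfit)" if val_err > train_err + 1.0 else "kept"
    print(f"{name}: train={train_err:.2f} val={val_err:.2f} -> {verdict}")
```

Both models look perfect on the training split; only the held-out split exposes the one that merely memorized.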
It’s fundamentally extremely comparable mathematically and algorithmically. That’s the point. Simulated annealing doesn’t need to understand the search space to find a pretty good answer to a problem. It just needs to know what a good answer approximately looks like and nudge potential answers closer that way.
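For readers who haven’t seen simulated annealing, here is a minimal sketch of the idea described above: it never models the search space, it only scores candidate answers and nudges toward better ones, occasionally accepting a worse candidate (with probability exp(−Δ/T)) so it can escape local minima while the “temperature” T cools. The test function (a 1-D Rastrigin function), step size, starting point, and cooling schedule are arbitrary choices for illustration.

```python
import math
import random

def simulated_annealing(cost, x0, step=0.5, t0=10.0, cooling=0.995, iters=5000, seed=0):
    """Minimize `cost` over real numbers using random local moves and a cooling temperature."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)  # propose a nearby answer
        fc = cost(candidate)
        delta = fc - fx
        # Always accept improvements; accept worse answers with probability exp(-delta / t),
        # which lets the search escape local minima while t is still high.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t = max(t * cooling, 1e-12)  # cool down; the floor avoids dividing by zero
    return best_x, best_f

def rastrigin_1d(x):
    """Bumpy test function: local minima near every integer, global minimum 0 at x = 0."""
    return 10 + x * x - 10 * math.cos(2 * math.pi * x)

best_x, best_f = simulated_annealing(rastrigin_1d, x0=4.5)
print(f"best x = {best_x:.3f}, cost = {best_f:.3f}")
```

Note that nothing in `simulated_annealing` knows the shape of `rastrigin_1d`; it only ever compares scores.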
What LLMs are doing is not mysterious at all. Why a specific point in a model is what it is may be a mystery, but there’s no mystery to the algorithm. We can’t even guess at most of the algorithms that make up the brain.
Simulated annealing is a search algorithm which finds a solution.
Backpropagation is a search algorithm which finds a function, which in a big enough network could be literally any computable function. Once the network is trained and rolled out to consumers, backpropagation isn’t used at all.
Those are two fundamentally different things. GPT-2 is trained, and is no longer a search algorithm by any useful definition.
There are examples of small neural nets we can understand, and they’re not doing search algorithms; Quanta did a story about some just last week. If you can implement simulated annealing, you should probably look into NN algorithms in detail yourself, because then you can see why that’s wrong without the internet’s help.
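To make the training-versus-use distinction concrete, here is a minimal sketch assuming a one-neuron “network” (y ≈ w·x + b) trained by plain gradient descent on mean squared error; for a single layer, backpropagation reduces to the two hand-derived gradients below. The names `train` and `predict`, the learning rate, and the data are hypothetical. The search over parameters happens only inside `train`; once the parameters are frozen, `predict` is just evaluating a fixed function.

```python
def train(xs, ys, lr=0.05, epochs=2000):
    """Gradient descent on mean squared error for y ≈ w*x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: prediction errors on every training point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Backward pass: gradients of the MSE loss with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Update step: nudge the parameters downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(params, x):
    """Inference: plain function evaluation with frozen parameters; no gradients, no search."""
    w, b = params
    return w * x + b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # data generated by the hidden function y = 2x + 1
params = train(xs, ys)
print(params, predict(params, 10.0))
```

Running it should recover w and b very close to 2 and 1; after that, every call to `predict` is the same arithmetic regardless of how the parameters were found.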
I’m not calling it a search algorithm. I’m saying they all do the same math, and doing the math with more parallelism and variables doesn’t make what it is a mystery.
Search algorithms searching for functions isn’t new. Not knowing what each parameter corresponds to because you made your model huge doesn’t make LLMs a mystery. It’s still functionally one part. The hormone system is as complex as LLMs. Regulation of neurotransmitters is as complex as LLMs. Ignoring those external factors that are critical to how it works, individual portions of the brain are more complex than LLMs, and they are all interconnected on top of that.
I fully believe we’ll get to AGI eventually (probably not before we understand the brain a lot better), but the idea that one pretty simple algorithm is going to get us there is crazy. Human intelligence is a system of disparate systems of disparate systems at minimum.
So does having more parts make something a mystery, as in the second paragraph, or not a mystery, as in the first?
I was a skeptic back in the day too, but they’ve already far exceeded what an algorithm I could write from memory seems like it should be able to do.