I see this as PR speak for admitting that they can’t continue to achieve massive generational improvements in LLMs (let alone anything close to AGI) even with exponential increases in computing power. They are hitting a very costly brick wall.
Yep. AGI is still science fiction. Anyone telling you otherwise is probably just trying to fool investors. Ignore anyone who is less than three degrees of separation away from a marketing department.
The low-hanging fruit is quickly getting picked, so we’re bound to see a slowdown in advancement. And that’s a good thing. We don’t really need better language models at this point; we need better applications that use them.
The limiting factor is not so much hardware as it is our knowledge and competence in software architecture. As a historical example, 10 short years ago, computers were nowhere near top-level at Go. Then DeepMind developed AlphaGo, which was a huge leap forward and could beat a top pro. It ran on a supercomputer cluster. Thanks to the research breakthroughs around AlphaGo, within a few years we had similar AI that could run on any smartphone and could beat any human player. It’s not because consumer hardware got that much faster; it’s because we learned how to make better software. Modern Go engines are a fraction of the size of AlphaGo, and generate similar or better quality results with a tiny fraction of the operations. And it seems like we’re pretty close to the limit now. A supercomputer can’t play all that much better than my laptop.
Similarly, a few years ago something like the original ChatGPT (GPT-3.5) needed a supercomputer. Now you can run a model with similar performance on a high-end phone, or a low-end laptop. Again, it’s not because hardware has improved; the difference is the software. My current laptop (2021 model) is older than ChatGPT (publicly launched in 2022) and it can easily run superior models.
But the returns inevitably diminish. There’s a limit somewhere. It’s hard to say exactly where, but entropy’s gonna getcha sooner or later. You simply cannot fit more than 16GB of information in a 16GB model; you can only inch closer to that theoretical limit, and specialize into smaller scopes. At some point the world will realize that trying to encode everything into a model is a dumb idea. We already have better tools for that.
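To put rough numbers on that ceiling, here is a back-of-the-envelope sketch. It assumes "16GB" means decimal gigabytes and that the weights are stored at a fixed number of bits each; the function names are made up for illustration, not taken from any library.

```python
GB = 10**9  # assumption: decimal gigabytes (the comment doesn't say GB vs GiB)


def max_bits(model_bytes: int) -> int:
    """Hard upper bound on the information a file of this size can hold, in bits."""
    return model_bytes * 8


def max_params(model_bytes: int, bits_per_weight: int) -> int:
    """How many weights fit in the file at a given precision."""
    return max_bits(model_bytes) // bits_per_weight


if __name__ == "__main__":
    size = 16 * GB
    print(f"ceiling: {max_bits(size):,} bits")
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {max_params(size, bits):,} parameters")
```

Quantizing harder buys you more parameters in the same file, but the bit ceiling itself doesn't move, which is the point above.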
They didn’t have much more than that computing power in the first place.
This whole AI hype is the same as the “big data” hype, just with decorations. The pitch is that we now have enormous data (people are so connected they are visible to the world in detail unachievable before), enormous network connectivity, enormous computing resources and enormous centralization, so let’s just combine these into something radically more powerful than what all these individuals with their boring science were trying to do without such awesome power. Only it’s not more efficient somehow and they still can’t understand why.
It’s a reactionary breed of technologies. In the year 1824 an artisan couldn’t compete with a factory. But the year 2024 is not like this. The radical breakthrough in the architecture of the digital economy is not a dumb repetition of the radical breakthrough of manufacturing economies back then. And it hasn’t yet happened.
And speaking of visions for the future, in my humble opinion the initial intent behind Java and the initial intent behind the Web and the initial intent behind Unix, and more recently peer-to-peer systems with smart contracts and distributed storage and services, and cryptographic identities, and other such things, are the direction in which that breakthrough is to be found. Not in mimicking some dumbed-down explanations from school history lessons, which can’t possibly be complete enough.