There is obviously a difference between understanding what smarter people have already done, and actually innovating and building on what they have done. The latter requires a much deeper understanding.
It isn't actually an obvious limiting factor here. If I wanted to copy what, say, OpenAI has done, the biggest stumbling block seems to be budget and GPU hours. It isn't obvious what the academic or engineering breakthroughs here really are. The maths does seem to be pretty trivial.
I wouldn't want to say it is easy, but this doesn't seem to be the work of geniuses as much as the work of people who work at large companies with lots of hardware.
The breakthrough isn't the matrix multiplication you'd learn in high school math. The breakthroughs are the likes of transformers, deep learning, etc. In short, the stuff that happens in a research lab, not what is thrown at you with flashy confetti by salespeople.
I've read the transformers paper; it is still pretty simple math. And the architecture isn't particularly interesting in a theoretical sense; the concepts are basic - we're not talking Gauss, Newton, or Euler here, we're talking the top 2-10% of software engineers [0]. The limit is that most software engineers don't have enough hardware to test whether their ideas work or not.
And it is actually the same matrix multiplication people might learn at high school. The standard definition of matrix multiplication is generally accepted.
[0] We'd expect the figuring-things-out process to be faster for more cluey people, so the people who actually pick all this stuff up first probably are geniuses. But a more normal engineer with the hardware access would still figure it out independently sooner or later. It is just too neatly on the beaten path where we'd expect to be finding optimisations.
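For what it's worth, the core of that paper, scaled dot-product attention, really is a couple of matrix multiplications plus a softmax. A minimal single-head sketch in plain NumPy (no masking or multi-head machinery; shapes and data are illustrative):

    import numpy as np

    def softmax(x, axis=-1):
        # Subtract the row max for numerical stability before exponentiating.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # softmax(Q K^T / sqrt(d_k)) V, straight from "Attention Is All You
        # Need": two matrix multiplications and a softmax.
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)   # (seq_q, seq_k)
        return softmax(scores) @ V        # (seq_q, d_v)

    # Illustrative shapes: a sequence of 4 tokens, model dimension 8.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (4, 8)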
My point is that a discovery always seems simple and inevitable once it has happened. Newton's laws are pretty simple, all things considered, but it still took most of human history for those to be discovered.
An average high school student doing physics today could derive Newton's laws. That doesn't mean that that student, put in the same situation as Newton, would have been capable of discovering Newton's laws.
Newton had to invent calculus before his laws made sense. There isn't anything equivalent to that in modern machine learning. It uses concepts that are centuries old, except for maybe a few stats techniques dating to around the WWII era, since there was a big statistics push back then. Plus back-propagation, which is a pretty big deal, but hardly new to the boom we're seeing right now; it is an idea that was sitting around for a while before hardware changes made it effective.
But "invention" here is really different. They are basically trying out stuff and seeing what works, but they have no idea how it actually works.
We're essentially watching a bunch of investor-backed alchemists here.
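To make the back-propagation point concrete: it is the chain rule from calculus applied layer by layer, followed by a plain gradient descent step. A toy two-layer network in NumPy (data, sizes, and learning rate here are arbitrary):

    import numpy as np

    # Toy task: predict whether the input features sum to a positive number.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 3))
    y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

    W1 = rng.standard_normal((3, 8)) * 0.1
    W2 = rng.standard_normal((8, 1)) * 0.1

    for step in range(500):
        # Forward pass.
        h = np.tanh(X @ W1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid output
        loss = np.mean((p - y) ** 2)

        # Backward pass: nothing but the chain rule.
        dp = 2.0 * (p - y) / len(X)
        dz2 = dp * p * (1.0 - p)              # through the sigmoid
        dW2 = h.T @ dz2
        dh = dz2 @ W2.T
        dW1 = X.T @ (dh * (1.0 - h ** 2))     # through the tanh

        # Gradient descent update, the centuries-old part.
        W1 -= 0.5 * dW1
        W2 -= 0.5 * dW2

    print(f"loss after training: {loss:.4f}")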
> There is obviously a difference between understanding what smarter people have already done, and actually innovating and building on what they have done. The latter requires a much deeper understanding.
Does it really?
When you build a DB do you really spend a lot of time thinking about set theory? How about 4th normal form? Or do you just do it?
Right now the thing that keeps those with an engineering mindset outside of the playground is COST. Get a 96GB GPU below 1000 bucks and they become toys; below 500 and a kid can get into playing with one (if that seems insane, it's the cost of a PS5).
You know what a 96GB GPU does... it opens up the idea of shoving small models into games. No more "I took an arrow to the knee" memes, because you could fill a world with characters in a story. It means running models at home makes sense.
The people working on removing "training" as a distinct step and making it an "ongoing" process (without it being lossy) are going to have a massive impact on the industry (more so in smaller models; we might get REAL agents). I would be willing to bet that it's an engineer who solves the issue and not someone with a math or ML background.
There are tons of ways the landscape will change and open up. In a few years, of that top 100, maybe one or two will be a Dennis Ritchie or a Michael Stonebraker.
>When you build a DB do you really spend a lot of time thinking about set theory? How about 4th normal form? Or do you just do it?
Are you talking about building a db (eg sqlite) or designing a database schema? 4NF seems only relevant to the latter. I'd certainly hope that the developers of sqlite consider set theory in designing their database implementation.
Designing a schema is the application. Still valuable, but not exactly at the frontier of database research and development, which was exactly my point.
Gradient descent is to AI what a loop is to programming. However, understanding a loop doesn't mean you can program a full video game from scratch. Organizing hundreds of layers in an efficient way is pretty complex; even if the work today has been simplified thanks to PyTorch or TensorFlow, it remains pretty complicated. You have to understand how to organize your data, how to size your batches, and how to make your code resilient enough to survive GPU cards crashing. Training a model over hundreds of GPUs is really, really complicated. New algorithms are proposed all the time because we have no idea how to handle these interconnected layers in an efficient way, only with cumbersome heuristics.
However, salary inflation is never a good thing, because it will create a gap between decent engineers and other people pronounced geniuses. The AI teams will suffer from these decisions... badly. It will be like those samurai who would kill peasants after a battle to increase their head count, because that was how people were rewarded after a battle. Some of these people, in order to justify their salaries, will feel pressure to poach other people's ideas...
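The loop itself really is only a few lines. A minimal PyTorch sketch with a toy model and random data; everything the comment above calls hard (organizing data, sizing batches, surviving GPU crashes, scaling to hundreds of GPUs) lives outside these lines:

    import torch
    from torch import nn

    # A toy model and random data; the loop below is the whole "algorithm".
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    X = torch.randn(256, 16)
    y = torch.randn(256, 1)

    for step in range(100):
        opt.zero_grad()              # clear gradients from the previous step
        loss = loss_fn(model(X), y)  # forward pass
        loss.backward()              # backward pass (autograd)
        opt.step()                   # gradient descent update

    print(f"final loss: {loss.item():.4f}")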
I'm curious about that syllabus.
This feels like one of those bell-curve memes with the idiot and savant on the opposite sides saying “building LLMs is magical work undertaken by wizards”, where in the middle someone is saying “it’s just high school math lol”
Assuming I'm on the left side of the meme, isn't the "high school math" on the outsides of the meme, and the "magical wizards" in the middle?
Both work. As does the extended gigachad further on the right saying either.
https://archive.ph/Xp9cN
Is it that secret? I imagine you could get 90% of the way there within a day or two with just a combo of highly cited papers + LinkedIn scraping.
"they all know each other"
Stop right there. In this industry, a clique of people who all know each other attest to each other's superhuman abilities? Unheard of!
Shouldn't "AI" write itself by now and make them redundant?
This WSJ article is like a Cisco article in 1999. Beware of the crash.
So… where is the list?
Interesting how DeepSeek didn’t need the list and just needed people to GSD instead of playing politics.
Might be worth legally changing your name to an entry on the list, just in case.
K