I understand that DeepMind is working on this too: https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...
I wonder how their approaches and results compare?
As someone with a barebones understanding of "world models," how does this differ from sophisticated game engines that generate three-dimensional worlds? Is it simply the adaptation of the transformer architecture to generating the 3-D world vs. using a static/predictable script as in game engines (learned dynamics vs. deterministic simulation mimicking 'generation')? Would love an explanation from SMEs.
The model is predicting what the state of the world would look like after a given action.
Along with entertainment, they can be used for simulation training for robots, and they allow for imagining potential trajectories.
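To make the "predict the next state after an action" idea concrete, here is a minimal sketch, assuming a toy 1-D corridor. Everything in it (`engine_step`, `TabularWorldModel`, the corridor itself) is hypothetical and for illustration only; real world models use neural networks over images or latent states, not lookup tables. The contrast it tries to show: the game engine's rules are written by hand, while the world model only ever sees transitions and learns to predict them, after which it can "imagine" a trajectory without calling the engine.

```python
from collections import Counter, defaultdict


def engine_step(state: int, action: str) -> int:
    """Hand-written 'game engine' rule: move along cells 0..4, clamped at the walls."""
    delta = {"left": -1, "right": 1}[action]
    return min(4, max(0, state + delta))


class TabularWorldModel:
    """Toy learned dynamics: predicts the most frequently observed next state."""

    def __init__(self):
        # (state, action) -> Counter of next states seen after that pair
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        seen = self.counts.get((state, action))
        if not seen:
            return state  # never observed: assume nothing changes
        return seen.most_common(1)[0][0]

    def imagine(self, state, actions):
        """Roll out a sequence of actions using only the learned model."""
        trajectory = [state]
        for action in actions:
            state = self.predict(state, action)
            trajectory.append(state)
        return trajectory


# "Training": the model watches the engine, but never reads its rules.
model = TabularWorldModel()
for s in range(5):
    for a in ("left", "right"):
        model.observe(s, a, engine_step(s, a))

# Imagined trajectory; engine_step is not called here.
imagined = model.imagine(2, ["right", "right", "right", "left"])
print(imagined)
```

The `imagine` rollout is the part that matters for robot training: a planner can try candidate action sequences inside the model before committing to one in the real environment.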
Whenever I see these and play with models like this (and the demos on this page), the movement in the world always feels like a dolly zoom [0]. Things in the distance tend to stay in the distance, even as the camera moves in that direction, and only the local area changes features.
[0] https://en.wikipedia.org/wiki/Dolly_zoom
Interesting. Given how LLMs can also "compute" the state after an action they might perform (once you feed them all the variables), are world models just a more visually heavy iteration of an LLM? Can Tesla's FSD (which uses a transformer architecture) be considered a world model, given that it literally uses the IRL world (roads) as its input?
Marble is not that type of world model. It generates static Gaussian Splat assets that you can render using 3D libraries.
This "world model" is Image to Gaussian Splat. The output is a static asset that a web-based Gaussian Splat viewer then renders.
Other "world models" are Image + (keyboard input) to Video or Streaming Images, and effectively function like a game engine / video hybrid.
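A rough sketch of the difference between the two kinds, under heavy simplification. None of these names come from Marble or any real library: `Splat`, `generate_static_scene`, `render`, and `next_frame` are all hypothetical, and the "model" bodies are placeholders. The point is only where the model gets called: once up front for the static-asset kind, versus once per frame for the interactive kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Splat:
    # Hypothetical, heavily simplified stand-in for one Gaussian splat;
    # real splats also carry covariance, opacity and colour.
    x: float
    depth: float


def generate_static_scene(image_id: str) -> list[Splat]:
    """Kind 1: the model runs once per input image and returns a fixed asset."""
    # Placeholder geometry; a real model would infer this from the image.
    return [Splat(x=-1.0, depth=2.0), Splat(x=0.5, depth=4.0)]


def render(scene: list[Splat], camera_x: float) -> list[float]:
    """Ordinary viewer code: perspective-project each splat. No model call."""
    return [(s.x - camera_x) / s.depth for s in scene]


def next_frame(prev_frame: list[float], key: str) -> list[float]:
    """Kind 2: the model is called for every frame, conditioned on user input."""
    shift = {"a": 0.1, "d": -0.1}.get(key, 0.0)  # placeholder for a learned predictor
    return [p + shift for p in prev_frame]


# Kind 1: generate once, then move the camera as often as you like.
scene = generate_static_scene("photo-001")
view_a = render(scene, camera_x=0.0)
view_b = render(scene, camera_x=1.0)

# Kind 2: every new frame comes out of the model itself.
frame = view_a
for key in "dd":
    frame = next_frame(frame, key)
```

In the first kind the scene cannot change after generation, so anything the viewer shows is a re-projection of the same asset; in the second kind the content can evolve, at the cost of running the model on every step.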
This seems very interesting. Timely, given that Yann LeCun's vision also seems to align with world models being the next frontier: https://news.ycombinator.com/item?id=45897271
An established founder claims X is the new frontier. X receives hundreds of millions in funding. Other, less established founders claim they are working on X too. VCs suffering from terminal FOMO pump billions more into X. X becomes the next frontier. The previous frontiers are promptly forgotten.
I'm floored. Incredible work.
Also check out their interactive examples on the webapp. They're a bit rougher around the edges but show real user input/output. Arguably such examples could be pushed further to better-quality output.
e.g. https://marble.worldlabs.ai/world/b75af78a-b040-4415-9f42-6d...
e.g. https://marble.worldlabs.ai/world/cbd8d6fb-4511-4d2c-a941-f4...
Duplicate: https://news.ycombinator.com/item?id=45902732
Impressive!