Marble: A Multimodal World Model

(worldlabs.ai)

85 points | by meetpateltech 3 hours ago

12 comments

msteffen 26 minutes ago

I understand that DeepMind is working on this too: https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...

I wonder how their approaches and results compare?

abixb 19 minutes ago

As someone with a bare-bones understanding of "world models," how does this differ from sophisticated game engines that generate three-dimensional worlds? Is it simply the adaptation of the transformer architecture to generating the 3D world vs. using a static/predictable script as in game engines (learned dynamics vs. deterministic simulation mimicking 'generation')? Would love an explanation from SMEs.


mountainriver 16 minutes ago

The model is predicting what the state of the world would look like after a given action.

Along with entertainment, they can be used for simulation training for robots, and they allow for imagining potential trajectories.
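To make "predicting the state after an action" and "imagining trajectories" concrete, here is a minimal sketch, assuming a toy 1-D point mass. The names `toy_world_model`, `imagine_trajectory`, and `pick_plan` are hypothetical, and the hard-coded physics only stands in for what would really be a trained network; it is not how Marble or any specific system works.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy types, for illustration only.
State = Tuple[float, float]  # (position, velocity) of a 1-D point mass
Action = float               # commanded acceleration


def toy_world_model(state: State, action: Action, dt: float = 0.1) -> State:
    """Predict the next state from (state, action).

    Stand-in for a learned model: a real world model would be a trained
    network; here simple point-mass physics is hard-coded so the example runs.
    """
    pos, vel = state
    new_vel = vel + action * dt
    new_pos = pos + new_vel * dt
    return (new_pos, new_vel)


def imagine_trajectory(
    model: Callable[[State, Action], State],
    start: State,
    actions: Sequence[Action],
) -> List[State]:
    """Roll the model forward on its own predictions, with no real environment."""
    states = [start]
    for action in actions:
        states.append(model(states[-1], action))
    return states


def pick_plan(
    model: Callable[[State, Action], State],
    start: State,
    candidate_plans: Sequence[Sequence[Action]],
    goal_pos: float,
) -> Sequence[Action]:
    """Pick the action sequence whose imagined final position is closest to the goal."""
    def final_error(plan: Sequence[Action]) -> float:
        return abs(imagine_trajectory(model, start, plan)[-1][0] - goal_pos)
    return min(candidate_plans, key=final_error)


# Compare three candidate plans purely "in imagination".
plans = [[0.0] * 10, [1.0] * 10, [-1.0] * 10]
best = pick_plan(toy_world_model, (0.0, 0.0), plans, goal_pos=0.5)
# best is the all-1.0 plan, whose imagined end position is about 0.55
```

The point of the design is that the planner never touches a real robot or simulator: every candidate is evaluated by rolling the model forward on its own predictions, which is what makes a learned world model useful for training and planning.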


ghayes 10 minutes ago

Whenever I see these and play with models like this (and the demos on this page), the movement in the world always feels like a dolly zoom [0]. Things in the distance tend to stay in the distance, even as the camera moves toward them, and only the local area changes features.

[0] https://en.wikipedia.org/wiki/Dolly_zoom
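For anyone unfamiliar with the effect, here is a minimal sketch of the geometry, assuming an ideal pinhole camera. The scene width visible at distance d with horizontal field of view θ is 2·d·tan(θ/2); a dolly zoom moves the camera while changing θ so that this width stays fixed at the subject. The function names and the scene numbers are hypothetical, and this only illustrates the camera effect being described, not how any of these models produce it.

```python
import math


def frame_width(distance: float, fov_deg: float) -> float:
    """Width of the scene that exactly fills the frame at `distance` (pinhole camera)."""
    return 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)


def dolly_zoom_fov(subject_distance: float, framed_width: float) -> float:
    """Horizontal FOV (degrees) that keeps `framed_width` filling the frame at the subject."""
    return math.degrees(2.0 * math.atan(framed_width / (2.0 * subject_distance)))


def screen_fraction(object_width: float, object_distance: float, fov_deg: float) -> float:
    """Fraction of the frame width occupied by a centered object at `object_distance`."""
    return object_width / frame_width(object_distance, fov_deg)


# Hypothetical scene: the camera dollies in from 10 to 5 units while widening
# the FOV so a 2-unit-wide subject always spans half of a 4-unit framed width.
# A 10-unit-wide backdrop sits 40 units behind the subject.
for d in (10.0, 7.5, 5.0):
    fov = dolly_zoom_fov(d, framed_width=4.0)
    subject = screen_fraction(2.0, d, fov)
    backdrop = screen_fraction(10.0, d + 40.0, fov)
    print(f"distance={d:4.1f}  fov={fov:5.1f} deg  subject={subject:.3f}  backdrop={backdrop:.3f}")
```

In this setup the subject stays at half the frame width throughout, while the backdrop shrinks from 0.5 to roughly 0.28 of the frame even though the camera moved toward it.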

abixb 12 minutes ago

Interesting. Given how LLMs can also "compute" the state after an action they might perform (once you feed them all the variables), are world models just a more visually heavy iteration of an LLM? Can Tesla's FSD (which uses a transformer architecture) be considered a world model, given that it literally uses the IRL world (roads) as its input?

echelon 9 minutes ago

Marble is not that type of world model. It generates static Gaussian Splat assets that you can render using 3D libraries.

echelon 10 minutes ago

This "world model" is Image to Gaussian Splat. The output is a static asset that a web-based Gaussian Splat viewer then renders.

Other "world models" are Image + (keyboard input) to Video or Streaming Images, and effectively function like a game engine / video hybrid.
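For the Image-to-Gaussian-Splat case, the viewer's job at each pixel is roughly front-to-back alpha compositing of projected 2D Gaussians. Below is a minimal, simplified sketch of that step, assuming the splats are already projected to screen space (the 3D-to-2D covariance projection is omitted) and carry plain RGB colors instead of the view-dependent spherical-harmonic colors real splats use. `Splat2D` and `composite_pixel` are hypothetical names; the 0.99 alpha clamp and 1/255 cutoff are borrowed from common 3D Gaussian Splatting rasterizers and should be treated as assumptions. This is not Marble's or any particular viewer's code.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

Color = Tuple[float, float, float]


@dataclass
class Splat2D:
    mean: Tuple[float, float]        # projected center, pixel coordinates
    cov: Tuple[float, float, float]  # projected 2x2 covariance as (sxx, sxy, syy)
    depth: float                     # view-space depth, used only for sorting
    opacity: float                   # opacity in [0, 1]
    color: Color                     # plain RGB in [0, 1] (simplification)


def gaussian_weight(splat: Splat2D, px: float, py: float) -> float:
    """Unnormalized 2D Gaussian falloff of `splat` evaluated at pixel (px, py)."""
    sxx, sxy, syy = splat.cov
    det = sxx * syy - sxy * sxy
    # Inverse of the symmetric 2x2 covariance matrix.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    dx, dy = px - splat.mean[0], py - splat.mean[1]
    return math.exp(-0.5 * (ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy))


def composite_pixel(splats: Iterable[Splat2D], px: float, py: float,
                    background: Color = (0.0, 0.0, 0.0)) -> Color:
    """Blend splats front to back: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for s in sorted(splats, key=lambda sp: sp.depth):  # nearest first
        alpha = min(0.99, s.opacity * gaussian_weight(s, px, py))
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution at this pixel
        for k in range(3):
            color[k] += s.color[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; farther splats are hidden
    return (color[0] + background[0] * transmittance,
            color[1] + background[1] * transmittance,
            color[2] + background[2] * transmittance)
```

Because every splat is a fixed primitive, nothing here depends on user input beyond the camera pose, which is the sense in which this kind of output is a static asset rather than a learned simulation.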

girfan 34 minutes ago

This seems very interesting. Timely, given that Yann LeCun's vision also seems to align with world models being the next frontier: https://news.ycombinator.com/item?id=45897271


lofties 31 minutes ago

An established founder makes claims X is the new frontier. X receives hundreds of millions in funding. Other less established founders claim they are working on X too. VCs suffering from terminal FOMO pump billions more into X. X becomes the next frontier. The previous frontiers are promptly forgotten about.

keyle an hour ago

I'm floored. Incredible work.

Also check out their interactive examples on the webapp. They're a bit rougher around the edges but show real user input/output. Arguably, such examples could be pushed further toward better-quality output.

e.g. https://marble.worldlabs.ai/world/b75af78a-b040-4415-9f42-6d...

e.g. https://marble.worldlabs.ai/world/cbd8d6fb-4511-4d2c-a941-f4...

hobofan 3 hours ago

Duplicate: https://news.ycombinator.com/item?id=45902732

cubefox 2 hours ago

Impressive!