108 comments

  • DeborahEmeni_ an hour ago

    The “holding a button” thing actually resonated. It feels like the real work here is engineering the reward structure to make exploration even remotely viable. Dreamer’s world model might be cool, but most of the heavy lifting still seems to come from how forgiving the Minecraft environment is for training.

    I do wonder though: if you swapped Minecraft for a cloud-based synthetic world with similar physics but messier signals, like object permanence or social reasoning, would Dreamer still hold up? Or is it just really good at the kind of clean reward hierarchies that games offer?

  • suddenlybananas 14 hours ago

    An important caveat from the paper:

    >Moreover, we follow previous work in accelerating block breaking because learning to hold a button for hundreds of consecutive steps would be infeasible for stochastic policies, allowing us to focus on the essential challenges inherent in Minecraft.

    • o11c an hour ago

      And a relevant piece of ancient wisdom (exact date not known, but presumably before 1970):

      > In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

      > “What are you doing?”, asked Minsky.

      > “I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

      > “Why is the net wired randomly?”, asked Minsky.

      > “I do not want it to have any preconceptions of how to play”, Sussman said.

      > Minsky then shut his eyes.

      > “Why do you close your eyes?”, Sussman asked his teacher.

      > “So that the room will be empty.”

      > At that moment, Sussman was enlightened.

    • toxik 11 hours ago

      Like all things RL, it is 99.9% about engineering the environment and rewards. As one of the authors stated elsewhere here, there is a reward for completing each of the 12 steps necessary to find diamonds.

      Mostly I'm tired of RL work being oversold by its authors and proponents by anthropomorphizing its behaviors. All while this "agent" cannot reliably learn to hold down a button, literally the most basic interaction of the game.

      • red75prime 11 hours ago

        The "no free lunch" theorem. You can't start from scratch and expect your program to repeat 4 billion years of evolution collecting inductive biases useful in our corner of our Universe in a matter of hours[1].

        While it's possible to bake in this particular inductive bias (repetitive actions might be useful), they decided not to (it's just not that interesting).

        [1] And you certainly can't reproduce the observation selection effect in a laboratory. That is the thing that makes it possible to overcome the "no free lunch" theorem: our existence and intelligence are conditional on evolution being possible and finding the right biases.

        We have to bake in inductive biases to get results. We have to incentivize behaviors useful (or interesting) to us to get useful results instead of generic exploration.

        • toxik 7 hours ago

          You don't have to repeat 4 billion years of evolution, an RL agent lives inside a strange universe where the basic axioms happen to be exactly aligned with what you can do in that universe.

          Its actions are not muscular, they are literal gameplay actions. It is orders of magnitude easier to learn that the same action should be performed until completion, than that the finger should be pressed against a surface while the hand is stabilized with respect to the cursor on a screen.

          One of the most interesting (and pathological) things about humans is that we learn what is rewarding. Not how to get a reward, but actually we train ourselves to be rewarded by doing difficult/novel/funny/etc things. Notably this is communicated largely by being social, i.e., we feel reward for doing something difficult because other people are impressed by that.

          In Cast Away, Hanks' only companion is a mute, deflated ball, but nonetheless he must keep that relationship alive in order to keep himself alive. The climax of the movie is when Hanks returns home and people are so impressed that his efforts are validated.

          Contrast that with RL: there is no intrinsic motivation. The agent does not play, or meaningfully explore, really. The extent of its exploration is a nervous tic that makes it press the wrong button with probability ε. The reason it cannot hold down buttons is that it explores by having Parkinson's disease, by accident, not because it thought it might find out something useful/novel/funny/etc. In fact, it can't even have a definition of those words, because they are defined in the space between beings.
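
          To put a rough number on that (an illustrative back-of-the-envelope sketch, nothing from the paper): if undirected exploration has to pick the attack action independently at every step, the chance of holding it for hundreds of consecutive steps collapses geometrically.

            # Illustrative sketch only: if a stochastic policy chooses "attack"
            # with probability p at every step, independently, then holding it
            # for n consecutive steps happens with probability p**n.
            p = 0.9  # a generous per-step probability of picking "attack"
            for n in (10, 100, 300):
                print(f"hold for {n:>3} steps: {p**n:.2e}")
            # hold for  10 steps: 3.49e-01
            # hold for 100 steps: 2.66e-05
            # hold for 300 steps: 1.87e-14  -> effectively unreachable by chance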

          • orbifold 3 hours ago

            Personally I am almost certain that the current framing of RL and its relationship to animal behavior is deeply misguided. It has proved close to impossible to train animals using this paradigm (not for lack of trying); animals such as mice only make any progress when water-deprived and under conditions that exploit their natural instincts. Nevertheless they are capable of far more complex natural behaviors. There is a non-zero chance that RL as an explanation of animal behavior is just plain wrong or not applicable.

        • d0mine 3 hours ago

          Isn’t it exactly what alphazero did?

          “AlphaZero was trained solely via self-play using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After four hours of training, DeepMind estimated AlphaZero was playing chess at a higher Elo rating than Stockfish 8; after nine hours of training, the algorithm defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws).” [emphasis added] https://en.wikipedia.org/wiki/AlphaZero

          • red75prime 2 hours ago

            I thought that it might be a rare chance to invoke the NFL theorem appropriately, but I guess I was wrong. The NFL talks about a uniform distribution over problems, which is probably never the case in practice. At least not for habitable universes.

            Nevertheless, the theorem basically states that there are games where AlphaZero will be beaten by another algorithm. Even if those games are nonsensical from our point of view.

            • Xcelerate an hour ago

              > I thought that it might be a rare chance to invoke the NFL theorem appropriately, but I guess I was wrong

              Haha, I wouldn’t feel bad. It’s one of the most misunderstood theorems, and I don’t think I’ve ever seen it invoked correctly on a message board.

        • rebeccaskinner 5 hours ago

          > While it's possible to bake in this particular inductive bias (repetitive actions might be useful), they decided not to (it's just not that interesting).

          What's interesting to me about this is that the problem seems really aligned with the research they are doing. From what I can tell, they built a system where the agent has a simplified "mental" model of the game world, which it uses to predict actions that will lead to better rewards.

          I don't think what's missing here is teaching the model that it should just try to do things a lot until they succeed. Instead, what I think is missing is the context that it's playing a game, and what that means.

          For example, any human player who sits down to play Minecraft is likely to hold down the button to mine something. Younger children might also hold the jump button down and jump around aimlessly, but older children and adults probably wouldn't. Why? I suspect it's because people with experience in video games have set expectations for how game designers communicate the gameplay experience. We understand that clicking on things to interact with them is a common mode of interaction, and we expect that games have upgrade mechanics that will let us work faster or interact with higher-level items. It's not that we repeat arbitrary actions to see whether they pay off, but rather that we're speaking a language of games, modeling the minds of the game designers, and anticipating what they expect from us.

          I would think that expanding the model of the world to include this notion of the language of games might be a better approach to overcoming the limitation than just hard-coding the model to try things over and over again to see if there's a payoff.

        • kypro 10 hours ago

          > You can't start from scratch and expect your program to repeat 4 billion years of evolution collecting inductive biases useful in our corner of our Universe in a matter of hours

          Really? Minecraft's gameplay dynamics are not particularly complex... The AI here isn't learning highly complex rules about the nuances of human interaction, or learning to detect the relatively subtle differences between various four-legged creatures based on small differences in body morphology. In those cases I could see how millions of years of evolution is important, to at least give us and other animals a head start when entering the world. If the AI had to do something like that to progress in Minecraft, then I'd get why learning those complexities would be skipped over.

          But in this case a human would quickly understand that holding a button creates a state which tapping a button does not, and therefore would assume this state could be useful for exploring further states. Identifying this doesn't seem particularly complex to me. If the argument is that it will take slightly longer for an AI to learn patterns in dependent states, then okay, sure, but I think arguing that learning that holding a button creates a new state is such a complex problem that we couldn't possibly expect an AI to learn it from scratch within a short timeframe is a very weak argument. It's just not that complex. To me this suggests that current algorithms are lacking.

          • blueflow 9 hours ago

            It seems easy to you because you can't remember the years when you were a toddler and had to learn basic interactions with the world around you. It seems natural to an adult but it is quite complex.

            • geysersam 7 hours ago

              But this argument applies just as well to tons of other tasks AIs can handle just fine. So it doesn't explain why this particular action is so much harder compared to anything else.

              • blueflow 7 hours ago

                Which tasks?

                • geysersam 2 hours ago

                  > basic interactions with the world around you, tasks that seem easy to us but are actually quite complex

                  Tasks such as:

                    - recognizing objects in our surroundings,
                    - speaking,
                    - reasoning about other people's thoughts and feelings,
                    - playing go?
                  
                  All of those were at some point "easy for us but very hard for computer programs".
              • SkyBelow 6 hours ago

                In particular, the task requires understanding that one can impact the world through action. Humans learn this through a constant feedback loop running for months to a year or more. The very way we train AIs doesn't seem to teach this agency, only the ability to mimic having that agency in ways we can capture data for (such as online discussions). Will that training eventually give rise to such agency? I'm doubtful with most current models, given that the learning process is so disconnected from execution and that execution is prompted rather than inherently ongoing. Maybe some agent swarm that is always running, always training, and upgrading its members could achieve that level of agency, which is why I'm not saying it is impossible, but I expect we will have to wait for some newer model that is always running, and training as it runs, before we see true agency develop.

                Until then, it is a question of whether we can capture the appearance of agency in the training set well enough to learn it through training, without depending on interaction to learn more.

            • kypro 7 hours ago

              I don't think I am, and for context here I have built my own DQNs from scratch to learn to play games like Snake.

              I'd argue that if you consider the size of the input and output space here, it's not as complex as you're implying.

              To refer back to my example, telling the difference between four-legged creatures is complicated because there's a huge number of possible outputs and the visual input space is both large and complex. Learning how to detect patterns in raw image data is complicated, which is why we and other animals are preloaded with the neurological structures to do it. It's also why we often use pretrained models when training models to label new outputs – simply learning how to detect simple patterns in visual data is difficult enough, so if this step can be skipped it often makes sense to skip it.

              In contrast, the inputs to Minecraft are relatively simple – you have a handful of buttons which can be pressed, and those buttons can be pressed for different durations. Similarly, the output space here, while large, is relatively simple, and presumably detecting that an action like holding a button results in a state change shouldn't be that complex to learn... I mean, it's already learning that pressing a button results in a state change, so I think you'd need to explain to me why adding a tiny bit of additional complexity here is so unreasonable. Maybe I'm missing something.

              • red75prime 5 hours ago

                > I think you'd need to explain to me why adding a tiny bit of additional complexity here is so unreasonable

                As far as I understand, DreamerV3 doesn't employ intrinsic rewards (as in novelty-based exploration). It adopts stochastic exploration, which makes it practically impossible to get to rewards that require consistently repeating an action with no intermediate rewards.

                And finding intrinsic rewards that work well across diverse domains is a complex problem in itself.

              • blueflow 6 hours ago

                Example: When humans play Minecraft, they already know object permanence from the real world. I did not see anywhere that AI got trained to learn object permanence. Yet it is required for basics like searching for your mineshaft after turning around.

          • red75prime 8 hours ago

            > Minecraft's gameplay dynamic are not particularly complex...

            I think you underestimate the complexity of going from 12288+400 changing numbers to a concept of gameplay dynamics in the first place. Or, in other words, your complexity prior is biased by experience.

      • LPisGood 6 hours ago

        When I was a child and first played Minecraft I clicked instead of held and after 10 minutes I gave up, deciding that Minecraft was too hard.

        • zvitiate 6 hours ago

          What if you were in an environment where you had to play Minecraft for, say, an hour? Do you think your child brain would eventually have tried enough things (or had your finger slip and stay on the mouse a little extra while), noticed that hitting a block caused an animation (maybe even connected it with the fact that your cursor highlights individual blocks with a black box), decided to explore that further, and eventually mined a block? Your example doesn't speak to this situation at all.

        • daedrdev 5 hours ago

          I had the same problem; I'd learned from a Roblox mining game where mining a block required clicking it a bunch of times.

      • freeone3000 5 hours ago

        RL is useful for action selection and planning. Actually determining the mechanics of the game can be achieved with explicit instruction and definition of an action set.

        I suppose whether you find this result intriguing or not depends on whether you're looking to build result-building planning agents over an indeterminate (and sizable!) time horizon, in which case this is a SOTA improvement and moderately cool, or whether you're looking for a god in the machine, which this is not.

      • SpaceManNabs 4 hours ago

        If you have an alternative for RL in these use cases, please feel free to share.

        When RL works, it really works.

        The only alternative I have seen is deep networks with MCTS, and they are quick to ramp up to decent quality. But they hit caps relatively quickly.

    • kharak 12 hours ago

      In my mind, this generalizes to the same problem with other non-stochastic (deterministic) operations like logical conclusions (A => B).

      I have a running bet with a friend that humans encode deterministic operations in neural networks too, while he thinks there has to be another process at play. But there might be something extra helping our neural networks learn the strong weights required for it. Or the answer is again: "more data".

    • lgeorget 9 hours ago

      Well, to be fair... I (a human) had to look it up online the first time I played as well. I was repeatedly clicking on the same tree for an entire minute before that. I even tried several different trees just in case.

      • fusionadvocate 7 hours ago

        But it is possible to discover by holding down the button and realizing the block is getting progressively more "scratched".

    • JohnKemeny 14 hours ago

      I'm not sure it's a serious caveat if the "hint" or "control" is in the manual.

      • suddenlybananas 13 hours ago

        Sorry, I don't quite follow what you mean?

        • franktankbank 10 hours ago

          I didn't read the manual and when I was trying to help my kid play the game I couldn't figure out how to break blocks.

    • Hamuko 11 hours ago

      Turns out that AI are much better at playing video games if they're allowed to cheat.

    • FrustratedMonky 8 hours ago

      "accelerating block breaking because learning to hold a button for hundreds of consecutive steps "

      This is fine, and does not impact the importance of figuring out the steps.

      Anybody who has done any tuning on systems that run at different speeds knows that adjusting for the speed difference is just engineering, and it allows you to get on with the more important/inventive work.

    • thesz 14 hours ago

      "It allows AI to understand its physical environment and also to self-improve over time, without a human having to tell it exactly what to do."

      • ks1723 12 hours ago

        In my view, the 'exactly' is crucial here. They do implicitly tell the model what to do by encoding it in the reward function:

        In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.

        This is also why I think the title of the article is slightly misleading.
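
        For concreteness, that kind of reward shaping amounts to roughly the sketch below (my illustration of the idea, not the authors' code; the milestone names are assumptions based on the examples in the article).

          # Illustrative sketch of a milestone-style reward, not the actual implementation.
          # The item list is an assumption based on the examples mentioned in the article.
          MILESTONES = ["log", "planks", "stick", "crafting_table",
                        "wooden_pickaxe", "cobblestone", "stone_pickaxe",
                        "furnace", "iron_ore", "iron_ingot", "iron_pickaxe",
                        "diamond"]

          def milestone_reward(inventory, collected):
              """+1 the first time each milestone item shows up in the inventory."""
              reward = 0.0
              for item in MILESTONES:
                  if item in inventory and item not in collected:
                      collected.add(item)
                      reward += 1.0
              return reward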

        • wongarsu 3 hours ago

          It's kind of fair; humans also get rewarded for those steps when they learn Minecraft.

  • YeGoblynQueenne 5 hours ago

    Reinforcement learning is very good with games.

    >> In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.

    And that is why it is never going to work in the real world: games have clear objectives with obvious rewards. The real world, not so much.

    • SpaceManNabs 4 hours ago

      > And that is why it is never going to work in the real world: games have clear objectives with obvious rewards. The real world, not so much.

      I encourage you to read deepmind's work with robots.

    • smokel 5 hours ago

      > games have clear objectives with obvious rewards. The real world, not so much.

      Tell that to the people here who are trying to turn their startup ideas into money.

      • zamadatix 4 hours ago

        I don't think folks go the startup path because the steps to go from idea to making money are obvious and clear.

  • Animats 14 hours ago

    Key to Dreamer’s success, says Hafner, is that it builds a model of its surroundings and uses this ‘world model’ to ‘imagine’ future scenarios and guide decision-making.

    Can you look at the world model, like you can look at Waymo's world model? Or is it hidden inside weights?

    Machine learning with world models is very interesting, and the people doing it don't seem to say much about what the models look like. The Google manipulation work talks endlessly about the natural language user interface, but when they get to motion planning, they don't say much.

    • danijar 12 hours ago

      Yes, you can decode the imagined scenarios into videos and look at them. It's quite helpful during development to see what the model gets right or wrong. See Fig. 3 in the paper: https://www.nature.com/articles/s41586-025-08744-2

    • lnsru 14 hours ago

      I implemented an acoustic segmentation system in an FPGA recently. The whole world model was a long list of known events and states with feasible transitions, plus novel things not observed before. Basically a rather dumb state machine with a machine learning part attached to acoustic sensors. Of course, both parts could be hidden behind weights. But the state machine was easily readable, and that was its biggest advantage.

      • mnky9800n 13 hours ago

        Why would an accounting system need acoustic sensors?

        • lnsru 13 hours ago

          Sorry. Terrible typo. Acoustic system was cheap though.

          • mnky9800n 3 hours ago

            Oh haha. I work on an acoustic detection project so I was quite excited about new applications.

            How exactly does your machine learning model work?

    • jtsaw 12 hours ago

      I’d say it’s more like Waymo’s world model. The main actor uses a latent vector representation of the state of the game to make decisions. This latent vector at train time is meant to compress a bunch of useful information about the game. So while you can’t really understand the actual latent vector that represents state, you do know it encodes at least the state of the game.

      This world model stuff is only possible in environments that are sandboxed, i.e. where you can represent the state of the world and have a way of producing the next state given a current state and action. Things like Atari games, robot simulations, etc.
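
      As a rough picture of what that latent rollout looks like (a toy sketch, not the paper's actual architecture; every size and layer choice below is made up):

        import torch
        import torch.nn as nn

        class TinyWorldModel(nn.Module):
            """Toy latent world model: encode an observation, then roll the
            latent forward under a sequence of actions and predict rewards."""

            def __init__(self, obs_dim=64, act_dim=17, latent_dim=32):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
                self.dynamics = nn.Sequential(
                    nn.Linear(latent_dim + act_dim, 128), nn.ReLU(),
                    nn.Linear(128, latent_dim))
                self.reward_head = nn.Linear(latent_dim, 1)

            def imagine(self, obs, actions):
                # obs: (batch, obs_dim); actions: (batch, horizon, act_dim)
                z = self.encoder(obs)
                rewards = []
                for t in range(actions.shape[1]):
                    z = self.dynamics(torch.cat([z, actions[:, t]], dim=-1))
                    rewards.append(self.reward_head(z))
                return torch.stack(rewards, dim=1)  # imagined reward trajectory

      The point is that the policy can then be improved against these imagined trajectories rather than against raw environment steps.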

    • TeMPOraL 13 hours ago

      > Can you look at the world model, like you can look at Waymo's world model? Or is it hidden inside weights?

      I imagine it's the latter, and in general, we're already dealing with plenty of models with world models hidden inside their weights. That's why I'm happy to see the direction Anthropic has been taking with their interpretability research over the years.

      Their papers, as well as most discussions around them, focus on issues of alignment/control, safety, and generally killing the "stochastic parrot" meme and keeping it dead - but I think it'll be even more interesting to see attempts at mapping how those large models structure their world models. I believe there are scientific and philosophical discoveries to be made in answering why these structures look the way they do.

      • namaria 12 hours ago

        > killing the "stochastic parrot" meme

        This was clearly the goal of the "Biology of LLMs" (and ancillary) paper but I am not convinced.

        They used a 'replacement model' that by their own admission could match the output of the LLM ~50% of the time, and the attribution of cognition-related labels to the model hinges entirely on the interpretation of the 'activations' seen in the replacement model.

        So they created a much simpler model, that sorta kinda can do what the LLM can do in some instances, contrived some examples, observed the replacement model and labeled what it was doing very liberally.

        Machine learning and the mathematics involved are quite interesting, but I don't see the need to apply neuroscience/psychology terms to these models. They are fascinating on their own terms, and modelling language can clearly be quite powerful.

        But thinking that they can follow instructions and reason is the source of much misdirection. The limits of this approach should make clear that feeding text to a text continuation program should not lead to parsing the generated text for commands and running those commands, because the tokens the model outputs are just statistically linked to the tokens fed into it. And as the model takes in more tokens from the wild, this can easily lead to situations that are very clearly an enormous risk. Pushing the idea that they are reasoning about the input is driving all sorts of applications that, if we saw these models as statistical text continuation programs, would clearly be recognized as a glaring risk.

        Machine learning and LLMs are interesting technology that should be investigated and developed. Reasoning by induction that they are capable of more than modelling language is bad science and drives bad engineering.

  • reportgunner 13 hours ago

    Article makes it seem like finding diamonds is some kind of super complicated logical puzzle. In reality the hardest part is knowing where to look for them and what tool you need to mine them without losing them once you find them. This was given to the AI by having it watch a video that explains it.

    If you watch a guide on how to find diamonds it's really just a matter of getting an iron pickaxe, digging to the right depth and strip mining until you find some.

    • danijar 12 hours ago

      Hi, author here! Dreamer learns to find diamonds from scratch by interacting with the environment, without access to external data. So there are no explainer videos or internet text here.

      It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2

      • itchyjunk 11 hours ago

        Since diamonds are surrounded by danger, and if it dies it loses its items and such, why would it not be satisfied after discovering an iron pickaxe or somesuch? Is it in a mode where it doesn't lose its items when it dies? Does it die a lot? Does it ever try digging vertically down? Does it ever discover other items/tools you didn't expect it to? Open world with sparse reward seems like such a hard problem. Also, once it gets the item, does it stop getting reward for it? I assume so. Surprised that it can work with this level of sparse rewards.

        • taneq 9 hours ago

          In all reinforcement learning there is (explicitly as part of a fitness function, or implicitly as part of the algorithm) some impetus for exploration. It might be adding a tiny reward per square walked, a small reward for each block broken and a larger one for each new block type broken. Or it could be just forcing a random move every N steps so the agent encounters new situations through “clumsiness”.
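
          In code, those two flavours look roughly like this (an illustrative sketch, not anything from the paper):

            import random
            from collections import Counter

            visit_counts = Counter()

            def novelty_bonus(state_key, scale=0.1):
                """Count-based exploration bonus that shrinks as a state becomes familiar."""
                visit_counts[state_key] += 1
                return scale / visit_counts[state_key] ** 0.5

            def select_action(greedy_action, action_space, eps=0.05):
                """'Clumsiness': with probability eps, override the chosen action with a random one."""
                if random.random() < eps:
                    return random.choice(action_space)
                return greedy_action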

          • kevindamm an hour ago

            That's right; there is usually a parameter on the action selection function: the exploitation vs. exploration balance.

      • SpaceManNabs 4 hours ago

        I just want to express my condolences for how difficult it must be to correct basic misunderstandings that could be cleared up just by reading the fourth paragraph under the section "Diamonds are forever".

        Thanks for your hard work.

    • kuu 13 hours ago

      While I agree with your comment, this sentence:

      "This was given to the AI by having it watch a video that explains it."

      This would not have been as trivial as it may seem even just a few months ago...

      • rcxdude 13 hours ago

        EDIT: Incorrect, see below

        It didn't watch 'a video'; it watched many, many hours of video of Minecraft play (with another specialised model feeding in predictions of keyboard and mouse inputs from the video). It's still a neat trick, but it's far from the implied one-shot learning.

        • danielbln 12 hours ago

          The author replied in this thread and says the opposite.

          • rcxdude 11 hours ago

            Ah, I was incorrect. I got that impression from one of the papers linked at the end of the article, but I suspect that's actually some previous work.

      • NVHacker 13 hours ago

        AlphaStar was also trained initially from YouTube videos of pros playing StarCraft. I would argue that it was pretty trivial a few years ago.

        • rcxdude 13 hours ago

          I don't think it was videos. Almost certainly it was replay files with a bunch of work to transform them into something that could be compared to the model's outputs. (AlphaStar never 'sees' the game's interface, only a transformed version of information available via an API.)

          • stingraycharles 12 hours ago

            This was my understanding as well, as the replay files are all available anyway.

            The YouTube documentary is actually very detailed about how they implemented everything.

        • ismailmaj 13 hours ago

          Do you know if it was actual videos or some simpler inputs like game state and user inputs? I’d be impressed if it was the former at that time.

          • johnny22 11 hours ago

            StarCraft provides replay files that start with the initial game state and then record every action in the game. Not user inputs, but the actions bound to them.

    • skwirl 12 hours ago

      >This was given to the AI by having it watch a video that explains it.

      That is not what the article says. It says that was separate, previous research.

    • Bluglionio 13 hours ago

      I don't get it. How can you reduce this achievement down to this?

      Have you gotten used to some AI watching a video and 'getting it' so fast that this is boring? Unimpressive?

      • jerf 7 hours ago

        The other replies have observed that the AI didn't get any "videos to watch" but I'd also observe that this is being used as an English colloquialism. The AIs aren't "watching videos", they're receiving videos as their training data. That's quite different from what is coming to your mind as "watching a video" as if the AI watched a single YouTube tutorial video once and got the concept.

      • reportgunner 11 hours ago

        I feel like you are jumping to conclusions here. I wasn't talking about the achievement or the AI; I was talking about the article and the way it explains finding diamonds in Minecraft to people who don't know how to find diamonds in Minecraft.

    • rowanG077 13 hours ago

      The AI is able to learn from video and you don't find that even a little bit impressive? Well I disagree.

  • CodeCompost 15 hours ago

    I didn't know that Nature did movie promotions.

  • ljdtt 9 hours ago

    Slightly off-topic from the article itself, but… does anyone else feel like Nature’s cookie banner just never goes away? I have vivid memories of trying to reject cookies multiple times, eventually giving up and accepting them just to get to the article only for the banner to show up again the next time I visit. I swear it’s giving me déjà vu every single visit.. Am I the only one experiencing this, or is this just how their site works?

  • fine_tune 10 hours ago

    Attempting to train this on a real workload I converted over the weekend: after ~8M "steps" so far it rarely scores above 5% (most are 0%), but it did score 60% once, about 7M steps ago.

    Adding more than 1 GPU didn't improve speed, but that's pretty standard since we don't have fancy interconnect. Bit annoying that they didn't use TensorBoard for logging, but overall it seems like a pretty cool lib - will leave it a few days and see if it can learn (no other algo has, so I don't have much hope).

  • Xelynega 13 hours ago

    Isn't this a DeepMind achievement from 2023?

  • protocolture 14 hours ago

    Finally a use case for AI

  • sbuttgereit 10 hours ago

    There's a YouTube channel that does a lot of videos focused on LLMs in Minecraft:

    https://www.youtube.com/@EmergentGarden

    I very much like the comparative approach this guy takes looking at how different LLMs fare... including how they interact together. Worth a look.

  • theOGognf 15 hours ago

    This looks like an article about the recent Nature publication. Was confused at first because DreamerV3 is a couple of years old now

  • lupusreal 11 hours ago

    Characterizing finding diamonds as "mastering" Minecraft is extremely silly. Tantamount to saying "AI masters Chess: Captures a pawn." Getting diamonds is not even close to the hardest challenge in the game, but most readers of Nature probably don't have much experience playing Minecraft so the title is actually misleading, not harmless exaggeration.

    • zimpenfish 11 hours ago

      > Getting diamonds is not even close to the hardest challenge in the game

      Mining diamonds isn't even necessary if you build, e.g., ianxofour's iron farm on day one and trade that iron[0] with a toolsmith, armourer, and weaponsmith. You can get full diamond armour, tools, and weapons pretty quickly (probably a handful of game weeks?)

      [0] Main faff here is getting them off their base trade level.

      • lupusreal 8 hours ago

        True, and if the objective is to get some raw diamonds as fast as possible while demonstrating mastery of the game, I'd expect a strategy like making a boat, finding a shipwreck, and then a buried treasure chest. That usually takes just a few minutes.

        Really though, if AI wants to impress me it needs to collect an assortment of materials and build a decent looking base. Play the way humans usually play.

  • successful23 14 hours ago

    Pretty impressive. Minecraft’s a complex environment, so for an AI to figure out how to find diamonds on its own shows real progress in learning through exploration — not just pattern recognition.

    • charcircuit 5 hours ago

      You can literally find diamonds by moving the mouse down and holding left click.

      • mjamesaustin 4 hours ago

        No you can't. Blocks broken without the correct tools do not yield items. In order to find diamonds the agent must first craft an iron pickaxe, which itself requires several other steps.

  • camel-cdr 10 hours ago

    How robust is this?

    Isn't something like finding diamonds in Minecraft something that old-school AI could already do decently?

    • breakyerself 10 hours ago

      Those were trained on human play. This had to figure it out from scratch.

      • camel-cdr 10 hours ago

        Ah, is this full RL?

        I was reading something about LLMs earlier and was thinking that LLMs could probably write a simple case-based script for controlling a player that could achieve a decent success rate.

  • nottorp 11 hours ago

    Isn't "masters" when you build a working copy of Minas Tirith or something like that?

    • Ntrails 8 hours ago

      I'd accept "build a tnt trap for your buddy" or "defeated the end dragon"

  • colechristensen 14 hours ago

    Who would have thought you could get your TAS run published in Nature if you used enough hot buzzwords? (They have been using various old-school-definition "artificial intelligence" algorithms for a long time.)

    https://tasvideos.org/

  • ninetyninenine 6 hours ago

    It’s still being a stochastic parrot. Now it’s just parroting the human creativity and imagination so I’m still not impressed.

    If all you’re going to do is parrot things like human consciousness or human ingenuity then I will never be impressed so long that it’s just parroting.

    • smokel 6 hours ago

      You must be confused. This research is about reinforcement learning, not about large language models.

  • FrustratedMonky 8 hours ago

    Minecraft is ubiquitous now.

    But I remember the alpha version, and NOBODY knew how to make a pickaxe. Humans were also very bad at figuring out these steps.

    People were decompiling the Java and posting help guides on the internet.

    How to break a tree, get sticks, make a wooden pickaxe. In Alpha, that was a big deal for humans also.

  • _vere 12 hours ago

    So can I, and no one needed to teach me either, but you don't see Nature writing articles about it...

    • Aachen 8 hours ago

      You don't want anyone to be working on replicating things humans can already do? We'll just continue tilling fields to eternity..

    • johnisgood 9 hours ago

      This is too dismissive, and there are a zillion articles about human learning.

  • jonathanyc 14 hours ago

    They write: "Below, we show uncut videos of runs during which Dreamer collected diamonds."

    ... but the first video only shows the player character digging downwards without using any tools and eventually dying in lava. What?

    • kbelder a minute ago

      Proving that the AI can play Minecraft as well as my wife?

  • fxtentacle 13 hours ago

    I guess we can look forward

    to a bright future

    where we focus 100% on work

    and AI will play our games

    /s

    • weatherlite 13 hours ago

      > where we focus 100% on work

      Lol that's crazy optimistic, what work?

      • fxtentacle 11 hours ago

        Picking up dropped pencils, for example. Robots are still hilariously bad at that. Or driving your new AI overlord around the country from LAN to LAN.

        • weatherlite 8 hours ago

          > Picking up dropped pencils, for example. Robots are still hilariously bad at that

          It's only hilarious because we're allowed to laugh. For now. Wait a few years; it's possible these things will demand respect.

          • recursive an hour ago

            I can't believe you'd laugh at a sentient agent who's trying their best to pick up a pencil and obviously struggling. Maybe you should practice some empathy and help them pick it up?

    • TeMPOraL 13 hours ago

      Once again, we see that it's much easier to teach machines to perceive and decide well, in many cases well above human performance - while at the same time, making machines that can navigate the same physical environment humans do, and do a variety of manual tasks that mix power and precision, remains extremely challenging.

      The message this sends is pretty clear: machines are better at thinking, humans are better at manual work. That is the natural division of labor that plays into strengths and weaknesses of both computers and human beings.

      And so, I'm sorry to say this, but the near future is that in which computers play our games and do the thinking and creative work and management (and ultimately governance), because they're going to be better at this than us, leaving us to do all the physical labor, because that's one thing we will remain better at for a while.

      That, or we move past the existing economic structures, so that we no longer need to worry about being competitive with AI labor.

      /s, but only a little.