PyTorch Monarch

(pytorch.org)

256 points | by jarbus 8 hours ago

36 comments

  • chandureddyvari 6 hours ago

    Interesting - this seems to target a different layer than services like Tinker (https://thinkingmachines.ai/blog/announcing-tinker/). Monarch provides the infrastructure primitives while Tinker is a managed finetuning service. Could someone build something like Tinker on top of Monarch?

    • gaogao 5 hours ago

      Yup, there's stuff like https://pytorch.org/blog/introducing-torchforge/ on top of it now

      • chandureddyvari 5 hours ago

        Nice, so the open source equivalent now exists. Meta basically commoditized Tinker's ($12B valuation) value prop by giving away the infra (Monarch) and the RL framework (TorchForge). Will be interesting to see how a managed service competes with free + open source at this layer.

  • pjmlp 7 hours ago

    Apparently PyTorch oxidation has started.

    > Monarch is split into a Python-based frontend, and a backend implemented in Rust.

    Other than that, looks like a quite interesting project.

    • dhrt12327 5 hours ago

      Multiple sources say that it is an experimental framework around PyTorch, not a replacement. People will still get to enjoy circular graphs built with std::shared_ptr, memory leaks included.

      It's a pity they don't do a complete rewrite with a functional language as the driver.

      • gaogao 5 hours ago

        > It's a pity they don't do a complete rewrite with a functional language as the driver.

        It's open source, so seeing such an extension would be quite cool. There's a lot that could be done with native Rust actors and code that might get at what you want, and nothing precludes mixing PyTorch with other backends.

        For example, you could wrap a C++ inference engine as part of one of the actors generating data for other actors doing distributed training.
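        The pattern described above, one actor producing data that other actors consume for training, can be sketched in plain Python with queues and threads. All names here are hypothetical and this is not Monarch's actual actor API, just an illustration of the shape:

```python
# Minimal sketch of the producer/consumer actor pattern described above.
# InferenceActor stands in for a wrapped C++ inference engine; TrainerActor
# stands in for a distributed trainer. Names are hypothetical.
import queue
import threading

class InferenceActor:
    """Generates data batches (in practice, via bindings to a native engine)."""
    def __init__(self, out_q: queue.Queue, n_batches: int):
        self.out_q = out_q
        self.n_batches = n_batches

    def run(self):
        for i in range(self.n_batches):
            self.out_q.put({"batch_id": i, "data": [i] * 4})
        self.out_q.put(None)  # sentinel: no more data

class TrainerActor:
    """Consumes generated batches until the sentinel arrives."""
    def __init__(self, in_q: queue.Queue):
        self.in_q = in_q
        self.seen = []

    def run(self):
        while (batch := self.in_q.get()) is not None:
            self.seen.append(batch["batch_id"])

q = queue.Queue()
producer = InferenceActor(q, n_batches=3)
consumer = TrainerActor(q)
t1 = threading.Thread(target=producer.run)
t2 = threading.Thread(target=consumer.run)
t1.start(); t2.start()
t1.join(); t2.join()
print(consumer.seen)  # → [0, 1, 2]
```

        In a real deployment the producer and consumers would be separate processes on separate hosts, with the framework handling transport instead of an in-process queue.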

      • pjmlp 5 hours ago

        Interesting. By the way, you can replicate the experience in Rust.

      • hansvm 3 hours ago

        Arc<T> has entered the chat.

    • galangalalgol 6 hours ago

      This is a new project, right? Not the oxidation of an existing one.

      • gaogao 5 hours ago

        Yup, hyperactor, one of the new crates that's part of it, does some particularly interesting things for efficient parallel distributed channels.

  • alyxya 6 hours ago

    I made my own single controller PyTorch extension [1], though mine doesn't yet support cross-node communication. I found it interesting to compare how Monarch makes things performant. I believe Monarch also uses cloudpickle to share code among all nodes, which is probably the only performant way to have various nodes execute work, since it ends up being a one-time setup cost. I found the fan-out of messages from the single controller really interesting; it makes the controller unlikely to be the bottleneck, aside from any synchronous operations.

    As far as things that might be a performance loss here, one thing I'm wondering is if custom kernels are supported. I'm also wondering how much granularity of control there is with communication between different actors calling a function. Overall, I really like this project and hope to see it used over multi-controller setups.

    [1] https://github.com/alyxya/mycelya-torch
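    The fan-out idea can be illustrated with a toy multicast tree: instead of the controller sending all N messages itself, each node forwards to at most k children, so the controller does O(k) work and delivery completes in O(log_k N) hops. This is a generic sketch, not Monarch's actual routing code:

```python
# Toy multicast tree: the root delivers to itself, then splits the
# remaining nodes among up to `fanout` subtrees, so no single sender
# handles all N messages. Generic sketch, not Monarch's routing.
def fanout_deliver(nodes, fanout=2, depth=0, delivered=None):
    """Return {node: hop depth at which it received the message}."""
    if delivered is None:
        delivered = {}
    if not nodes:
        return delivered
    delivered[nodes[0]] = depth  # nodes[0] receives at this hop depth
    rest = nodes[1:]
    if rest:
        chunk = -(-len(rest) // fanout)  # ceil(len(rest) / fanout)
        for i in range(0, len(rest), chunk):
            fanout_deliver(rest[i:i + chunk], fanout, depth + 1, delivered)
    return delivered

hops = fanout_deliver(list(range(15)), fanout=2)
print(max(hops.values()))  # → 3 — depth grows ~log2(N), not N
```

    With direct sends the controller would do 15 sends itself; here it does 2, and the deepest node is still reached in 3 hops.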

    • gaogao 5 hours ago

      > As far as things that might be a performance loss here, one thing I'm wondering is if custom kernels are supported

      Yeah, you might end up needing some changes to remote worker initialization, but you can generally bake in whatever kernels and other system code you need.

  • valzam 7 hours ago

    I assume this is similar to Ray?

  • semessier an hour ago

    This could become a major thing in the coarray world, but the issues start already:

    > ...Note that this does not support tensor engine, which is tied to CUDA and RDMA (via ibverbs).

    I.e. yet another CUDA-married approach: the issue is not ibverbs, but the code shows they use GPUDirect RDMA, and from there this can only get worse, with more CUDA dependencies. OpenUCX would have been an alternative.

  • fadedsignal 4 hours ago

    It is a nice project. I have questions.

    - Is this similar to Open MPI?

    - How is a mesh established? Do they need to be on the same host?

  • porridgeraisin 6 hours ago

    > This lets us avoid single-host bottlenecks, effectively using the whole mesh as a distributed cluster for message forwarding. (Cite scalability numbers here.)

    In case someone who can fix this is reading here

  • milancurcic 6 hours ago

    Cool! Essentially Fortran coarrays from 2008.

    • philipallstar 6 hours ago

      Or Hadoop from 2006? But you don't need to write MapReduce or Fortran, so it's probably far nicer.

  • jonapro 7 hours ago

    Beowulf then.

  • logicchains 6 hours ago

    This seems strictly less powerful than Jax, which comes with a powerful compiler that optimises how cross-node communication is conducted.

    • gaogao 5 hours ago

      Nah, focusing on a different controller paradigm. Jax is focused on multi-controller SPMD, while this is focused on a single-controller setup. Both have their place, with single-controller being generally easier to reason about, and multi-controller more optimal for certain dataflows. There's also some interesting mixes of the two control paradigms.
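      The distinction between the two paradigms can be caricatured in a few lines of plain Python (illustrative only; this is neither Monarch's nor JAX's real API):

```python
# Single-controller: one driver script holds the control flow and
# tells each worker explicitly what to do.
def single_controller(num_workers):
    log = []
    for rank in range(num_workers):  # the controller's own loop
        log.append(f"worker{rank}: run shard {rank}")
    return log

# Multi-controller SPMD: every worker runs the *same* program and
# derives its behavior from its own rank; there is no central driver.
def spmd_worker(rank, world_size):
    return f"worker{rank}: run shard {rank} of {world_size}"

print(single_controller(2))
print([spmd_worker(r, 2) for r in range(2)])
```

      In the first case all control logic lives in one place, which is easier to reason about; in the second, coordination is implicit in the shared program, which a compiler can optimize aggressively.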

  • nothrowaways 6 hours ago

    FB should create a pytorch foundation and set it free before they fuck it up.

  • SomaticPirate 5 hours ago

    "Our Rust-based backend facilitates our performance, scale, and robustness — we amply use Rust’s fearless concurrency in Monarch’s implementation"

    Found a few typos. The em dash makes me suspect an LLM was involved in the proofreading

    • alt187 4 hours ago
      • geedzmo 4 hours ago

        That was a really good read. Glad I clicked

        • alt187 3 hours ago

          It's not even one of the author's funniest pieces, and that says a lot.

    • hellohello2 3 hours ago

      I would argue that typos suggest an LLM did not proofread.

    • whimsicalism 4 hours ago

      that it is surrounded by spaces makes this less likely

      • ComputerGuru 41 minutes ago

        Most style guides would call that an error, em dash should be used without surrounding spaces (while an en dash requires them). The only publication I know that has (recently?) eschewed that advice is WaPo. If the idea was to make it more visible, I believe the correct solution would have been for WaPo to use an en dash but render it longer in their typeface.