Interaction Models

(thinkingmachines.ai)

45 points | by smhx 2 hours ago

6 comments

  • alyxya 14 minutes ago

    The noteworthy things to me are that the architecture is a transformer that takes text, image, and audio input and produces text and audio output, all trained together, and that it works in near real time by interleaving inputs and outputs rather than generating the whole output from a given prompt.

    > Time-Aligned Micro-Turns. The interaction model works with micro-turns continuously interleaving the processing of 200ms worth of input and generation of 200ms worth of output. Rather than consuming a complete user-turn and generating a complete response, both input and output tokens are treated as streams. Working with 200ms chunks of these streams enables near real-time concurrency of multiple input and output modalities.

    As far as I can tell, that's probably the main thing that distinguishes it from other frontier labs' multimodal models.
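The micro-turn scheme quoted above can be sketched as a loop that alternates between appending about 200 ms of input and about 200 ms of output to one shared context. This is a minimal illustration under stated assumptions, not Thinking Machines' implementation: `chunk_stream`, `run_micro_turns`, and `toy_generate` are hypothetical names, the "model" is a toy function, and the loop runs sequentially over a single audio-like stream, whereas the described system handles several input and output modalities concurrently.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

CHUNK_MS = 200  # micro-turn length quoted in the post

# One context entry: ("in", input chunk) or ("out", generated chunk)
Entry = Tuple[str, object]


def chunk_stream(samples: Sequence[float], sample_rate_hz: int,
                 chunk_ms: int = CHUNK_MS) -> Iterator[Sequence[float]]:
    """Split a flat input stream (e.g. audio samples) into fixed-duration chunks."""
    per_chunk = sample_rate_hz * chunk_ms // 1000
    if per_chunk <= 0:
        raise ValueError("chunk is shorter than one sample")
    for start in range(0, len(samples), per_chunk):
        yield samples[start:start + per_chunk]


def run_micro_turns(samples: Sequence[float], sample_rate_hz: int,
                    generate: Callable[[List[Entry]], object]) -> List[Entry]:
    """Alternate ~200 ms of input and ~200 ms of output in one shared context.

    `generate` is a hypothetical stand-in for the model: it sees the whole
    interleaved history so far and returns the next chunk of output.
    """
    context: List[Entry] = []
    for chunk in chunk_stream(samples, sample_rate_hz):
        context.append(("in", chunk))               # consume the next input chunk
        context.append(("out", generate(context)))  # respond before more input arrives
    return context


def toy_generate(context: List[Entry]) -> int:
    # Toy "model": reports how many input samples it has seen so far.
    # A real model would emit audio/text tokens conditioned on the context.
    return sum(len(chunk) for kind, chunk in context if kind == "in")


# 1000 samples at 2 kHz = 0.5 s of input -> chunks of 400, 400 and 200 samples
history = run_micro_turns([0.0] * 1000, sample_rate_hz=2000, generate=toy_generate)
print([out for kind, out in history if kind == "out"])  # [400, 800, 1000]
```

The contrast with turn-based generation is that each output chunk is produced after only the input received so far, instead of after a complete user turn; a real streaming system would presumably overlap these steps rather than run them in a strict loop.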

  • tedsanders 16 minutes ago

    Very cool! The demos felt fairly contrived (e.g., "count things while I talk"). I wonder what more commercial applications would look like.

    • alyxya 4 minutes ago

      In theory, I would expect it to do everything current frontier models can do, with the added benefit of real-time interactivity for better collaboration. The biggest benefit may be real-time video input: the model can take in video while it is producing output and let that input steer the output, rather than ingesting a whole video or set of images at once and then producing a single response to all of it.

  • rohitpaulk an hour ago

    Aside from how impressive the model is, the demos here are very well done! Quirky and short, unlike what we're used to from Anthropic and OpenAI.

  • suriya-ganesh 42 minutes ago

    Incredibly impressive demos. I wonder what the training data for these models looks like.

    Are there separate batches of special "skills" added in post-training? How can they guarantee the models won't eventually lose a skill?

  • emsign 40 minutes ago

    That's neat and definitely the next step. But to be honest, I don't want an AI to talk to me like that.