2 comments

  • onesandofgrain 39 minutes ago

    Can someone smarter than me explain what this is about?

    • Kalabint 23 minutes ago

      > Can someone smarter than me explain what this is about?

      I think you can find the answer under point 3:

      > In this work, our primary goal is to show that pretrained text-to-image diffusion models can be repurposed as object trackers without task-specific finetuning.

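      In other words, a pretrained text-to-image diffusion model can be used to track objects in videos as-is, without finetuning it for the task and without a specialised ML model built for video object tracking.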