I did a review of the state of the art in robotics recently in prep for some job interviews, and the stack is the same as for all other ML problems these days: take a large pretrained multimodal model and do supervised fine-tuning on your domain data.
In this case it's "VLA", as in Vision-Language-Action models, where a multimodal decoder predicts action tokens and "behavior cloning" is a fancy made-up term for supervised learning, because the RL people can't bring themselves to admit that supervised learning works far better than reinforcement learning in the real world.
Proper imitation learning, where a robot learns from a third-person view of humans doing stuff, does not work yet, but some people in the field like to pretend that teleoperation plus "behavior cloning" is a form of imitation learning.
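For anyone curious what "behavior cloning is just supervised learning" means concretely, here is a minimal sketch: given (observation, expert action) pairs collected via teleoperation, a head on top of a pretrained backbone predicts discretized action tokens and is trained with plain cross-entropy, exactly like language-model fine-tuning. The names, shapes, and bin counts here are my own illustrative assumptions, not any particular VLA implementation.

```python
import torch
import torch.nn as nn

NUM_ACTION_BINS = 256   # assumed: each action dimension discretized into 256 bins
ACTION_DIMS = 7         # assumed: e.g. 6-DoF end-effector delta + gripper

class ActionHead(nn.Module):
    """Maps a pretrained backbone's pooled embedding to per-dimension action-token logits."""
    def __init__(self, hidden_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, ACTION_DIMS * NUM_ACTION_BINS)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) multimodal embedding from the frozen or fine-tuned backbone
        return self.proj(h).view(-1, ACTION_DIMS, NUM_ACTION_BINS)

def behavior_cloning_step(head, backbone_embedding, expert_action_tokens, optimizer):
    """One supervised step: cross-entropy of predicted action tokens against the expert's."""
    logits = head(backbone_embedding)                  # (B, ACTION_DIMS, NUM_ACTION_BINS)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, NUM_ACTION_BINS),           # flatten to (B * ACTION_DIMS, bins)
        expert_action_tokens.reshape(-1),              # expert tokens, same flattening
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

No reward, no rollouts, no policy gradient: just maximum-likelihood training on demonstrations, which is the point being made above.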
A research result reported before, but, as usual, the New Yorker has better writers.
Is there something that shows what the tokens they use look like?
Oh my, that has to be one of the worst jobs ever invented.
https://archive.is/fsuxe