17 points | by Topfi 7 days ago
1 comment
Have you tried giving those agents access to a grounded vision model to analyse that image?
In my experience, most models can’t understand such input properly.
I am now experimenting with Molmo2, and it looks promising.