To a considerable extent, we are stuck in the world we live in; but I am reminded of a quote by Guillaume Allais:
> My entire job seems to be repeating variations of "never start by forgetting the user's stated intent only to then attempt to guess it".
This is awesome, can't wait for evals against Claude Computer Use!
Can we first test this with basic sysadmin work in a simple shell?
Can't wait to replace "apt-get install" with "gpt-get install" and then have it solve all the dependency errors by itself.
Has anyone gotten this to work?
Copying the repo and downloading the models, whether through HuggingFace or manually, does not seem to work; you get errors indicating missing files.
Can it detect ads and mask them out?
Does it also tell the coordinates (x,y) of the annotated box w.r.t. the screenshot dimensions?
I have a bit of a vice: I enjoy some "idle" games. I have been meaning to do some very basic manual screen carving, OCR & computer vision to try to "read" my state in these games, & build multi-actor "play" models for them, just for fun really & to decrease time sunk gaming (by spending significant time coding/learning).
This certainly seems like it has a lot of promise to make that much, much easier. Game UIs are less uniform, so this might be harder or not easily applicable, but hopefully
As someone who has done this to many games over a few decades, I can definitively say: 100% of the time, it ruins the fun of the game.
I can't say exactly why. Maybe you feel like you haven't earned it. Maybe it's the idle nature of farming that we really enjoy...
Depends what you consider fun, and how far you take it. Some people enjoy programming more than repetitive clicking in a GUI. For a clicker game, writing a bot lets you iterate on strategies more easily: is it faster to get to level 2 if I buy the upgrade for A or B first? For Trackmania, it lets you get a world record and a YouTube video with 14M views.
https://youtu.be/Dw3BZ6O_8LY
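To make the "A or B first?" question concrete, here is a minimal sketch of how a bot could compare purchase orders offline. Everything in it is an assumption for illustration: the upgrade costs and bonuses are made-up numbers, `time_to_goal` is a hypothetical helper, and the model (income is a flat rate, each upgrade is bought the moment it is affordable, leaving zero coins) is far simpler than any real idle game.

```python
from itertools import permutations

# Hypothetical upgrades: name -> (cost in coins, extra coins per second).
# These numbers are invented for the example, not taken from any game.
UPGRADES = {"A": (10.0, 1.0), "B": (25.0, 3.0)}


def time_to_goal(order, goal=100.0, base_rate=1.0):
    """Seconds needed to buy the upgrades in `order`, then bank `goal` coins.

    Assumes we start with zero coins and buy each upgrade as soon as we
    can afford it, so the balance is back to zero after every purchase.
    """
    elapsed, rate = 0.0, base_rate
    for name in order:
        cost, bonus = UPGRADES[name]
        elapsed += cost / rate  # time spent saving up for this upgrade
        rate += bonus           # income improves once it is bought
    return elapsed + goal / rate  # final stretch: save up the goal amount


# Try every purchase order and keep the fastest one.
best = min(permutations(UPGRADES), key=time_to_goal)

for order in permutations(UPGRADES):
    print(" then ".join(order), "->", time_to_goal(order), "seconds")
print("fastest order:", best)
```

With these made-up numbers the cheap upgrade first wins (42.5 s vs 47.5 s), but the point is only that a few lines of simulation let you test an ordering without clicking through it by hand.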