21 comments

  • luke-stanley 2 days ago

    I'm glad this worked for Simon, but I would probably prefer using a user script that scrapes DOM text changes and streams them to a small local web server, which appends them to a JSONL file with the URL, the text change and a timestamp. That's probably because I already have something doing this: it lets me back up things I'm looking at in real time, like streaming LLM generations, and it relies only on normal browser technology. I should probably share my code, since it's quite useful. I'm a bit uncomfortable relying on an LLM to transcribe something when there is a stream of text that could be captured robustly, as real data, vs. well-trained but indirect token magic. A middle ground might be grounded extraction with evidence chains: timestamps, screenshots, the cropped regions it's sourcing from, and spelled-out reasoning. There's the extraction/retrieval step, and there's a kind of data normalisation. Of course, it's nice that he's got something that just works in two or three steps, and it's good that the technology is getting quite reliable and cheap a lot of the time, but still, we could do better.
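
The comment doesn't include the code, so what follows is only a minimal sketch of the server half of that kind of setup, written from the description. It assumes the user script POSTs JSON objects with `url`, `text` and `timestamp` fields to a local endpoint. The browser side (for example a `MutationObserver` in a user script) is not shown, and the file name `dom_changes.jsonl`, port `8765` and helper names `append_record` / `make_server` are hypothetical choices for illustration.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical default output file; any path works.
DEFAULT_LOG_PATH = Path("dom_changes.jsonl")


def append_record(path, url, text, timestamp=None):
    """Append one DOM text change as a single JSON line and return the record."""
    record = {
        "url": url,
        "text": text,
        # Fall back to server time if the client didn't send a timestamp.
        "timestamp": timestamp if timestamp is not None else time.time(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


class ChangeHandler(BaseHTTPRequestHandler):
    """Accepts POST bodies like {"url": ..., "text": ..., "timestamp": ...}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            append_record(self.server.log_path, payload["url"],
                          payload["text"], payload.get("timestamp"))
        except (ValueError, KeyError, TypeError, AttributeError):
            # Malformed JSON or missing fields: reject without writing anything.
            self.send_response(400)
            self.end_headers()
            return
        self.send_response(204)
        # Permissive CORS in case the script uses plain fetch() from the page.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()

    def do_OPTIONS(self):
        # CORS preflight, sent by browsers for Content-Type: application/json.
        self.send_response(204)
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")
        self.end_headers()


def make_server(port=8765, log_path=DEFAULT_LOG_PATH):
    """Build a server bound to localhost only, so the log isn't exposed to the network."""
    server = HTTPServer(("127.0.0.1", port), ChangeHandler)
    server.log_path = log_path  # read by ChangeHandler via self.server
    return server
```

Start it with `make_server().serve_forever()`. Each accepted change becomes one line in the JSONL file, so the log stays append-only and is easy to grep or replay later, which fits the "robust stream of real data" argument better than re-deriving the text from video.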

  • anigbrowl 3 hours ago

    Since this is Google, isn't it highly likely that the price is a loss leader that will later be changed once customers are sufficiently locked in? I get that this is more convenient than doing it programmatically or manually, but that seems like a reason to use something other than Gmail. This approach just seems incredibly wasteful to me.

  • euroderf a day ago

    Couldn't he have sent it as a fax and then photographed the fax?

  • pridkett 2 days ago

    Video scraping doesn’t need to be just screen captures. I’ve demoed a solution with Gemini where you take a video walking up and down the aisles of a retail store, and it captured 100% accurate data on product name, quantity/size, SKU, and price for a little under 75% of the products. And that was back in January.

    This has huge implications for everything from competitive pricing, to understanding store layouts, to creating your own grocery store inflation monitor. Just subtly take a video and process it.

    And the models have only gotten better.

    • tgv 3 hours ago

      > This has huge implications for everything from competitive pricing, to understanding store layouts

      Even smaller stores have been monitoring their competitors for a long time.

      > your own grocery store inflation monitor

      You could also check your itemized bill.

  • Havoc a day ago

    Still amazed that video is so "cheap" on tokens despite being way more bytes than text

    • odo1242 2 hours ago

      Pretty sure there's some strong preprocessing being applied to that video, though. Maybe even to the point of extracting text and deduplicating it between frames.

  • etewiah 2 days ago

    You've got me thinking. Would this work for real estate data? A lot of sites make it quite hard to grab their raw data. Also, perhaps it could gain some insights from the photos...

    • simonw 2 days ago

      I'm certain it would. That would be a really fun experiment to run!

    • jerpint 2 days ago

      Could also work for social media, which can be hard to scrape.

  • danjc 2 days ago

    I think this sort of thing is what Microsoft intended with Recall. The problem is the privacy implications are horrible.

    • simonw 2 days ago

      Something I really like about this technique is that I stay in complete control of what I expose to the model. If I don't want something fed into the model, I omit it from the screen recording.

  • teruakohatu 2 days ago

    I admit this is a pretty cool technique, but what's missing is how accurate the data extraction was. Without knowing that, it's not possible to judge how useful this technique is.

    • simonw 2 days ago

      I watched the 35 second long video and confirmed by eyeballing the JSON that the result was exactly correct.

    • kaveet 2 days ago

      > You should never trust these things not to make mistakes, so I re-watched the 35 second video and manually checked the numbers. It got everything right.

    • korkybuchek 2 days ago

      He said in his tweet that he verified the results.

  • m-hodges 2 days ago

    This is gonna push things towards some very unfortunate DRM.

    • toomuchtodo a day ago

      Can't stop the webcam->LLM

      • okwhateverdude a day ago

        Poison pixels and other cat-and-mouse things will definitely happen.