19 comments

  • msoad a year ago

    I work in this space. We manage thousands of e2e tests. The pain has never been in writing the tests. Frameworks like Playwright already have great UX, and code editors like Cursor make writing tests even easier. If I could show Cursor the browser, it would be better still, but that doesn’t work today since most multimodal models are too slow at understanding screenshots.

    It used to be that the frontend was very fragile. Xvfb, Selenium, ChromeDriver, etc., were the usual causes of pain, but recently frontend frameworks and browser automation have become solid. Headless Chrome hardly ever lets us down.

    The biggest pain in e2e testing is that tests fail for reasons that are hard to understand and debug. This is a very, very difficult thing to automate: building a system that can go read the logs of some random service deep in our service mesh to understand why an e2e test failed would take AGI-level intelligence. When an e2e test flakes, in a lot of cases we ignore it, and I have been in other orgs where this is the case too. I wish there were a system that would follow up and generate a report that says, “This e2e test failed because service XYZ had a null pointer exception on this line,” but that doesn’t exist today. In most of the companies I’ve been at, the infra was complex enough that the backend error message never made it to the frontend, where we could have seen it in the test logs. OpenTelemetry and other tools are promising, but again, I’ve never seen infra good enough to put it all together.

    Writing tests is not a pain point worth buying a solution for, in my case.

    My 2c. Hopefully it’s helpful and not too cynical.
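    The report wished for above boils down to joining a failing test to backend logs by trace ID; as the comment says, the hard part is getting every service to emit and ship those IDs, not the join itself. Below is a minimal Python sketch of that join, assuming every service stamps its logs with the OpenTelemetry trace ID of the request it handled and that the test captures the W3C `traceparent` header of the failing request. `LogRecord`, `explain_failure`, and the sample logs are hypothetical, not part of OpenTelemetry or any product.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LogRecord:
    # Hypothetical record shape: assumes every service stamps its logs with the
    # OpenTelemetry trace ID of the request it was handling.
    timestamp: float
    service: str
    level: str
    trace_id: str
    message: str


def trace_id_from_traceparent(header: str) -> str:
    """Return the trace ID from a W3C `traceparent` header.

    Layout: version-traceid-parentid-flags, e.g.
    00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    """
    parts = header.strip().split("-")
    if len(parts) != 4 or len(parts[1]) != 32:
        raise ValueError(f"malformed traceparent: {header!r}")
    return parts[1].lower()


def explain_failure(test_name: str, traceparent: str, logs: Iterable[LogRecord]) -> str:
    """Point at the earliest ERROR-level backend log on the failing request's trace."""
    trace_id = trace_id_from_traceparent(traceparent)
    errors = [r for r in logs if r.trace_id == trace_id and r.level == "ERROR"]
    if not errors:
        return f"{test_name} failed; no backend errors found for trace {trace_id}"
    first = min(errors, key=lambda r: r.timestamp)
    return f"{test_name} failed because service {first.service} logged: {first.message}"


# Made-up sample data for one failing login request.
TRACE = "4bf92f3577b34da6a3ce929d0e0e4736"
logs = [
    LogRecord(1.0, "gateway", "INFO", TRACE, "POST /login"),
    LogRecord(1.2, "auth", "ERROR", TRACE, "NullPointerException in UserRepository.find"),
    LogRecord(1.3, "gateway", "ERROR", TRACE, "upstream auth returned 500"),
    LogRecord(1.1, "billing", "ERROR", "0af7651916cd43dd8448eb211c80319c", "unrelated timeout"),
]
print(explain_failure("test_login", f"00-{TRACE}-00f067aa0ba902b7-01", logs))
# prints: test_login failed because service auth logged: NullPointerException in UserRepository.find
```

    Picking the earliest error on the trace is a deliberate heuristic: downstream services (here the gateway) usually log a secondary error after the root cause, so the first one is the better guess.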

  • batikha a year ago

    Very cool! I can already see a lot of "this is already solved by playwright/cypress/selenium/deterministic stuff" in the comments.

    Over nearly 10 years in startups (big and small), I've been consistently surprised by how often I hear that "testing has been solved", yet see very little automation in place, with PMs/QAs/devs and sometimes CEOs and VPs doing lots of manual QA. And not only on new features (which is a good thing), but also on happy paths and core features (arguably a waste of time to test the same things over and over again).

    More than once I worked for a company that was against having a manual QA team, out of principle and for more or less valid reasons (we use a typed language so there are fewer bugs, engineers are empowered, etc.), but ended up hiring external consultants to handle QA after a big quality incident.

    The mismatch between theory and practice in this field is impressive.

  • ec109685 a year ago

    > In terms of trying the product out: since the service is resource-intensive (we provide hosted virtual/real phone instances), we don't currently have a playground available. However, you can see some examples here https://mobileboost.io/showcases and book a demo of GPT Driver testing your app through our website.

    Have you considered an approach like what Anthropic is doing for their computer control where an agent runs on your own computer and controls a device simulator?

  • codepathfinder a year ago

    I've been a mobile developer for the past 10 years, and my overall belief is that mobile app development is growing more slowly and that companies with mobile teams are investing less in mobile dev, testing, tooling, and education. Do you think the market for your product is still as hot as it once was?

  • codepathfinder a year ago

    Is it possible to record the user's screen and just generate a test case from it? IMO that's the most efficient way.

  • rvz a year ago

    How does this compare to Robin by mobile.dev; the same guys that built Maestro? [0]

    That has around 95% of what GPT Driver does and has the potential to do Web E2E testing.

    [0] https://maestro.mobile.dev

  • mmaunder a year ago

    Congrats! How has Anthropic's latest release supporting computer use affected your planning/thinking around this?

    PS: If you had this for desktop we'd immediately become a customer.

  • drothlis a year ago

    I noticed in your demo it generated the prompt "tap on the 'Log in' button located directly below the 'Facebook Password' field".

    Does your model consistently get the positions right? (above, below, etc). Every time I play with ChatGPT, even GPT-4o, it can't do basic spatial reasoning. For example, here's a typical output (emphasis mine):

    > If YouTube is to the upper *left* of ESPN, press "Up" once, then *"Right"* to move the focus.

    (I test TV apps where the input is a remote control, rather than tapping directly on the UI elements.)
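    For a grid of tiles, the direction of each press follows from plain coordinate arithmetic, which makes a deterministic check a cheap way to catch the kind of left/right mix-up quoted above. A minimal Python sketch, assuming a uniform grid where each press moves focus exactly one cell (real TV apps often have irregular focus rules); the function name `dpad_presses` and the tile coordinates are hypothetical.

```python
from typing import List, Tuple

# (row, col) focus position; row 0 is the top row, col 0 the leftmost column.
Cell = Tuple[int, int]


def dpad_presses(focus: Cell, target: Cell) -> List[str]:
    """Remote-control presses that move focus from `focus` to `target`.

    Assumes a uniform grid where each press moves focus exactly one cell.
    """
    d_row = target[0] - focus[0]
    d_col = target[1] - focus[1]
    # Negative deltas mean the target is above / to the left of the focus.
    vertical = ["Up"] * -d_row if d_row < 0 else ["Down"] * d_row
    horizontal = ["Left"] * -d_col if d_col < 0 else ["Right"] * d_col
    return vertical + horizontal


# YouTube sits to the upper left of the focused ESPN tile.
espn, youtube = (1, 1), (0, 0)
print(dpad_presses(espn, youtube))  # prints ['Up', 'Left']
```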

  • xyst a year ago

    I remember testing out a similar product (mabl?). I ended up just using it to check for dead links; for other use cases, I remember getting too many false positives.

    This was many years ago though (2018-2019?), before the genAI craze. I wonder whether it has improved, or whether this product is any better than its competitors.

  • pj_mukh a year ago

    This is super cool. A question: are the instructions regenerated from the instruction tokens every time? While that may be costly, it feels like it would be robust to small changes in the app (component name changes, etc.). Does that make sense?

  • tomatohs a year ago

    Curious what happened to the other YC Mobile AI E2E company, CamelQA (YC W24). They pivoted to AI assistants. Could be good lessons there if you're not already in touch with them.

  • bluelightning2k a year ago

    Genuinely curious: is the timing of this, immediately after Claude computer use, a coincidence? Or was that the last missing piece, or a kind of threat that expedited things?

  • doublerebel a year ago

    How does this compare with Test.ai (now aka Testers.ai) who have offered basically this same service for the last 5 years?

  • archerx a year ago

    Curious question: whatever happened with the OpenAI drama over trademarking “GPT”? I’m guessing they were not successful?

  • alexwordxxx a year ago
  • 101008 a year ago

    It's still interesting how many companies offer an LLM (non-deterministic) solution for deterministic problems.

  • aksophist a year ago

    how do you evaluate your tool, and have you published your evaluation along with the metrics?

  • iknownthing a year ago

    no logo?

  • lihua919 a year ago

    interesting