Schedule tasks on the web

(code.claude.com)

138 points | by iBelieve 5 hours ago

97 comments

  • nickandbro 5 hours ago

    I feel like we are inching closer and closer to a world where rapid iteration of software is the default. For example: a trusted user gives feedback -> the feedback gets curated into a ticket by an AI agent, then turned into a PR by an agent, then reviewed by an agent, before being deployed by an agent. We are maybe one or two steps from the flywheel being complete. Or maybe we are already there.

    • chatmasta 4 hours ago

      I love everything about this direction except for the insane inference costs. I don’t mind the training costs, since models are commoditized as soon as they’re released. Although I do worry that if inference costs drop, the companies training the models will have no incentive to publish their weights because inference revenue is where they recuperate the training cost.

      Either way… we badly need more innovation in inference price per performance, on both the software and hardware side. It would be great if software innovation unlocked inference on commodity hardware. That’s unlikely to happen, but today’s bleeding edge hardware is tomorrow’s commodity hardware so maybe it will happen in some sense.

      If Taalas can pull off burning models into hardware with a two month lead time, that will be huge progress, but still wasteful because then we’ve just shifted the problem to a hardware bottleneck. I expect we’ll see something akin to gameboy cartridges that are cheap to produce and can plug into base models to augment specialization.

      But I also wonder if anyone is pursuing some more insanely radical ideas, like reverting back to analog computing and leveraging voltage differentials in clever ways. It’s too big brain for me, but intuitively it feels like wasting entropy to reduce a voltage spike to 0 or 1.

      • throwaw12 an hour ago

        > I love everything about this direction except for the insane inference costs.

        If this direction holds, the ROI still works out cheaper.

        Instead of employing 4 people (customer support, PM, eng, marketing), you'll have 3-5 agents, and the whole ticket flow might cost you ~$20.

        But I hope we won't go this far, because when things fail every customer will be impacted, and there will be no one left who understands the system well enough to fix it.

      • eksu 4 hours ago

        This is the wrong way to see it. If a technology gets cheaper, people will use more and more of it. If inference costs drop, you can throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.

        • gf000 2 minutes ago

          > throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.

          But this is just not true, otherwise companies that can already afford such high prices would have already outpaced their competitors.

      • mastermage 3 hours ago

        I mean, theoretically, if there are many competitors the cost of the product should drop because of competition.

        Sadly, I haven't seen that happen in a long time.

    • Leptonmaniac 4 hours ago

      I think that as a user I'm so far removed from the actual (human) creation of software that, if I think about it, I don't really care either way. Take for example this article on Hacker News: I am reading it in a custom app someone programmed, which pulls articles hosted on Hacker News, which themselves are on some server somewhere, and everything gets transported across wires according to a specification. For me, this isn't some impressionist painting or heartbreaking poem - the entity that created those things is so far removed from me that it might as well be artificial already. And that's coming from a kid of the 90s with some knowledge of cyber security, so I could potentially look up the documentation and maybe even the source code for the things I mentioned, if I were interested.

      • slopinthebag 3 hours ago

        Art is and has always been about the creator.

        • vntok an hour ago

          Take a walk in any museum, I'm pretty sure you'll react to some of the art displayed there and find it cool before you read the name of the artist.

    • theredbeard 3 hours ago

      We haven’t been inching closer to users writing a half-decent ticket in decades though.

      • aembleton 2 hours ago

        Maybe the agent can ask the user clarifying questions. Even better if it could do it at the point of submission.

    • andy_ppp 11 minutes ago

      Users are often incorrect about what the software should actually be doing and don’t see the bigger picture.

    • heavyset_go 3 hours ago

      Feedback loops like that would be an exercise in raising garbage-in->garbage-out to exponential terms.

      It's the "robots will just build/repair themselves" trope but the robots are agents

      • TeMPOraL 2 hours ago

        Yes. Next they'll want nanobots that build/repair themselves.

        Oh wait. That's already here and is working fine.

    • jvuygbbkuurx 5 hours ago

      Trusted user like Jia Tan.

    • mindwok 2 hours ago

      I think Anthropic will launch backend hosting off the back of their Bun acquisition very soon. It makes sense to basically run your entire business out of Claude, and share bespoke apps built by Claude code for whatever your software needs are.

    • tuo-lei 3 hours ago

      The missing piece for me is post-hoc review.

      A PR tells me what changed, but not how an AI coding session got there: which prompts changed direction, which files churned repeatedly, where context started bloating, what tools were used, and where the human intervened.

      I ended up building a local replay/inspection tool for Claude Code / Cursor sessions mostly because I wanted something more reviewable than screenshots or raw logs.

    • shafyy 16 minutes ago

      Haha sure, let's just let every user add their feedback to the software.

    • edf13 3 hours ago

      Or perhaps we end up where all software is self evolving via agents… adjusting dynamically to meet the users needs.

      • PeterStuer 2 hours ago

        The "user" being the one that's in charge of the AI, not the person on the receiving end.

    • slopinthebag 4 hours ago

      What kind of software are people building where AI can just one-shot tickets? Opus 4.6 and GPT 5.4 regularly fail on complicated issues for me.

      • withinboredom 4 hours ago

        Not just complicated, but even simple ones if the current software is too “new” of a pattern they’ve never seen before or trained on.

        • slopinthebag 4 hours ago

          I dunno if Rust async or native platform APIs, which have existed for years, count as new patterns, but if you throw even a small wrench in the works they really struggle. But that's expected, really, when you look at what the technology is - it's kind of insane we've even gotten to this point with what amounts to fancy autocomplete.

      • victorbjorklund 3 hours ago

        Of course not all tickets are complex. Last week I had to fix a ticket which was to display the update date on a blog post next to the publish date. Perfect use case for AI to one shot.

      • thin_carapace 4 hours ago

        i dont see anyone sane trusting ai to this degree any time soon, outside of web dev. the chances of this strategy failing are still well above acceptable margins for most software, and in safety critical instances it will be decades before standards allow for such adoption. anyway we are paying pennies on the dollar for compute at the moment - as soon as the gravy train stops rolling, all this intelligence will be out of access for most humans. unless some more efficient generalizable architecture is identified.

        • heavyset_go 3 hours ago

          > as soon as the gravy train stops rolling, all this intelligence will be out of access for most humans. unless some more efficient generalizable architecture is identified.

          All Chinese labs have to do to tank the US economy is to release open-weight models that can run on relatively cheap hardware before AI companies see returns.

          Maybe that's why AI companies are looking to IPO so soon, gotta cash out and leave retail investors and retirement funds holding the bag.

          • PeterStuer 3 hours ago

            They could still eliminate relatively cheap hardware.

          • thin_carapace 3 hours ago

            i was under the impression that we were approaching performance bottlenecks both with consumer GPU architecture and with this application of transformer architecture. if my impression is incorrect, then i agree it is feasible for china to tank the US economy that way (unless something else does it first)

            • heavyset_go 2 hours ago

              I think it just needs to be efficient or small enough for companies to deploy their own models on their hardware or cloud, for more inference providers to come out of the woodwork and compete on price, and/or for optimized models to run locally for users.

              Regarding the latter, smaller models are really good for what they are (free) now, they'll run on a laptop's iGPU with LPDDR5/DDR5, and NPUs are getting there.

              Even models that can fit in unified 64GB+ memory between CPU & iGPU aren't bad. Offloading to a real GPU is faster, but with the iGPU route you can buy cheaper SODIMM memory in larger quantities, still use it as unified memory, eventually use it with NPUs, all without using too much power or buying cards with expensive GDDR.

              Qwen-3.5 locally is "good enough" for more than I expected; if that trend continues, I can see small deployable models eventually being viable and worthy competition, or at least good enough that companies can run their own instead of exfiltrating their trade secrets to the worst people on the planet in real time.

        • m00x 4 hours ago

          Several fintechs like Block and Stripe are boasting thousands of AI-generated PRs with little to no human reviews.

          Of course it's in the areas where it doesn't matter as much, like experiments, internal tooling, etc, but the CTOs will get greedy.

          • slopinthebag 4 hours ago

            I don't think anybody is doubting its ability to generate thousands of PR's though. And yes, it's usually in the stuff that should have been automated already regardless of AI or not.

          • thin_carapace 3 hours ago

            these companies contribute to swathes of the west's financial infrastructure, not quite safety critical but critical enough, insane to involve automation here to this degree

        • slopinthebag 4 hours ago

          Even in webdev it rots your codebase unchecked. Although it's incredibly useful for generating UI components, which makes me a very happy webslopper indeed.

          • thin_carapace 3 hours ago

            im grateful to have never bothered learning web dev properly, it was enlightening witnessing chat gpt transform my ten second ms paint job into a functional user interface

    • eru 3 hours ago

      Instead of having a trusted user, you can also do statistics on many users.

      (That's basically what A/B testing is about.)

    • hyperionultra 3 hours ago

      "Trusted user" also can be an Agent.

    • bredren 4 hours ago

      What you're describing is absolutely where we're headed.

      But the entire SWE apparatus can be handled.

      Automated A/B testing of the feature. Progressive exposure deployment of changes, you name it.

    • tossandthrow 4 hours ago

      I think the AI agent will directly make a PR - tickets are for humans with limited mental capacity.

      At least in my company we are close to that flywheel.

      • _puk 4 hours ago

        Tickets need to exist purely from a governance perspective.

        Tickets may well not look like they do now, but some semblance of them will exist. I'm sure someone is building that right now.

        No. It's not Jira.

        • tossandthrow 3 hours ago

          Yes, so my point is that PRs act as that governance layer - with preview environments, you can see the complexity and risk of the change etc.

      • Gigachad 4 hours ago

        The agents have even more limited capacity

        • eru 3 hours ago

          At the moment, maybe. But it's growing.

          • Gigachad 2 hours ago

            Even so they would probably still benefit from intermediate organisational steps.

            • eru 2 hours ago

              For a while, sure.

    • overfeed 2 hours ago

      > I feel like we are just inching closer and closer to a world where rapid iteration of software will be by default.

      There's a lot of experimentation right now, but one thing that's guaranteed is that the data gatekeepers will slam the door shut[1] - or install a toll booth once there's less money sloshing about and the winners and losers are clear. At some point in the future, Atlassian and GitHub may not grant Anthropic access to your tickets unless you're on the relevant tier with the appropriate "NIH AI" surcharge.

      1. AI does not suspend or supplant good old capitalism and the cult of profit maximization.

    • MattGaiser 4 hours ago

      I am already there with a project/startup with a friend. He writes up an issue in GitHub and there is a job that automatically triggers Claude to take a crack at it and throw up a PR. He can see the change in an ephemeral environment. He hasn't merged one yet, but it will get there one day for smaller items.

      I am already at the point where because it is just the two of us, the limiting factor is his own needs, not my ability to ship features.

      • jondwillis 4 hours ago

        Why doesn’t he merge them?

      • m00x 4 hours ago

        Must be nice working on simple stuff.

    • yieldcrv 4 hours ago

      We do feedback to ticket automatically

      We dont have product managers or technical ticket writers of any sort

      But we devs are still choosing how to tackle the ticket. We don't have to - I'm solving the tickets with AI anyway. I could automate my job away if I wanted, but I wouldn't trust the result, since I give a degree of input and steering, and there are bigger-picture considerations it's not good at juggling, for now.

    • charcircuit 5 hours ago

      Then sets up telemetry and experiments with the change. Then if data looks good an agent ramps it up to more users or removes it.

    • eranation 4 hours ago

      Um, we are already there...

  • gowthamgts12 5 hours ago

    Interesting to see that feature launches come via the official website while usage restrictions come via a team member's Twitter account - https://x.com/trq212/status/2037254607001559305.

    Also, someone rightly predicted this rugpull when they announced 2x usage - https://x.com/Pranit/status/2033043924294439147

    • stingraycharles 5 hours ago

      To me it makes perfect sense for them to encourage people to do this, rather than eg making things more expensive for everyone.

      The same as charging a different toll price on the road depending on the time of day.

    • tyre 2 hours ago

      If you read the replies to the second, you’ll see an engineer on Claude Code at Anthropic saying that it is false.

      Someone spread FUD on the internet, incorrectly, and now others are spreading it without verifying.

      • hobofan 32 minutes ago

        And if you look closely at the usernames, you'll see that the same engineer from link 2 who said "nah it’s just a bonus 2x, it’s not that deep" (just two weeks ago) is now saying "we're going to throttle you during peak hours" (as predicted).

        Yes, it was FUD, but it ended up being correct. With the track record Anthropic has (e.g. the months-long denial of dumbed-down models last year, only to later confirm it as a "bug"), this just continues to erode trust, and such predictions are the result of that.

  • javiercr 2 hours ago

    I've recently switched from GitHub Copilot Pro to Claude Code Max (20x). While Claude is clearly superior in many aspects, one area where it falls short is remote/cloud agents.

    Yesterday, I spent the entire day trying to set up "Claude on the web" for an Elixir project and eventually had to give up. Their network firewall kept killing Hex/rebar3 dependency resolution, even after I selected "full" network access.

    The environment setup for "on the web" is just a bash script. And when something goes wrong, you only see the tail of the log. There is currently no way to view the full log for the setup script. It's really a pain to debug.

    The Copilot equivalent to "Claude on the web" is "GitHub Copilot Coding Agents," which leverages GitHub Actions infrastructure and conventions (YAML files with defined steps). Despite some of the known flaws of GitHub Actions, it felt significantly more robust.

    "Schedule tasks on the web" is built on the same infrastructure and conventions as "Claude on the web", so I'm afraid I'll run into the same troubles if I want to use this.

  • simianwords 3 hours ago

    I remember when I tried to set something up with the ChatGPT equivalent like "notify me only if there are traffic disruptions in my route every morning at 8am" and it would notify me every morning even if there was no disruption.

    • theredbeard 3 hours ago

      This is because for some reason all agentic systems think that slapping cron on it is enough, but that completely ignores decades of knowledge about prospective memory. Take a look at https://theredbeard.io/blog/the-missing-memory-type/ for a write-up on exactly that.

    • alexhans 25 minutes ago

      Why not set up your own evals and use something like pi-mono for that? https://github.com/badlogic/pi-mono/

      You'll define exactly what good looks like.

    • scottmcdot 3 hours ago

      Me too. It doesn't have the ability to alert only on a true positive. It has to also alert on true negatives. So dumb.

      • worldsayshi 2 hours ago

        This doesn't seem too hard to solve, except for the ever-recurring LLM output validation problem. If the true positive is rare, you don't know if the earthquake alert system works until there's an earthquake.

  • monkeydust 3 hours ago

    I do feel people will end up using this for things where a deterministic rule would be more effective, faster, and cheaper. I'm seeing this start to happen at work: "We need AI to solve X"... "No, you don't."
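
For the "notify me only if there are traffic disruptions" task mentioned upthread, the deterministic version is a few lines of Python. This is only a sketch: `fetch_disruptions` is a hypothetical stand-in for whatever traffic API you would actually query.

```python
def fetch_disruptions():
    # Hypothetical: call your traffic provider's API for the route
    # and return a list of disruption descriptions (empty if none).
    return []

def should_notify(disruptions):
    # Alert only on a true positive; silence means "all clear".
    return len(disruptions) > 0

if __name__ == "__main__":
    found = fetch_disruptions()
    if should_notify(found):
        # Swap print for email/Slack/push in real use.
        print("Traffic alert: " + "; ".join(found))
```

Run from cron each morning, it produces output (and hence a notification) only when the condition actually holds - no model in the loop.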

    • TeMPOraL 2 hours ago

      Maybe. The problem of "execute task on a cron" is something I've noticed the industry seems to refuse to solve in general, as if intentionally denying this capability to regular people. Even without AI, it's the most basic building block of automation, and it's always mysteriously absent from programs and frameworks (at least at the basic level). AI only makes it more useful on the "then" side, but a reliable cron on the "if" side is already useful.

      • 9wzYQbTYsAIc 34 minutes ago

        I don’t recall if IFTTT had/has a basic cron or not, but it sure has/had put a lot of basic automations in the hands of the general public. Same for Apple Shortcuts, to some extent, or Zapier.

        • TeMPOraL 6 minutes ago

          This is a larger topic that's worthy of a comparably large rant, which I really don't want to do right now, but to keep it short, in my subjective view:

          - IFTTT was great when it started; at some point, it became... weird, in a "I don't even know what's going on on my screen, is this a poster or an app" kind of way.

          - Zapier is an impenetrable mess that evidently targets marketers and other business users; discovery is hard, and even though it seems like it has everything, it - like all tools in this space - is always missing the one feature you actually need.

          - Yahoo Pipes, I heard they were great, but I only learned about them after they shut down.

          - Apple Shortcuts - not sure what you can do with those, but over the years of reading about them in HN comments, I think they may be the exception here, in being both targeting regular users and actually useful.

          - Samsung Modes and Routines - only recently becoming remotely useful, so that's nice, even if vendor-restricted.

          - Tasker - an Android tool that actually manages to offer useful automation, despite the entire platform/OS and app ecosystem trying its best to prevent it. Which is great, if your main computer is a phone. It sucks in a world of cloud/SaaS, because it creates a silly situation where e.g. I could nicely automate some things involving e-mail and calendars from Tasker + FairEmail, but... well, my mailboxes and calendars live in the cloud, so some of that would conflict with use of the vendor (Fastmail) webapp or any other tool.

          Or, in short: we need Tasker but for web (and without some of the legacy baggage around UI and variable handling).

          The sorry state of automation is not entirely, or even mostly, the fault of the automation platforms. I may have issues with some UI and business choices some of these platforms made, but really, the main issue is that integrations are business deals and the integrated sides quickly learned to provide only a limited set of features - never enough to allow users to actually automate use of some product. There's always some features missing. You can read data but not write it. You can read files and create new files but not edit or delete them. You can add new tasks but can't get a list of existing ones. Etc.

          It's another reason LLMs are such a great thing to happen - they make it easy (for now) to force interoperability between parties that desperately want to prevent it. After all, worst case, I can have the LLM operate the vendor site through a browser, pretending to be a human. Not very reliable, but much better than nothing at all.

      • monkeydust 2 hours ago

        Agreed. How would you solve this in general - what would be the ingredients? People use things like Zapier, n8n, and Node-RED to achieve this today, but in many cases they're overkill.

        • bshimmin an hour ago

          Honestly, you just need cron (and Ruby/Python/bash/whatever) on an EC2. It's not very fashionable, but it works, will continue to work forever, and costs hardly anything.
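
For instance, the scheduling half is one crontab entry; the paths and script name below are made up for illustration:

```shell
# Weekday mornings at 08:00: run the check script and append
# whatever it prints (stdout and stderr) to a log.
0 8 * * 1-5 /home/ubuntu/checks/traffic-check.sh >> /home/ubuntu/checks/traffic.log 2>&1
```

The script itself decides whether anything is worth sending, so the cron line never needs to change.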

        • TeMPOraL 25 minutes ago

          I'd start with solving the UX issues, specifically expectations and UI around scheduling jobs.

          Expectations - the functionality of "do X on a timer" needs to be offered to users as a proper end-user feature[0], not treated as a sysadmin feature (Windows, Linux) or not provided at all (Android). People start seeing it on their own devices, they'll start using it, then expecting it, and the web will adjust too[1].

          UI - somehow this escapes every existing solution, from `cron` through Windows timers to any web "on timer" event trigger in any platform ever. There already exists a very powerful UI paradigm for managing recurring tasks, that most normies know how to use, because they're already using it daily at work and privately: a calendar. Yes, that thing where we can set and manage recurring events, and see them at a glance, in context of everything else that's going on in our lives.

          --

          <rant>

          I know those are hard problems, but are hard mostly because everybody wants to be the fucking one platform owning users and the universe. This self-inflicted sickness in computing is precisely why people will jump at AI solutions for this. Why I too will jump on this: because it's easier than dealing with all the systems and platforms that don't want to cooperate.

          After all, at this point, the easiest solution to the problems I listed above, and several others in this space, would be to get an AI agent that I can:

          1) Run on a cron every 30 minutes or so (events are too complicated);

          2) Give it read (at minimum) access to my calendar and todo lists (the ones I use, but I'm willing to compromise here);

          3) Give it access to other useful tools

          Which I guess brings us to the actual root problem here. "Run tasks on a cron" and "run tasks on trigger" are basically just another way of saying unattended/non-interactive usage. That is what is constantly being denied end users.

          This is also the key to enabling most value of AI tools, too, and people understand it very well (see the popularity of that Open Claw thing as the most recent example), but the industry also lives in denial, believing that "lethal trifecta" is a thing that can be solved.

          </rant>

          --

          [0] - This extends to event-trigger ("if X happens, then") automation, and end-user automation in all of everyday life. I mean, it's beyond ridiculous that the only things normal people are allowed to run automatically are a dishwasher and a laundry machine (and in the previous era, VCRs).

          [1] - As a side effect, it would quickly debullshitify "smart home" / "internet of things" spaces a lot. The whole consumer side of the market revolves around selling people basic automation capabilities - except vendor-locked, and without the most useful parts.

    • alexhans 28 minutes ago

      I'd say that's almost fine if they can start expressing intent correctly and thinking about what good looks like. They (or some automated thing, if you're building "think for them" products instead of "give them tools and teach them how to use them" products) can then freeze in determinism more and more where it's useful.

      I wrote this to help people (not just Devs) reason about agent skills

      https://alexhans.github.io/posts/series/evals/building-agent...

      And this one to address the drift of non-determinism (though depending on the audience it might not resonate as much):

      https://alexhans.github.io/posts/series/evals/error-compound...

    • dspillett an hour ago

      > See this starting to happen at work...'We need AI to solve X....no you don't"

      Same. Sometimes it is just people overeager to play with new toys, but in our case there is a push from the top & outside too: we are in the process of being subsumed into a larger company (completion due on April the 1st, unless the whole thing is an elaborate joke!) and there is apparently a push from the investors there to use "AI" more in order to not "get left behind the competition".

      • monkeydust an hour ago

        It's self-perpetuating. I was talking to the CEO of a Series A B2B SaaS company here in the UK recently. Most of the prospects his sales team are hitting are re-allocating their wallets to only look at products that use AI, on the back of senior management pushing them to do so.

        This company already does some pretty cool stuff with statistics for forecasting, but now they are pivoting their roadmap to bake GenAI into their offering ahead of other features that would be more valuable to their clients.

    • logicprog 41 minutes ago

      The problem I'd think, for the average user, would be writing the 'then' part of any deterministic rule — that would require coding, or at least some kind of automation script (visual or otherwise) that's basically coding in a trench coat, which for most people is still a barrier to entry and annoying. I think that's why they'd use AI tbh — they can just describe what they want in natural language with AI.

    • globular-toast 3 minutes ago

      Standard pendulum swing. Most people want to disengage their thinking circuits most of the time, so problems can't be evaluated one by one. There is no such thing as "this is a good solution for some problems". It can only be "this is a good solution for all problems". When the pendulum swings this far, this hard, it will swing all the way back eventually.

    • elcapitan 25 minutes ago

      AI will become that colleague who sucks at everything but never says no, so he becomes the favorite go-to person.

    • beefsack an hour ago

      I feel this would be more useful for tasks like "Check website X to see if there are any great deals today". Specifically, tasks that are loosely defined and require some form of intuition.

    • comboy 2 hours ago

      People are loading huge interpreted environments for stuff that can be done from the command line. Run computations on complex objects where it could be a single machine instruction etc. The trend has been around for a long time.

  • chopete3 4 hours ago

    Claude is moving fast.

    https://grok.com/tasks

    Grok has had this feature for some time now. I was wondering why others haven't done it yet.

    This feature increases user stickiness. They give 10 concurrent tasks free.

    I have had to extract specific news first thing in the morning across multiple sources.

  • iBelieve 5 hours ago

    Looks like I'm limited to only 3 cloud scheduled tasks. And I'm on the Max 20x plan, too :(

    "Your plan gets 3 daily cloud scheduled sessions. Disable or delete an existing schedule to continue."

    But otherwise, this looks really cool. I've tried using local scheduled tasks in both Claude Code Desktop and the Codex desktop app, and very quickly got annoyed with permissions prompts, so it'll be nice to be able to run scheduled tasks in the cloud sandbox.

    Here are the three tasks I'll be trying:

    Every Monday morning: Run `pnpm audit` and research any security issues to see if they might affect our project. Run `pnpm outdated` and look into any packages with minor or major upgrades available. Also research whether packages have been abandoned or haven't been updated in a long time, and see if there are new alternatives that are recommended instead. Put together a brief report highlighting your findings and recommendations.

    Every weekday morning: Look at Sentry errors, logs, and metrics for the past few days. See if there are any new issues that have popped up, and investigate them. Take a look at logs and metrics, see if anything seems out of the ordinary, and investigate as appropriate. Put together a report summarizing any findings.

    Every weekday morning: Please look at the commits on the `develop` branch from the previous day, look carefully at each commit, and see if there are any newly introduced bugs, sloppy code, missed functionality, poor security, missing documentation, etc. If a commit references GitHub issues, look up the issue, and review the issue to see if the commit correctly implements the ticket (fully or partially). Also do a sweep through the codebase, looking for low-hanging fruit that might be good tasks to recommend delegating to an AI agent: obvious bugs, poor or incorrect documentation, TODO comments, messy code, small improvements, etc.

    I ran all of these as one-off tasks just now, and they put together useful reports; it'll be nice getting these on a daily/weekly basis. Claude Code has a Sentry connector that works in their cloud/web environment. That's cool; it accurately identified an issue I've been working on this week.

    I might eventually try having these tasks open issues or even automatically address issues and open PRs, but we'll start with just reports for now.

    • NuclearPM 4 hours ago

      0 7 * * 1-5 ANTHROPIC_API_KEY=sk-... /path/to/claude-cron.sh /path/to/repo >> ~/claude-reports.md 2>&1

      Seems trivial.

      • esperent 3 hours ago

        A trivial way to rack up hundreds of dollars in API costs, sure.

        But you can set up a `claude -p` call via a cronjob without too much hassle, and that can use subscriptions.
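
A sketch of that kind of cron entry, assuming Claude Code is already logged in on the machine; the schedule, repo path, and prompt are placeholders (`claude -p` is Claude Code's non-interactive print mode):

```shell
# Weekday mornings: run Claude Code headless in the repo and
# append its report to a markdown file.
0 7 * * 1-5 cd /path/to/repo && claude -p "Review yesterday's commits on develop and summarize any new issues" >> ~/claude-reports.md 2>&1
```
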

  • zmmmmm 5 hours ago

    i'm missing something basic here .... what does it actually do? It executes a prompt against a git repository. Fine - but then what? Where does the output go? How does it actually persist whatever the outcome of this prompt is?

    Is this assuming you give it git commit permission and it just does that? Or it acts through MCP tools you enable?

    • jngiam1 4 hours ago

      MCP tools. We're doing some MCP bundling and giving it here, pretty cool stuff.

      • ares623 22 minutes ago

        wasn't MCP a critical link in the recent litellm attack?

    • tossandthrow 5 hours ago

      We use it to do automated weekly security audits on the codebase and post the results to Slack.

      • zmmmmm 5 hours ago

        so is slack posting an MCP tool it has? or a skill it just knows?

        • tossandthrow 4 hours ago

          In Claude it's a "connector", which is essentially an MCP tool.

  • mkagenius 4 hours ago

    This is a bit restrictive: it doesn't take screenshots, so you can't say "take screenshots of my homepage and send them to me via email".

    It doesn't allow egress curl, apart from a few hardcoded domains.

    I built Cronbox in the cloud, which offers more utility than the above. I did a "Show HN: Cronbox – Schedule AI Agents" a few days back.

    https://cronbox.sh

    and a pelican riding a bicycle job -

    https://cronbox.sh/jobs/pelican-rides-a-bicycle?variant=term...

  • arjie 5 hours ago

    What's the per-unit-time compute cost (independent of tokens)? Is there a compute deadline, etc.? They don't currently charge for the Cloud Environment (https://code.claude.com/docs/en/claude-code-on-the-web#cloud...) while it's running?

  • pastel8739 5 hours ago

    Is this free? I don’t see pricing info. I guess just a way to make you forget that you’re spending money on tokens?

    • weird-eye-issue 5 hours ago

      You don't spend money on tokens. It is a subscription.

  • PeterStuer 3 hours ago

    Is GitHub the only supported repository host?

  • lucgagan 5 hours ago

    Here goes my project.

    • rhubarbtree 5 minutes ago

      Better idea. Watch online feedback on this feature. Then implement things users want. Go niche. Join the forum and help them use Claude to its limits. Then be the next step for power users.

    • hydroweaver87 4 hours ago

      What were you working on?

  • jngiam1 4 hours ago

    This is powerful. Combined with MCPs, you can pretty much automate a ton of work.

    • esperent 3 hours ago

      Can you give some examples?

      • adobrawy an hour ago

        The feature was silently launched for me about a week ago.

        I use it to:

        - review the latest code changes to update my documentation (security policies, user documentation, etc.)

        - review the latest code changes, triage them, deduplicate them, and improve the code - I review the results, close them with comments when they're over-engineered, or add a review for an auto-fix

        - review open GitHub issues with a given label, select the one with the highest impact, comment with a rationale, implement it, and make a pull request - I wake up and have a few pull requests fixing issues that I can approve/finish in an existing Claude Code thread

        I also want to use it to: review recent Sentry issues, make GitHub issues for the ones with the highest priority, and make pull requests with proposed fixes - so I can just wake up and see that some crash is ready to be resolved.

        The limit of 3 scheduled jobs is pretty restrictive, but playing with it has given me some nice ideas on how to reduce my manual work.