Claude Code on the web

(anthropic.com)

279 points | by adocomplete 5 hours ago

161 comments

  • mmaunder 3 hours ago

    We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI. I'm doing massive lifts with it on software that would never before have been feasible for me personally, or any team I've ever run. I'll use Claude Code maybe once every two weeks as a second set of eyes to inspect code and document a bug, with mixed success. But my experience has been that initially Claude Code was amazing and a "just take my frikkin money" product. Then Codex overtook CC and is much better at longer runs on hard problems. I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf. Whereas Codex's ability to profoundly increase the capabilities of a software org is a secret that's slowly getting out.

    I don't have any relationship with any AI company, and honestly I was rooting for Anthropic, but Codex CLI is just way way better.

    Also Codex CLI is cheaper than Claude Code.

    I think Anthropic are going to have to somehow leapfrog OpenAI to regain the position they were in around June of this year. But right now they're being handed their hat.

    • jswny 30 minutes ago

      I find Codex CLI to be very good too, but it’s missing tons of features that I use in Claude Code daily that keep me from switching full time.

      - Good bash command permission system

      - Rollbacks coupled with conversation and code

      - Easy switching between approval modes (Claude had a keybind that makes this easy)

      - Ability to send messages while it’s working (Codex just queues them up for after it’s done, Claude injects them into the current task)

      - Codex is very frustrating when I have to keep allowing it to run the same commands over and over; in Claude this works well when I approve a command for the session

      - Agents (these are very useful for controlling context)

      - A real plan mode (crucial)

      - Skills (these are basically just lazy loaded context and are amazing)

      - The sandboxing in codex is so confusing, commands fail all the time because they try to log to some system directory or use internet access which is blocked by default and hard to figure out

      - Codex prefers python snippets to bash commands which is very hard to permission and audit

      When Codex gets to feature parity, I’ll seriously look at switching, but until then it’s just a really good model wrapped in an okay harness

    • CompoundEyes 2 minutes ago

      Claude Code is still good but I don’t TRUST it. With Claude Code and Sonnet I’m expecting failure. I can get things done but there’s an administrative overhead of futzing around with markdown files to keep it on rails and manage the context. Codex CLI with gpt-5-codex high reasoning is next gen. I’m sure Sonnet 5 will probably catch up.

    • pkreg01 2 hours ago

      I totally agree. I remember the June magic as well - almost overnight my abilities and throughput were profoundly increased, I had many weeks of late nights in awe and wonder trying things that were beyond my ability to implement technically but within the bounds of my conceptual understanding.

      Initially, I found Codex CLI with GPT-5 to be a substitute for Claude Code - now GPT-5 Codex materially surpasses it in my line of work, with a huge asterisk. I work in a niche industry, and Codex has generally poor domain understanding of many of the critical attributes and concepts. Claude happens to have better background knowledge for my tasks, so I've found that Sonnet 4.5 with Claude Code generally does a better job at scaffolding any given new feature. Then, I call in Codex to implement actual functionality since Codex does not have the "You're absolutely right" and mocked/placeholder implementation issues of CC, and just generally writes clean, maintainable, well-planned code. It's the first time I've ever really felt the whole "it's as good as a senior engineer" hype - I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.

      I think Codex is a generally better product with better pricing, typically 40-50% cheaper for about the same level of daily usage for me compared to CC. I agree that it will take a genuinely novel and material advancement to dethrone Codex now. I think the next frontier for coding agents is speed. I would use CC over Codex if it was 2x or 3x as fast, even at the same quality level. Otherwise, Codex will remain my workhorse.

      • catigula 12 minutes ago

        >I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.

        This is beyond bananas to me given that I regularly see codex high and Gpt-5-high both fail to create basic react code slightly off the normal distribution.

    • p337 5 minutes ago

      On the topic of comparing OpenAI models with Anthropic models, I have a hybrid approach that seems really nice.

      I set up an MCP tool to use gpt-5 with high reasoning with Claude Code (like tools with "personas" like architect, security reviewer, etc), and I feel that it SIGNIFICANTLY amplifies the performance of Claude alone. I don't see other people using LLMs as tools in these environments, and it's making me wonder if I'm either missing something or somehow ahead of the curve.

      Basically instead of "do x (with details)" I say "ask the architect tool for how you should implement X" and it gets into this back and forth that's more productive because it's forcing some "introspection" on the plan.

    • WXLCKNO 2 hours ago

      I agree with this and actually Claude Code agrees with it too. I've had Codex cli (gpt-5-codex high) and claude code 4.5 sonnet (and sometimes opus 4.1) do the same lengthier task with the same prompt in cloned folders about 10x now and then I ask them to review the work in the other folder and determine who did the best job.

      100% of the time Codex has done a far better job according to both Codex and Claude Code when reviewing. Meeting all the requirements where Claude would leave things out, do them lazily or badly and lose track overall.

      Codex high just feels much smarter and more capable than Claude currently and even though it's quite a bit slower, it's work that I don't have to go over again and again to get it to the standards I want.

      • pkreg01 2 hours ago

        I share your observations. It's strange to see Anthropic losing so much ground so fast - they seemed to be the first to crack long-horizon agentic tasks via what I can only assume is an extremely exotic RL process.

        Now, I will concede that for non-coding long-horizon tasks, GPT-5 is marginally worse than Sonnet 4.5 in my own scaffolds. But GPT-5 is cheaper, and Sonnet 4.5 is about 2 months newer. However, for coding in a CLI context, GPT-5-Codex is night-and-day better. I don't know how they did it.

    • kelvinjps10 16 minutes ago

      I did the opposite: I switched to Claude Code once they released the new model last week or the one before. I tried using Codex, but there were issues with the terminal and prompting (multiple characters getting deleted). I found Claude Code to have more features and fewer bugs; editing the prompt in vim is really useful, and I find it better for iterating. I also prefer its tool usage and its use of the shell. Sometimes Codex prefers to use Python instead of the equivalent shell command. Maybe it's like other people say here, that Codex is better for long-running tasks. I prefer to give Claude small tasks, I'm usually satisfied with the result, and I like to work alongside the agent.

    • cesarvarela 2 hours ago

      Can you share an example of the tasks you found Codex being much better? From my experience Claude Code is much better.

      • intellectronica 2 hours ago

        Codex works much better for long-running tasks that require a lot of planning and deep understanding.

        Claude, especially 4.5 Sonnet, is a lot nicer to interact with, so it may be a better choice in cases where you are co-working with the agent. Its output is nicer, it "improvises" really well even if you give it only vague prompts. That's valuable for interactive use.

        But for delegating complete tasks, Codex is far better. The benchmarks indicate that, as do most practitioners I talk to (and it is indeed my own experience).

        In my own work, I use Codex for complete end-to-end tasks, and Claude Sonnet for interactive sessions. They're actually quite different.

        • incoming1211 21 minutes ago

          I disagree, Codex always gets stuck and wants to double check and clarify things, it's like "dammit just execute the plan and don't tell me until it's completely finished"

          The output of codex is also not as great. Codex is great at the planning and investigation portion but sucks at execution and code quality.

        • shmoogy an hour ago

          Can / does Codex actually check docker logs and other things for feedback while iterating on something that isn't working? That is where the true magic of Claude comes in for me. Often things can't be one-shot, but being able to iteratively check logs, make an adjustment, rebuild the docker containers, send a curl, and confirm the fix is a huge improvement.

          • intellectronica 44 minutes ago

            Yes, in this regard it's very similar. It works as an agent and does whatever you need it to do to complete the task. In comparison to Claude it tends to plan more and improvise less.

      • mordymoop 2 hours ago

        I'm on the same page here. I have seen this sentiment about Codex suddenly being good a few times now, so I booted Codex CLI thinking-high back up after a break and asked it to look for bugs. It promptly found five bugs that didn't actually exist. It was the kind of truly impressively stupid mistake that I haven't seen Claude Code make essentially ever, and made me wonder if this isn't the sort of thing that's making people downplay the power of LLMs for agentic coding.

      • the_duke 2 hours ago

        IMO gpt5-codex medium is much better as soon as the task becomes slightly complex, or the context grows a bit.

        Sonnet 4.5 tends to randomly hallucinate odd/inappropriate decisions and goes on to make stupid changes that have to be patched up manually.

      • simplify 2 hours ago

        Same here. I tried codex a few days ago for a very simple task (remove any references of X within this long text string) and it fumbled it pretty hard. Very strange.

        • fragmede 20 minutes ago

          yeah I'm in the same boat. Codex can't do this one task, and constantly forgets what I've told it, and I'm reading these comments saying how it's so great to the point that I'm wondering if I'm the one taking the crazy pills. Maybe we're being A/B tested and don't know about it?

      • mmaunder 2 hours ago

        I can not. We're all racing very hard to take full advantage of these new capabilities before they go mainstream. And to be honest, sharing problem domains that are particularly attractive would be sharing too much. Go forth and experiment. Have fun with it. You'll figure it out pretty fast. You can read my other post here about the kinds of problem spaces I'm looking at.

        • mmaunder 2 hours ago

          I'm seeing the downvotes. I'm sorry folks feel that way. I'm regretting my honesty.

          Edit: I'd like to reply to this comment in particular but can't in a threaded reply, so will do that here: "Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice."

          This exhibits a fundamental misunderstanding of why coding agents powered by LLMs are such a game changer.

          The assumption this poster is making is that LLMs are regurgitating whole cloth after being trained on whole cloth.

          This is a common mistake among lay people and non-practitioners. The reality is that LLMs have gained the ability to program, by learning from the code of others. Much like a human would learn from the code of others, and then be able to create a completely novel application.

          The difference between a human programmer and an agentic coder is that the agent has much broader and deeper expertise across more programming languages, and understands more design patterns, more operating systems, more about programming history, etc etc and it uses all this knowledge to fulfill the task you've set it to. That's not possible for any single human.

          It's important for the poster to take two realities on board: Firstly, agentic coding agents are not regurgitating whole cloth from whole cloth. Instead they are weaving new creations because they have learned how to program. Secondly, agentic coding agents have broader and deeper knowledge than any human that will ever exist, and they never tire, and their mood and energy level never changes. In fact that improves on a continuous basis as the months go by and progress continues. This means we can, as individual practitioners or fast moving teams, create things that were never before possible for us without raising huge amounts of money and hiring large very expensive teams, and then having the overhead of lining everyone up behind a goal AND dealing with the human issues that arise, including communication overhead.

          This is a very exciting time. Especially if you're curious, energetic, and are willing to suspend disbelief to go and take a look.

          • johnfn 28 minutes ago

            You’re getting downvoted because the amount of weight I place on your original comment is contingent on whether or not you’re actually using AI to do meaningful work. Without clarifying what you’re doing, it’s impossible to distinguish you from one of those guys that says he’s using AI to do tons of work and then you peek under the hood and he’s made like 15 markdown files and his code is a mess that doesn’t do anything.

            Well, that, and it’s just a bit annoying to claim that you’ve found some amazing new secret but that you refuse to share what the secret is. It doesn’t contribute to an interesting discussion whatsoever.

          • zamadatix 2 hours ago

            Never hold regret for having honesty, it tends to lose its value completely if you only care about it when you have good news to deliver. If for anything, hold regret for when you didn't have something better appreciated to be honest about.

            The easier threading-focused approach to the conversation might be to add the additional comment as an edit at the end of the original and reply to the child https://news.ycombinator.com/item?id=45649068 directly. Of course, I've broken the ability to do that by responding to you now about it ;).

            • mmaunder 2 hours ago

              Thanks. I wasn't able to reply in a thread earlier - I guess HN has a throttle on that. So I edited the comment above to add a few more thoughts. It's a very exciting time to be alive.

            • mmaunder 2 hours ago

              lol, thanks.

          • kobe_bryant 2 hours ago

            this is absurd. no one needs or wants your AI generated answer that's a whole lot of nothing

            • mmaunder an hour ago

              Comments like this reveal the magnitude of polarization around this issue in tech circles. Most people actually feel this kind of animosity towards AI, and so having comment threads like this even be visible on HN is unusual. Needless to say, all my comments here are hand written. But the poster knows that, of course.

        • deadbabe 2 hours ago

          Ah, super secret problem domains that have been thoroughly represented in the LLM training data. Nice.

    • purnesh 16 minutes ago

      My experience is similar, but for me, Claude Code is still better when designing or developing a frontend page from scratch. I have seen that Codex follows instructions a bit too literally, and the result can feel a little cold.

      CC on the other hand feels more creative and has mostly given better UI.

      Of course, once the page is ready, I switch to Codex to build further.

    • maherbeg 3 hours ago

      Yeah this has been my experience as well. The Claude Code UI is still so much better, and the permissioning policy system is much better. Though I'm working on closing that gap by writing a custom policy https://github.com/openai/codex/blob/main/codex-rs/execpolic...

      Kinda sick of Codex asking for approval to run tests for each test instance

    • bcrosby95 3 hours ago

      Yeah, after correcting it several times I've gotten Claude Code to tell me it didn't have the expertise to work in one of my problem domains. It was kinda surprising but also kinda refreshing that it knew when to give up. For better or worse I haven't noticed similar things with Codex.

      • mmaunder 2 hours ago

        I've chosen problems with non-negotiable outcomes. In other words, problem domains where you either are able to clearly accomplish the very hard thing, or not, and there's no grey area. I've purposely chosen these kinds of problems to prove what AI agents are capable of, so that there is no debate in my mind. And with Codex I've accomplished the previously impossible. Unambiguously. Codex did this. Claude gave up.

        It's as if there are two vendors saying they can give you incredible superpowers for an affordable price, and only one of them actually delivers the full package. The other vendor's powers only work on Tuesdays, and when you're lucky. With that situation, in an environment as competitive as things currently stand, and given the trajectory we're on, Claude is an absolute non-starter for me. Without question.

        • Aeolun 37 minutes ago

          I don’t think Claude is actually incapable, you just spend a lot of time telling it to yes, please actually do the difficult thing. Do not give up halfway through.

          Codex says “This is a lot of work, let me plan really well.”

          Claude says “This is a lot of work, let me step back and do something completely different that you didn’t ask for.”

        • skybrian 20 minutes ago

          We need product reviewers who can demonstrate things like this in public. Without details, "it works for me on my projects" only goes so far.

        • corndoge 2 hours ago

          Can you expound a bit on the problem domains? I am curious

    • lherron 2 hours ago

      Still a toss-up for me which one I use. For deep work Codex (codex-high) is the clear winner, but when you need to knock out something small Claude Code (sonnet) is a workhorse.

      Also CC tool usage is so much better! Many, many times I’ve seen Codex writing a python script to edit a file which seems to bypass the diff view so you don’t really know what’s going on.

    • sabareesh an hour ago

      Similar feeling. It seems good at certain things, but if something doesn't work it wants to do things simply, and the result becomes something you didn't ask for and sometimes the opposite of what you wanted. On the other hand, with Codex you sometimes feel the AGI, but that is like 2 out of 10 sessions. This may be primarily due to how complete the prompt is and how well you define the problem.

    • mi_lk 3 hours ago

      What model are you using respectively? Not sure I share your observations

      • mmaunder 2 hours ago

        Have tried all and continue to eval regularly. I spend up to 14 hours a day. Currently recovering from a herniated disk because I spent 6 weeks sitting at a dining room table, 14 hours a day, leaning forward. Don't do that. lol. So my coverage is pretty good. I'm using GPT5-codex-high for 99% of my work. Also I have a team of 40 folks, about a third of which are software engineers and the other third are cybersecurity analysts, so I get feedback from them too and we go deep on our engineering calls re the latest learnings and capabilities.

    • poorman 2 hours ago

      Totally agree. I was just thinking that I wouldn't want this feature for Claude Code but for Codex right now it would be great! I can simply let tasks run in Codex and I know it's going to eventually do what I want. Whereas with Claude Code I feel like I have to watch it like a hawk and interrupt it when it goes off the rails.

    • durron 3 hours ago

      Do you find this to still be true with the Sonnet 4.5 model?

      • extr 3 hours ago

        IMO Sonnet 4.5 is great but it just isn’t as comprehensive of a thinker. I love Anthropic and primarily use CC day to day but for any tricky problems or “high stakes, this must not have bugs” issues, I turn to Codex. I do find if you let Codex run on its own too long it will produce comparably sloppy or lacking-in-vision type issues that people criticize Sonnet for, however.

        • PantaloonFlames 2 hours ago

          That’s a curious approach. Why would you use both? Why not just use the more reliable dependable option for all purposes?

          • extr 2 hours ago

            Sonnet 4.5/CC is faster, more direct, and is generally better at following my intent rather than the letter of my prompt. A large chunk of my tasks are not "solve this concurrency bug" or "write this entire feature" but rather "CLI ops", merging commits, running a linter, deploying a service, etc. I almost use it like it was my shell.

            Also while not quite as smart, it's a better pair programmer. If I'm feeling out a new feature and am not sure how exactly it should work yet, I prefer to work with Sonnet 4.5 on it. It typically gives me more practical and realistic suggestions for my codebase. I've noticed that GPT-5 can jump right into very sophisticated solutions that, while correct, are probably not appropriate.

            Sonnet 4.5: "Why don't we just poll at an interval with exponential backoff?"

            GPT-5: "The correct solution is to include the data in the event stream...let us begin by refactoring the event system to support this..."

            That said, if I do want to refactor the event system, I definitely want to use Codex for that.
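
            For readers weighing the two suggestions, here is a minimal sketch of the simpler polling option in Python. The function name, parameters, and delay values are illustrative assumptions, not output from either model.

```python
import time


def poll_with_backoff(check, base_delay=0.5, max_delay=30.0,
                      max_attempts=8, sleep=time.sleep):
    """Call check() until it returns a non-None value.

    The wait between attempts doubles each time, capped at max_delay.
    `sleep` is injectable so the schedule can be tested without waiting.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        result = check()
        if result is not None:
            return result
        # Do not sleep after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

            The trade-off matches the one in the comment: this is a few lines and touches nothing else, while pushing the data through the event stream avoids repeated requests at the cost of a refactor.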

          • wrs 2 hours ago

            In my experience, there isn’t a model that is more dependable for all purposes. They each have some unique strengths.

      • theshrike79 2 hours ago

        I'm like 80% sure Sonnet 4.5 is just rebranded Opus.

        Sonnet 4 was a coding companion, I could see what it was doing and it did what I asked.

        Sonnet 4.5 is like Opus, it generates massive amounts of "helper scripts" and "bootstrap scripts" and all kinds of useless markdown documentation files even for the tiniest PoC scripts.

      • mmaunder 2 hours ago

        Yes. Sadly. And it really does make me sad. I was rooting for Anthropic. Still kinda am.

        • bgirard 2 hours ago

          I have a very similar experience. I was heavily invested in Anthropic/Claude Code, and even after Sonnet 4.5, I'm finding that Codex is performing much better for my game development project.

          • mmaunder an hour ago

            It seems particularly good at high performance programming in low level languages.

      • esafak 3 hours ago

        I don't. Sonnet is faster too.

    • catigula 14 minutes ago

      This is such an interesting perspective because I feel codex is hugely impressive but falls apart on any even remotely difficult task and is too autonomous and not eager enough.

      Claude feels like a better fit for an experienced engineer. He's a positive, eager little fellow.

    • asdev 2 hours ago

      do you use the CLI or the web UI? or both?

    • dboreham 2 hours ago

      This is going to be situation normal for 10 years: everyone will need to keep track of "model-du-jour" as each vendor makes incremental improvements.

    • mvkel 31 minutes ago

      This is why Anthropic is a zombie company.

      They put all of their eggs in the coding basket, with the rest of their mission couched as "effective altruism," or "safetyism," or "solving alignment," (all terms they are more loudly attempting to distance themselves from[0], because it's venture kryptonite).

      Meanwhile, all OpenAI had to do was point their training cannon at it for a run, and suddenly Anthropic is irrelevant. OpenAI's focus as a consumer company (and growing as a tool company) is a safe, venture-backable bet.

      Frontier AI doesn't feel like a zero-sum game, but for now, if you're betting on AI at all, you can really only bet on OpenAI, like Tesla being a proxy for the entire EV industry.

      [0] https://forum.effectivealtruism.org/posts/53Gc35vDLK2u5nBxP/...

      • F7F7F7 21 minutes ago

        For non-vibe coding purposes I've found that my $200 Claude (Claude Code) account regularly outperformed my $200 ChatGPT (Codex) account. This was after 2 months of heavily testing both mostly in Terminal TUI/CLI form and most recently with the latest VSCode/Cursor incarnations.

        Even with the additional Sora usage and other bells & whistles that ChatGPT @ $200 provides, Claude provides more value for my use cases.

        Claude Code is just a lot more comfortable being in your workflow and being a companion or going full 'agent(s)' and running for 30 minutes on one ticket. It's also a lot happier playing with Agents from other APIs.

        There's nothing wrong with Anthropic wanting to completely own that segment and not have aspirations of world domination like OpenAI. I don't see how that's a negative.

        If anything, the more ChatGPT becomes a 'everything app' the less likely I am to hold on to my $20 account after cancelling the $200 account. I'm finding the more it knows about me the more creeped out and "I didn't ask for this" I become.

        • fragmede 14 minutes ago

          Especially now that sama wants us to sext with ChatGPT

  • simonw 3 hours ago

    I had a preview of this over the weekend, notes here plus some example PRs: https://simonwillison.net/2025/Oct/20/claude-code-for-web/

    It's really solid. It's effectively a web (and native mobile) UI over Claude Code CLI, more specifically "claude --dangerously-skip-permissions".

    Anthropic have recognized that Claude Code where you don't have to approve every step is massively more productive and interesting than the default, so it's worth investing a lot of resources in sandboxing.

    • extr 3 hours ago

      It’s interesting because I’ve slowly arrived at the opposite conclusion: for much of my practical day to day work, using CC with “allow edits” turned OFF results in a much better end product. I can correct it inline, I pseudo-review the code as it’s produced, etc etc. Codex is better for “fire and forget” features for sure. But Claude remains excellent at grokking intent for problems where you aren’t quite sure what you want to build yet or are highly opinionated. Mostly due to the fact it’s faster and the iteration loop is faster.

      • vidarh 33 minutes ago

        It slows it down far too much for me. What I've found after switching to --dangerously-skip-permissions is that while the intermediate work product is often total junk, when I then start writing a message to tell Claude to switch approach, a large proportion of the time it has figured that out by itself before I'm finished writing the message.

        So increasingly I let it run, and then review when it stops, and then I give it a proper review, and let it run until it stops again. It wastes far less of my time, and finishes new code much faster. At least for the things I've made it do.

      • simonw 2 hours ago

        That approach should work well for projects where you are directly working on the code in tandem with Claude, but a lot of my own uses are much more research oriented. I like sending Claude Code off on a mission to figure out how to do something.

        Here's an example from this morning, getting CUDA working on a NVIDIA Spark: https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-co...

        I have a few more in https://github.com/simonw/research

        • fragmede 5 minutes ago

          so hey by the way, have you discovered Wispr Flow or something similar so you can talk to your computer like Scotty does?

        • extr 2 hours ago

          Very fair. Interesting how much feedback on models/tools is different right now depending on what you're doing.

      • ryoshu 2 hours ago

        Agreed. I use CC a lot for exploratory work. It's great with fast iteration for throwaway code.

    • username223 an hour ago

      Do you have a practical sense of the level of mischief possible in the sandbox? It seems like a game of regexp whack-a-mole to me, which seems like a predictable recipe for decades of security problems. Allow- and deny-lists for files and domains seem about as secure as backslash-escaping user input before passing it to the shell.

  • brynary 4 hours ago

    The most interesting parts of this to me are somewhat buried:

    - Claude Code has been added to iOS

    - Claude Code on the Web allows for seamless switching to Claude Code CLI

    - They have open sourced an OS-native sandboxing system which limits file system and network access _without_ needing containers

    However, I find the emphasis on limiting the outbound network access somewhat puzzling because the allowlists invariably include domains like gist.github.com and dozens of others which act effectively as public CMS’es and would still permit exfiltration with just a bit of extra effort.
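
    To make the concern concrete, here is a minimal sketch of a hostname allowlist check in Python. The allowlist contents and the function name are hypothetical and are not taken from Anthropic's sandbox. The point is only that a decision keyed on hostname cannot tell a benign request from one that uploads data to an allowed host.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = {"github.com", "gist.github.com", "pypi.org"}


def is_allowed(url):
    """Return True if the URL's hostname is on the allowlist."""
    host = urlsplit(url).hostname  # already lower-cased by urlsplit
    return host is not None and host in ALLOWED_HOSTS


if __name__ == "__main__":
    # Blocked: the host is not on the list.
    print(is_allowed("https://attacker.example/upload"))
    # Allowed: same hostname whether the request reads a gist or creates one.
    print(is_allowed("https://gist.github.com/"))
```

    Any host on the list that accepts user-generated content is therefore a possible outbound channel, which is the "bit of extra effort" mentioned above.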

    • minimaxir 4 hours ago

      Link to the GitHub for the native sandboxing: https://github.com/anthropic-experimental/sandbox-runtime

      • navanchauhan 3 hours ago

        I used `sandbox-exec` previously before moving to a better solution (done right, sandboxing on macOS can be more powerful than Linux imo). The way `sandbox-exec` works is that all child processes inherit the same restrictions. For example, if you run `sandbox-exec $rules claude --dangerously-skip-permissions`, any commands executed by Claude through a shell will also be bound by those same rules. Since the sandbox settings are applied globally, you currently can’t grant or deny granular read/write permissions to specific tools.

        Using a proxy through the `HTTP_PROXY` or `HTTPS_PROXY` environment variables has its own issues. It relies on the application respecting those variables; if it doesn’t, the connection will simply fail. Sure, in this case, since all other network connection requests are dropped, you are somewhat protected, but an application that doesn't respect them will just not work.

        You can also have some fun with `DYLD_INSERT_LIBRARIES`, but that often requires creating shims to make it work with codesigned binaries
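
        As a small illustration of the point that proxy variables are opt-in, here is a Python sketch. It relies on `urllib.request.getproxies_environment()` from the standard library; the helper name and the proxy address are assumptions made up for the example. A library like urllib reads the variables, while code that opens a raw socket never looks at them.

```python
import os
import urllib.request
from unittest import mock


def proxies_from_env(env):
    """Return the proxy mapping urllib derives from the given environment."""
    # Temporarily replace os.environ so the result depends only on `env`.
    with mock.patch.dict(os.environ, env, clear=True):
        return urllib.request.getproxies_environment()


if __name__ == "__main__":
    # urllib picks up the variable and would route HTTPS requests through it.
    print(proxies_from_env({"HTTPS_PROXY": "http://127.0.0.1:3128"}))
    # socket.create_connection((host, 443)) has no such lookup: it connects
    # directly, so inside the sandbox that connection would be dropped.
```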

    • merrvk an hour ago

      Nice, it's in the app. Trying it out; it seems damn buggy at the moment.

    • fragmede 3 hours ago

      Exfiltration is always going to be possible, the question is, is it difficult enough for an attacker to succeed against the defenses I've put in place. The problem is, I really want to share, and help protect others, but if I write it up somewhere anybody can read, it's gonna end up in the training data.

      • koolala an hour ago

        The attacker being an LLM where all humans have to be careful what they say publicly online is a fun vector.

  • mdeeks 2 hours ago

    I feel like these background agents still aren't doing what I want from a developer experience perspective. Running in an inaccessible environment that pushes random things to branches that I then have to checkout locally doesn't feel great.

    AI coding should be tightly in the inner dev loop! PRs are a bad way to review and iterate on code. They are a last line of defense, not the primary way to develop.

    Give me an isolated environment that is one click hooked up to Cursor/VSCode Remote SSH. It should be the default. I can't think of a single time that Claude or any other AI tool nailed the request on the first try (other than trivial things). I always need to touch it up or at least navigate around and validate it in my IDE.

    • elpakal an hour ago

      > PRs are a bad way to review and iterate on code

      idk, we’ve (humans) gotten this far with them. I don’t think they are the right tool for AI generated code and coding agents though, and that these circles are being forced to fit into those squares. imho it’s time for an AI-native git or something.

    • luisml77 39 minutes ago

      I agree and I also think the problem is deeper than that. It's about not being able to do most code testing and debugging remotely. You can't really test anything remotely... It's in an ephemeral container without any of your data, just your repo. You can't have the model do npm run dev and browse to see the webpage, click around, etc. You can't compile or run anything heavy, you can't persist data across sessions/days, etc.

      I like the idea of background agents running in the cloud but it has to be a more persistent environment. It also has to run on a GUI so it can develop web applications or run the programs we are developing, and run them properly with the GUI, which requires clicking around, typing things, etc. Computer use is what we need. But that would probably be too expensive to serve to the masses with the current models.

    • justinram11 2 hours ago

      Have you checked out Ona [1] (gitpod's pivot)?

      [1] https://ona.com/

      • mdeeks 2 hours ago

        This is possibly what I want? It's hard to tell from all of the marketing on the site.

        I want to run a prompt that operates in an isolated environment that is open in my IDE where I can iterate with the AI. I think maybe it can do this?

        • simonw 2 hours ago

          Not quite. This doesn't (yet) have an option where you can connect your local IDE to their remote containers to edit files directly. It's more of a fire-and-forget thing where you can eventually suck the resulting code down to your local machine using "claude --teleport ..." - but then it's not running in the cloud any more.

    • asdev 2 hours ago

      so the biggest issue is having to pull down and manually edit changes? can't you just @claude on the PR to make any changes?

      • mdeeks 2 hours ago

        Yes, but my point is that oftentimes I don't want to. Sometimes there are changes I can make in seconds. I don't want to wait 15+ seconds for an AI that might do it wrong or do too much.

        Also it isn't always about editing. It is about seeing the surrounding code, navigating around, and ensuring the AI did the right thing in all of the right places.

  • jackconsidine an hour ago

    > We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI

    Seeing comments like this all over the place. I switched to CC from Cursor in June / July because I saw the same types of comments. I switched from VSCode + Copilot about 8 months before that for the same reason. I remember being skeptical that this sort of thing was guerilla marketing, but CC was in fact better than Cursor. Guess I'll try Codex, and I guess that it's good that there are multiple competing products making big strides.

    Never would have imagined myself ditching IDEs and workflows 3x in a few months. A little exhausting

    • grrowl 32 minutes ago

      OpenAI seems to limit how "hard" your gpt-5-codex can think depending on your subscription plan, whereas Anthropic/Claude only limits how much use you get. I evaluate Codex every month or so with a problem suited to it, but the result rarely gets merged over a version produced by Charlie (which yes is $500/mo, but rarely causes problems) or something Claude did in a managed or unmanaged session. ymmv

    • rorads an hour ago

      I think it’s a lot less exhausting now that the IDE part is mostly decoupled. I can’t imagine Cursor continuing to compete when really all they’re doing is selling tokens at a markup, and hence crushing your context on every call. Sorry if that sounds negative but it’s true.

      I use CC and codex somewhat interchangeably, but I have to agree with the comments. Codex is a complete monster, and there really isn’t any competition right now.

  • ed_mercer 17 minutes ago

    Is CC on the web able to spawn local containers? I would need to spawn a half dozen services locally in order to have a proper simulation of my actual working environment. Tool calling and integration with various microservices (e.g. postgres, playwright) is one of the most important uses of CC for us. For example, after telling CC to implement a feature, it needs to test that feature and confirm that any database changes are the way they're supposed to be.

  • robertwt7 12 minutes ago

    This is very similar to Jules by Google! https://jules.google/

    Although I find that the performance of Jules is worse than Gemini CLI. I hope that this is as good as the Claude Code CLI.

  • neilv 3 hours ago

    Nit about doing your AI interfaces on the Web: I really want claude.ai and chatgpt.com to offer a standard username+password login without 2FA. The kind my privacy-friendly browser of short-lived sessions can complete in a couple clicks, like for most other SaaSes, and then I'm in and using the tool.

    I don't want to leak data either way by using some "let's throw SSO from a sketchy adtech company into the trust loop".

    I don't want to wait a minute for Anthropic's login-by-email link, and have the process slam the brakes on my workflow and train of thought.

    I don't want to wait a minute for OpenAI's MFA-by-email code (even though I disabled that in the account settings, it still did it).

    I don't want to deal with desktop clients I don't trust, or that might not keep up with feature improvements. Nor have to kludge up a clumsy virtualization sandbox for an untrusted client, just to ask an LLM questions that could just be in a Web browser.

    • linkregister 3 hours ago

      In the modern age of mass credential stuffing attacks exploiting password reuse, MFA is one of the most effective tools for reducing unauthorized logins. Companies that don't adopt it are risking unacceptably high levels of credit card chargebacks.

      I wish the standard were for companies to check new passwords against leaked password lists, e.g. what https://haveibeenpwned.com uses.
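
      A rough sketch of what such a check can look like, assuming the Pwned Passwords range endpoint (`https://api.pwnedpasswords.com/range/<prefix>`), which takes only the first five hex characters of the password's SHA-1 and returns `SUFFIX:COUNT` lines. The function names are made up for illustration, and no network call is made here; the response body is passed in as a string:

```python
import hashlib


def split_sha1(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest.

    Only the 5-character prefix would ever be sent to the range
    endpoint; the 35-character suffix stays on the server doing the check.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, range_body):
    """Look up `suffix` in a range response made of SUFFIX:COUNT lines.

    Returns the reported breach count, or 0 if the suffix is absent.
    """
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = split_sha1("password")
    # A real signup flow would fetch the body for `prefix` over HTTPS;
    # this fabricated body just demonstrates the parsing step.
    fake_body = "0" * 35 + ":3\r\n" + suffix + ":42\r\n"
    print(prefix, count_in_range(suffix, fake_body))
```

      A signup form would reject (or at least warn about) any new password whose count comes back non-zero.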

      I use a similar workflow and have found that websites that allow passkey-based login can avoid the friction of waiting for TOTP codes or magic links.

    • amluto 2 hours ago

      How about supporting WebAuthn?

      The current claude.ai signin mechanism is rather annoying.

  • yoavm an hour ago

    I was just working on something similar for OpenCode - pushing it now in case it's useful for someone[0].

    It can run in a front-end only mode (I'll put up a hosted version soon), and then you need to specify your OpenCode API server and it'll connect to it. Alternatively, it can spin up the API server itself and proxy it, and then you just need to expose (securely) the server to the internet.

    The UI is responsive and my main idea was that I can easily continue directing the AI from my phone, but it's also of course possible to just spin up new sessions. So often I have an idea while I'm away from my keyboard, and being able to just say "create an X" and let it do its thing while I'm on the go is quite exciting.

    It doesn't spin up a special sandbox environment or anything like that, but you're really free to run it inside whatever sandboxing solution you want. And unlike Claude Code, you're of course free to choose whatever model you want.

    [0] https://github.com/bjesus/opencode-web

  • ea016 4 hours ago

    No relation to them, but I've started using Happy[0]'s iOS app to start and continue Claude Code sessions on my iPhone. It allows me to run sessions on a custom environment, like a machine with a GPU to train models.

    [0] https://github.com/slopus/happy/

    • hmokiguess 3 hours ago

      This still seems to be the only solution if you're using Bedrock or direct API access instead of a Pro / Max plan; Claude Code for Web doesn't seem to let you use it that way.

      • didgeoridoo 36 minutes ago

        You can log in to your CC instance however you like, including via Pro/Max. Happy just wraps it and provides remote access with a much better UI than using a phone-based terminal app.

        • hmokiguess 24 minutes ago

          Yes, that's precisely what I meant! I was talking with regards to the parent article about Claude Code on the Web via Anthropic.

  • dysoco an hour ago

    So from what I can understand this is only meant to be used with Claude-hosted sandbox environments?

    Wouldn't work for my case since I need a lot of HDD space, GPUs etc. to run the thing I'm working on, but it would be great if I could run a Claude Code server in my server, expose the port and then connect via web or iOS interface.

    Sure, I can use tmux/ssh, but it's very impractical, especially on mobile.

  • Redster 4 hours ago

    Here's the link talking about the sandbox environment and features they're using for this Claude Code. https://www.anthropic.com/engineering/claude-code-sandboxing

  • ubj 4 hours ago

    Very curious to see what usage limits are like for paid plans. Anthropic was already experiencing issues with high-volume model usage for Pro and Max users. I hope their infrastructure is able to adequately support running these additional coding environments on top of model inference.

    Just to be clear, I'm excited for the capability to use Claude Code entirely within the browser. However, I've heard reports of Max users experiencing throttled usage limits in recent months, and am concerned as to whether this will exacerbate that issue or not.

    • CharlesW 3 hours ago

      Anecdotally, as a Max user typically using Claude Code for >8 hours/day, I've never experienced that. That said, I'm not one of those people using Opus for everything, and in fact I've been happy using Sonnet 4.5 even for planning.

    • minimaxir 3 hours ago

      I suspect the release of Claude Haiku 4.5 was done to help reduce usage costs for Anthropic and any use of Claude Code will differ to it if capacity is limited.

      EDIT: I had meant defer which is the first time I've made a /r/boneappletea in awhile

  • jryio 4 hours ago

    Pair programming is still one of the best ways to transfer knowledge between two programmers in a high-throughput manner. Humans learn by doing, building synaptic connections.

    I wonder if a shared Claude Code instance has the same effect?

    • dingnuts 4 hours ago

      The person driving is the one that learns the most in pair programming. In the scenario you've described, that would be Claude. LLMs don't learn.

      Doesn't CC sometimes take twenty, thirty minutes to return an attempt? I wouldn't know, because I'm not rich and my employer has decided CC is too expensive, but I wonder what you would do with your pair programming partner while you wait.

      The bosses would like to think we'd start working on something else, maybe start up a different Claude instance, but can you really change contexts and back before the first one is done? You AND your partner?

      Nah, just go play air hockey until your boss realizes Claude is what they need, not you.

      • astrange 3 hours ago

        You can get plenty of CC on a $20/month plan.

      • myko 3 hours ago

        > Nah, just go play air hockey until your boss realizes Claude is what they need, not you.

        This is a depressing comment.

        I am apprehensive about the future of software development in this milieu. I've pumped out a ~15,000 line application heavily utilizing Claude Code over a few days that seems to work, but I don't know how much to trust it.

        Certainly part of the fun of building something was missing during that project, but it was still fun to see something new come to life.

        Maybe I should say I am cautiously optimistic but also concerned: I don't feel confident in the best ways to use these tools to build good software, and I'm not sure exactly what skills are useful in order to get them there.

        • losteric 40 minutes ago

          > I've pumped out a ~15,000 line application heavily utilizing Claude Code over a few days that seems to work, but I don't know how much to trust it.

          Can I ask what you built?

      • mr_mitm 3 hours ago

        Just for the record, CC is about the cost of a Netflix subscription, and it responds faster than any human can.

  • fny 3 hours ago

    I've been using Happy Coder[0] for some time now on web and mobile. I run it in `--yolo` mode on an isolated VM across multiple projects.

    With Happy, I managed to turn one of these Claude Code instances into a replacement for Claude that has all the MCP goodness I could ever want and more.

    [0]: https://happy.engineering/

    • ShipEveryWeek 3 hours ago

      This looks nice! I’ve been using terminus + tailscale to get similar results, but I’ll give this a go

  • lysecret 2 hours ago

    Just played around with it. The fact it’s on the phone is a big bonus.

    I have set up a little workflow where, given Linear tags, it sets up a worktree on my dev box, installs deps, and starts the implementation so I can take it over. I prefer this workflow to the fully managed cloud-based solutions.

    This kind of fits in for issues where I’m basically sure I won’t have to take it over (and it can do it fully on its own). Which aren’t that many.

    Very simple example: there was a warning pop-up on something where I thought there shouldn’t be one. Now it’s done fully automatically from my phone in 5 mins. I quite like that these small changes become so easy.

  • charlesabarnes 4 hours ago

    It's pretty frustrating that every release is iOS first without any timeline or expectation for Android.

    • outime 4 hours ago
      • poly2it 3 hours ago

        It is also relevant to know if a user who'd otherwise use app X on iOS would use X less on Android.

    • alwillis 3 hours ago

      Not unusual; most high profile apps ship on iOS first, going back to Instagram [1], which was released October 10, 2010. Instagram shipped their Android version 1.5 years later.

      [1]: https://www.techtarget.com/searchcio/definition/Instagram

      • spondyl 3 hours ago

        Another, not incompatible explanation is that it's also just easier to develop for a handful of known iOS/iPadOS targets compared to Android's unbounded set of screen sizes and device specs.

        • wahnfrieden 3 hours ago

          If your app runs on iPadOS, you already need to support every "screen size" (window size)

          Android is simply a much worse platform to make money on. Users spend <25% as much as iOS users. Why would they prioritize that?

          • djmips an hour ago

            In practice, Android is much more difficult because of the myriad of offerings you have to handle. Have you ever tried both? To your other point, what app spend would Anthropic be worried about? They have a subscription model.

    • richardw 3 hours ago

      It’s much harder dealing with all the complexities of different devices, screen sizes, OS versions.

      https://www.reddit.com/r/applesucks/comments/1k6m2fi/why_do_...

    • bahmboo 4 hours ago

      Anthropic and Apple have a strategic partnership. It's a bit dicey but still seems to be in play. Which is interesting considering Google is a major investor and Apple is not. Anthropic wants Apple as a paying customer. Apple wants them to bend the knee.

      • lvl155 3 hours ago

        Apple also has a relationship with OAI. They’re not preferential.

        • bahmboo 4 minutes ago

          Yes but the question was why Anthropic is showing more attention to iOS vs Android.

    • pjmlp 4 hours ago

      It is basically a US centric view of mobile OS market share.

    • wahnfrieden 3 hours ago

      Android is a tiny market

      • OJFord 3 hours ago

        You probably mean 'in the US', where iOS is 58%. Android has a 71% global market share.

        • bdcravens 2 hours ago

          Yes, if all you consider are the number of devices in use. However once you segment by devices with performance to run a given app and financial demographics that match your target customer, the numbers change.

        • wahnfrieden 3 hours ago

          No. Why do user counts matter? High user count but with >4x thriftiness / aversion to spending is not an attractive market over iOS.

          Globally in dollars spent, not human heads. iOS is over 2x larger than Android globally, and the gap is widening year over year.

          iOS spending growth outpaces Android, which even shrank during covid while iOS spending continued to grow.

          https://api.backlinko.com/app/uploads/2024/03/iphone-vs-andr...

          Anthropic makes money off product sales, not ad revenue, so wallets count more than eyes for this. Free users who are less than 25% as likely to spend are a burden not to be prioritized for a product business with free tier access. They need to spend much more to get a paying user on Android.

          If Android were the bigger market, they'd prioritize it

  • jngiam1 3 hours ago

    I got so used to having Claude Code read some of my MCP tools, and was bummed to see that it couldn't connect to them yet on the web.

    Pretty cool though! Will need to use it for some more isolated work/code edits. Claude Code is now my workhorse for a ton of stuff including non-coding work (esp. with the right MCPs)

  • arjie 2 hours ago

    A thing I really like with Claude Code is how well it uses the bash scripts you give it. I also have a browser control MCP installed and it's pretty good for it to full-cycle around the approach. I have a staging database that it has the passwords to that it logs in and runs queries on. This whole thing means it loops and delivers good results for me.

    I'll try this, but the grounding seems crucial for these LLMs to deliver results that are fewer shot than otherwise.

    • hugs 2 hours ago

      which specific functions/features of the browser control MCP do you lean on the most?

  • cube2222 3 hours ago

    This is quite nice!

    I'm using Claude Code locally a lot, occasionally with a couple of parallel sessions.

    I was very happy when they made the GitHub Action - I used it quite a bit, but in practice I got frustrated that I effectively only get a single back-and-forth out of it, I can't really "continue the conversation without losing context" - Sure, I can respond to it in the PR it makes, but that will be a fresh session with a fresh empty context.

    So, as much as I don't like moving out of my standard development workflow with my tools, I think this could be quite useful. The ability to interrupt and/or continue a conversation should be very nice.

    My main worry is - usually my unit tests and integration tests rely on a postgres database running on the machine, and it's not obvious to me if I can spin that up here?

  • shireboy 3 hours ago

    I really want this but for Azure Devops. If you're not familiar, Microsoft owns both Github and Azure Devops, and both do similar things: git repos and project management. I can use Github Copilot, Claude Code CLI, etc. against code on my disk, including Azure Devops MCP. But what I can't easily do is what Github Copilot Agent and apparently this Claude Code on Web offer: assign a ticket to @SomeAi and have a PR show up in a few minutes. Can't change to github for _reasons_.

    Would love any suggestions if anyone is in a similar situation.

  • bgirard 2 hours ago

    Looks promising.

    I got my environment working well with Codex's Cloud Task. I'm trying the same repo with Claude Code Web (which started off with Claude Code CLI, mind you), and the yarn install just hangs with no debuggable output.

  • hnidiots3 an hour ago

    I wonder why people don’t just use Amp Code and use the Oracle.

    It’s Sonnet 4.5 + GPT-5 working together.

    Codex just isn’t as good as people make it out to be. OpenAI seems to train on a lot of JavaScript/Tailwind to make visuals look more impressive but when it comes to actual backend work it just fails more than it succeeds. Sonnet is much better at chewing through tasks and GPT 5 is great at consulting planning and analysis.

    Using Amp and asking it to check everything with the oracle leads to superior results.

    But no one on HN has heard of it. I’m guessing HN hates twitter?

    • SalmoShalazar 15 minutes ago

      Not sure what your twitter comment is about, I use it and I’ve just never heard of this product. Looks cool, I will give it a test.

  • low_tech_punk 2 hours ago

    IMHO, parallel tasks across multiple repos is not as useful as parallel tasks in one repo.

  • cesarvarela 2 hours ago

    Does this work inside docker containers like Codex? Stuff like `testcontainers` is unusable with that architecture because you need access to docker itself.

    • lysecret 2 hours ago

      Yea it failed on testcontainers for me. The pnpm install worked fine though.

  • minimaxir 4 hours ago

    I like how in the demo video there's a squiggle emphasis on Claude's "Good Idea!" in response to a user clarification, when the more common view among vibe coders is that less glazing is better and they just want the LLM to write code.

  • jannniii 4 hours ago

    I’m wondering if it would be possible to use the new skills feature or agents with this. Without the agents or the skills, I don’t know how useful this would be.

    • simonw 3 hours ago

      It's running Claude Code CLI on a container for you, so skills should just work. I've not tried them myself yet though.

  • nextworddev 37 minutes ago

    Developers may want to deny this, but it's getting dangerously close to maybe replacing 30% of developers

    • simonw 33 minutes ago

      I continue to believe that making developers 2-3x more productive makes those developers 2-3x more valuable, and the smart thing for companies to do is to take on 2-3x the amount of work, or hire MORE developers and finally start crunching through their inevitably years-long backlogs.

      • nextworddev 5 minutes ago

        Your view doesn’t mesh with conversations I have had with most C-suite. Most firms outside of SV are seeing opportunities for cost reduction mostly.

        And you can think through with first principles to see why it won’t expand developer hiring. Since AI progress is jagged, some industries will be affected in outsized ways while others may thrive more. But the increase in demand from new industries won’t absorb the reduction in demand from disrupted industries.

  • aantix 3 hours ago

    Does this web interface have support for AWS Bedrock?

  • mkummer 4 hours ago

    Is the web interface open sourced anywhere? Looks great, excited to try it out

  • Stevvo 3 hours ago

    Guess they couldn't name it "Claude Codex"

  • bitpatch 3 hours ago

    This is kind of nice, as much as I love a good TUI, sometimes text editing in claude code can trip me up compared to a web GUI

  • mrcwinn 3 hours ago

    We’re moving almost entirely to Codex, first because often it’s just better, and second because it’s much cheaper. It’s a bet that they’re better now, but given capacity and funding, they’ll be better later too.

    The only edge Claude has is context window, which we do sometimes hit, but I’m sure that gap will close.

    • esafak 3 hours ago

      You're using the metered API rather than a subscription, right?

    • asdev 2 hours ago

      are you using the web ui, cli or both?

  • bgwalter 2 hours ago

    I have never seen such a bunch of uncreative people who have never written a real application, never done anything artistic, never said anything intelligent try to ruin software development to the extent that the "AI" companies do.

    They want to turn everything into a bootstrap framework, which is probably the limit of their mental horizon. And many people maintain that the emperor is fully clothed and that the scam works.

  • lvl155 3 hours ago

    I am not a big fan of these. They’re trying to bundle compute and jack up the prices down the road.