I have used Claude Code heavily, and I've been forced to use Gemini CLI heavily (for a particular client project).
Of all my issues with Gemini CLI (and there are many), this addresses none of them. This is a fascinating product management prioritization decision. It makes me wonder if the people who build Gemini CLI actually use Gemini CLI for real work. Because I would think that if they did, they would surely have prioritized other things.
My personal biggest issue with Gemini CLI, which is a deal breaker if I have a say in the tooling I'm using, is that if you hit a per-minute rate limit (meaning it will be resolved in a few seconds) your session is forcefully and permanently switched over to using Flash and there is nothing you can do other than manually quit and restart to get back to using Pro 2.5. The status footer line will even continue to lie to you about what model you are using. I would genuinely like to understand the use cases for which this is desirable behavior. But even IF those use cases do exist, what is the harm or difficulty in giving an option to override this behavior? These models are not interchangeable. GitHub issues have been opened for months, some even with PRs attached, with no action from Google.
For comparison, Claude Code handles this situation with a simple exponential backoff until the request succeeds. That's what I want, ESPECIALLY in a CLI agent that may be running headlessly in a pipeline.
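For anyone who hasn't implemented it, the pattern being described is tiny. A minimal sketch in TypeScript, where the retryable status codes, retry cap, and delays are my own assumptions rather than anything Claude Code documents:

```typescript
// Retry a request with exponential backoff instead of downgrading the model.
// Hypothetical helper; the 429/503 check and delay constants are illustrative.
async function withBackoff<T>(
  request: () => Promise<T>,
  maxRetries = 6,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err: any) {
      const retryable = err?.status === 429 || err?.status === 503;
      if (!retryable || attempt >= maxRetries) throw err;
      // 1s, 2s, 4s, ... plus jitter so parallel sessions don't retry in lockstep.
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```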
Google is a proving ground for building "wow" factor products to line your resume with.
There isn't a drive to actually cater to users; it's a selfish endeavor which sometimes aligns with what users want. So the game is to pack in features so you can leverage them for jumping ship or springboarding internally.
It's the absolute worst aspect of Google, and I think it's something worth dumping Sundar over, in order to get in a leader who will unify goals and get people who want to make great products, not great window dressing for themselves.
This is a very valuable use case for me personally. I frequently have the problem that I change things on the filesystem in a separate tab and the agent context gets out of sync. It fails on subsequent edits, often tries to reverse the changes that I made, and many times I have to copy/paste the command I ran and its output back into the agent window.
Your complaint is likely a product design decision rather than an engineering capacity prioritization one. As you've noted, the fix is pretty trivial. I imagine that some designer or product person is intentionally holding this back for one reason or another.
Claude Code has an elegant solution to the problem you mention, without trying to cram everything into a single nested pane (which feels wrong to me).
In Claude Code, when you edit a file independently of the agent, it automatically notices and says something like "I see you've made a change. Let me take a look."
I wish Gemini CLI would've taken a similar approach, since it seems to fit better with a CLI and its associated Unix philosophy.
These tools aren’t made to be used, they’re made to make the CEO look less bad by showing that “Google has these things too! We’re not falling behind! Don’t tell the board to vote to fire him!”
That’s it.
It’s not a “product”, it’s a keeping-up-with-the-Joneses checklist item.
I've had a pretty poor experience with Gemini.
I've had to convince it to do things it should just be able to do but thinks it can't for some reason. Like reading from a file outside of the project directory: it can do it fine, but refuses to unless you convince it that, no, it actually can.
Also has inserted "\n" instead of newlines on a number of occasions.
I'd argue these behaviors are much more important than being able to use interactive commands.
Gemini doesn't seem to be trained on tool use (which Claude is), so it quite often thinks it can't do something it certainly can, and does a lot of nonsense. For me it fails nearly every time while it's trying to read project files because it uses relative paths instead of absolute ones, so I've put "For your "ReadFile" and "WriteFile" tool, you MUST use absolute paths to files" in my system instructions.
Speaking of system instructions, Gemini always forgets them or doesn't follow them. And it still puts code comments nearly everywhere, it drives me nuts.
Codex is much better at following system instructions but the CLI is..... very bad.
My experience with Gemini 2.5 Pro has oddly been better, maybe because I use RooCode/Cline? It was oddly apologetic, though, wasting tokens on lamenting its failure when it fails to do something and whatnot, instead of just getting on with the solution.
At the same time, even the big versions of Qwen3 Coder (480B) regularly mess up file paths and use the wrong path separators, leading to files like srccomponentsMyComponent.vue being created instead of src/components/MyComponent.vue.
> And it still puts code comments nearly everywhere, it drives me nuts.
I’ve had the issue of various models sometimes inserting comments like “// removed Foo” when it makes no sense to point out the absence of something that isn’t needed, in a code block that isn’t even there.
At the same time, sometimes the LLMs love to eat my comments when doing changes and leave behind only the code.
How silly (and annoying). It’s good to be able to try out multiple models with the exact same prompts though, maybe I should create my own custom mode for RooCode with all of the important stuff I want baked in.
> Speaking of system instructions, Gemini always forgets them or doesn't follow them. And it still puts code comments nearly everywhere, it drives me nuts.
Yup, I've tried to use Gemini so many times, but its inability to strictly follow system prompts makes it so hard to get useful stuff out of it that doesn't need to be cleaned up. Code comments are next to impossible to get rid of; they must have trained it only on code that has comments, because the model really likes to add them everywhere.
Every agent+model combination has issues right now, I'm personally swapping between them depending on the task.
Gemini is great for stuff you need fast and don't care about the quality, as you can just throw it away.
Claude Code + Sonnet is great in many ways and follows prompts way better, but has a tendency to go off on tangents and really get lost in the woods. It requires handholding: basically interrupting it as soon as you see something weird, to steer it in the right direction. Complex stuff has to be aggressively split into smaller validated sub-tasks manually. It also tends to stop partway by itself to say "Well, we've done half now, you want me to continue with the other half?"
Codex + GPT-5 is the best at following prompts and produces the highest quality code, but is way slower than the others, and still struggles with seemingly arbitrary stuff yet is able to solve complex tasks by itself without any hand-holding. It can get stuck on something obvious, but at least it won't run off on its own, and it'll complete everything as well as it can, even if it takes 30 minutes.
Qwen Coder seems outright unusable; I haven't been able to use it for anything good at all.
Tried AMP for a while as well, nice UI and model seems good, but too expensive (and I say this as someone who currently gives $200/month to OpenAI).
Codex doesn’t give feedback while it’s running. It just works quietly, in a way that’s not easy to interrupt even if you could see it going off the rails.
Claude is better at this.
Set these in the config.toml for codex and you'll get a lot more info while it's running:
Gemini seems to have a poor model of both what it can and what it is allowed to do.
I’ve noticed the latter with several image generation refusals I could eventually easily talk them out of (usually by mentioning fair use in a copyright/trademark context).
> Gemini seems to have a poor model of both what it can and what it is allowed to do.
Starting to feel like LLMs are more of a representation of the culture of the company training them than a fair representation of the world at large.
ConwAI’s law?
Well, those are problems with the underlying Gemini models. It's not like the team responsible for CLI could have trained a better model instead of making this feature.
Gemini 3.0 is likely to be released soon, and it will likely improve the agentic coding experience.
All LLMs and agents have stupid issues like this.
GPT-5 insisted on using bash commands to edit a file, despite the dedicated tool for doing this. The problem was that the bash tool it used wrapped at 80 chars, splitting some strings between lines, which then broke the code at a syntax level. It was never able to recover; I was not impressed with GPT-5.
Gemini CLI is definitely a much worse client than some of the other agent clients like opencode, cursor etc. But from my experience, that isn't because of the model quality. I get better quality responses from the gemini web chat interface than from chatgpt, claude etc.
Of course my experience is anecdotal, but we hardly have any decent benchmarks to compare these models. I suspect most benchmarks have leaked into training sets, rendering them useless anyway.
I agree, gemini pro is a great model for coding if you don't need to do agentic work. I've found that it's a lot less "wordy" when editing, debugging, reviewing, etc. It gets to the point whereas other models can provide long useless explanations. It's also very smart and great with long context.
Also, people don't talk enough about (or are bad at separating) the model vs. the client tool - e.g. from your comment, maybe using codex/Claude Code/aider with the Gemini API would be better, best even, but people rarely make that comparison or separation; it's always 'Claude Code with Claude vs. codex with GPT-x' etc.
Yeah. The client tool does make a difference. For example opencode, if I am correct, just spins up its own language servers and then feeds the language server errors back into the model, resulting in a much better agentic coding experience. I don't think they are doing anything much more complex than that.
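If I had to guess at the shape of it, it's roughly the loop below. This is a sketch of the general pattern only, using `tsc --noEmit` as a stand-in for a real language server, and `buildFollowUpPrompt` is a name I made up, not anything from opencode:

```typescript
import { execSync } from "node:child_process";

// After the agent edits files, collect compiler diagnostics to feed back as context.
function collectDiagnostics(projectDir: string): string {
  try {
    execSync("npx tsc --noEmit", { cwd: projectDir, stdio: "pipe" });
    return ""; // clean build, nothing to report
  } catch (err: any) {
    // tsc exits non-zero when there are errors; the report is on stdout.
    return err.stdout?.toString() ?? "";
  }
}

// Turn the diagnostics into a follow-up message for the model, if any exist.
function buildFollowUpPrompt(projectDir: string): string | null {
  const diagnostics = collectDiagnostics(projectDir);
  if (!diagnostics.trim()) return null;
  return `The last edit introduced these errors, please fix them:\n${diagnostics}`;
}
```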
Unfortunately, nearly all the foundation model companies are just wasting their efforts on the clients, which are kind of ass, instead of focusing on the model.
Google would be much better off if they ditched their dogshit CLI and allowed us to use the generous-quota login from any client.
To be fair, most of the time the tools work best with the models trained with those tools in mind, and vice versa.
Not to mention that not all models/inference work the same way, so you can't really replicate the same experience. For example, the new Harmony format means you can now inject messages while GPT-OSS is running inference, but obviously Claude Code doesn't support that because their models don't.
>most of the time the tools work best with the models trained with those tools in mind
This is a garbage state of affairs though
What do you expect? People building software around models other than the ones they themselves develop? Or people training the models to target software that isn't the software they develop themselves?
It's like saying official car repair shops should repair any type of car, not just their brand. That's just not how the real world works.
I have had these exact issues a lot with codex (gpt-5-codex)
I second this man’s take. I’ve been using it consistently for a few months to give it a try, and it is definitely subpar. It can give really good answers at times, but it isn’t worth the time, energy, or luck to get it there.
I'd prefer that the Gemini cli run inside a text file, rather than this way round.
And really, ctrl-f? Do these devs not use the terminal at all?
As someone who has been experimenting with AI ‑powered command‑line helpers, I think adding interactive commands to the Gemini CLI is a logical step, but it won’t be useful unless the underlying model is reliable for basic tasks. Several people here noted that Gemini sometimes refuses to read files outside the project directory or mishandles newlines; those sorts of inconsistencies undermine trust.
In a world where you have 100 options, trust is of utmost importance. The CLI’s integration with node‑pty and the ability to stream pseudo‑tty output into mini‑terminal viewports is clever, and I’d love to see that layer documented or open‑sourced so other tools can build on it. I see this feature as something you’d use for short‑lived tasks like running a quick script, checking a log, or doing a one‑off database query. For longer editing sessions I’d still use a real terminal multiplexer and editor. If Google can fix the reliability issues and make the API for interactive sessions open, that would be hella good for everyone!
I made an MCP that would use a pty lib to allow Claude to debug a TUI app I was writing, with ok-ish results. Ultimately I wanted to see what was happening myself, so when I need interactive use I just tell it to use tmux-cli to capture the neighboring pane. https://github.com/pchalasani/claude-code-tools/blob/main/do... Maybe turning that into an MCP with more guardrails and an integrated guide for the agent would make it more powerful.
I'm not actually sure about that (that turning it into an MCP would help). I've seen more momentum building around having better cli tool integration with ClaudeCode than MCP reliance.
I couldn't tell from the post how this will affect Gemini's ability to assist better as a result.
I guess for Google this will be a treasure trove of real developer interactions to train on.
I might try this once Gemini 3 comes out. Until then, if you're running tmux or zellij, this seems like a worse user experience since you're in a subwindow and have less screen real estate to work with.
The best thing about this is that now Claude and Codex have to add it.
I’m still waiting for Gemini to add hooks and sub-agents
They will do it with needless complexity that is out of step with the competition, as they did with slash commands (toml) and extensions (skills-equivalent).
The posted link says: >The new interactive shell is enabled by default in Gemini CLI as of v0.9.0.
but https://geminicli.com/docs/tools/shell/#enabling-interactive... > To enable interactive commands, you need to set the tools.shell.enableInteractiveShell setting to true.
Seems contradictory. I can't get it to work in either case.
I think that this feature might have taken Gemini CLI from being just a Temu Claude Code with higher usage limits to being actually competitive as a tool. It'll be interesting to see how well this actually works in practice.
Idk, the more skilled I get with Claude Code, the less I use interactive workflows.
I tend to agree but there are a few scenarios where I really want it to work. Debuggers in particular seem hard to get right for the current agents. I’ve not been able to get the various MCP servers I’ve tried to work, I’ve struck out using the debug adapter protocol from agent-authored python. The best results I’ve gotten are from prompting it to run the debugger under screen, but it takes many tool calls to iterate IME. I’m curious to see how gemini cli works for that use case with this feature.
I would love to use gdb through an agent instead of directly. I spend so much time looking up commands and I sometimes skip things because I get impatient stepping over the next thing
i gave up on gemini cli.
1 in 3 times I used it in the past 2 months, it failed for really odd reasons: sometimes the node app just quit with an exception, sometimes Gemini got stuck, blamed itself, and gave up. The same tasks I threw at CC and Codex, they nailed without a blink...
Does anyone know / care to speculate how they actually make this work, in terms of the LLM call loop? Specifically: does it call back to the LLM after each keystroke sending it the new state of the interactive tool, or does it batch keystrokes up? If the former, isn’t that very slow? If the latter, won’t that cause it to make mistakes with a tool it hasn’t used before?
I think this is the PR that implemented the feature: https://github.com/google-gemini/gemini-cli/pull/6694
> feat(shell): enable interactive commands with virtual terminal
> It's not just a stream of text; it's a live feed.
LLM wrote this article it seems.
For me Gemini CLI is not as good as Claude Code; it sometimes writes more code than necessary and makes it hard to maintain. But I hope it gets there with the Gemini 3.0 release. It's open source, so I can imagine it getting there faster with community contributions.
I stopped reading at that point; it was a signal that I’d just be reading another several paragraphs of repetitive prose with random bolded text. It also put such strange overemphasis on an implementation detail that is pretty much irrelevant to users, which made it actively distracting on top of being an obvious LLMism.
Building an interactive shell inside their CLI seems like a very odd technical solution. I can’t think of any use case where the same context gathering couldn’t be gleaned by examining the file/system state after the session ended, but maybe I’m missing something.
On the other hand, now that I’ve read this, I can see how having some hooks between the code agent CLIs and ghostty/etc could be extremely powerful.
LLMs in general struggle with numbers. It's easy to tell with the medium-sized models, which struggle with line replacement commands where they have to count; it usually takes a couple of tries to get right.
I always imagined they'd have an easier time if they could start a vim instance and send search/movement/insert commands instead, not having to keep track of numbers and do calculations, but instead visually inspecting that the right thing happens.
I haven't tried this new feature yet, but that was the first thing that came to mind when seeing it, it might be easier for LLMs to do edits this way.
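For what it's worth, the mechanics of that idea are simple with the same node-pty library this feature is built on. A purely illustrative sketch; the file name and edit commands are arbitrary, and a real agent would need to wait for the screen to settle between writes:

```typescript
import * as pty from "node-pty";

// Drive a vim instance with search and edit commands instead of line numbers.
const vim = pty.spawn("vim", ["src/components/MyComponent.vue"], {
  name: "xterm-color",
  cols: 80,
  rows: 24,
  cwd: process.cwd(),
  env: process.env as { [key: string]: string },
});

// The rendered screen streams back here; an agent could "look" at it to
// confirm the cursor landed in the right place before editing.
vim.onData((screen) => void screen);

vim.write("/defineProps\r");          // jump by search, no line counting
vim.write("ccconst props = {};\x1b"); // replace the line, Esc back to normal mode
vim.write(":wq\r");                   // save and quit
```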
Gotta be better than codex literally writing a python script to edit a file multiple times in a single prompt response.
Personally haven't had that happen to me, been using Codex (and lots of other agents) for months now. Anecdote, but still. I wrote up a summary of how I see the current difference between the agents right now: https://news.ycombinator.com/item?id=45680796
It's nice that they mention node-pty that does most of the heavy lifting for the terminal/pseudo-tty that powers this (VSCode's terminal emulator is powered by the same library).
It looks like they've added a layer on top of node-pty to allow serializing/streaming of the contents to the terminal within the mini-terminal viewports they're allocating for the terminal rendering. I wonder if they're releasing that portion as open source?
From the blog " Gemini CLI spawns a new process within a pseudo-terminal in the background, leveraging the node-pty library...So how does this virtual terminal running in the background show up on your screen? Think of it like a video stream. Our new serializer takes a snapshot of the pseudo terminal at every moment—capturing every piece of text, every color, and even the cursor's position. These snapshots are then streamed to you, allowing you to see and interact with the terminal application in real-time. It's not just a stream of text; it's a live feed."
Terminal serializer code: https://github.com/google-gemini/gemini-cli/blob/main/packag...
Uses @xterm/headless npm package.
Your link 404s
Thanks. Fixed it.
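Putting those pieces together, the general shape of the approach looks something like the sketch below: a pty feeding a headless xterm whose buffer can be read back as plain text. This is just an illustration of the pattern, not the actual serializer linked above:

```typescript
import * as pty from "node-pty";
import { Terminal } from "@xterm/headless";

// Run the interactive command inside a pseudo-terminal...
const shell = pty.spawn("htop", [], {
  name: "xterm-color",
  cols: 80,
  rows: 24,
  cwd: process.cwd(),
  env: process.env as { [key: string]: string },
});

// ...and mirror its output into a headless terminal that tracks the full
// screen state (text, colors, cursor position) without rendering anything.
const term = new Terminal({ cols: 80, rows: 24 });
shell.onData((data) => term.write(data));

// Keystrokes from the visible mini-terminal get forwarded to the pty.
function sendKeys(keys: string): void {
  shell.write(keys);
}

// A "snapshot" is just the current viewport rendered back to plain text;
// streaming these snapshots is what makes it feel like a live feed.
function snapshot(): string {
  const buf = term.buffer.active;
  const lines: string[] = [];
  for (let y = 0; y < term.rows; y++) {
    lines.push(buf.getLine(buf.viewportY + y)?.translateToString(true) ?? "");
  }
  return lines.join("\n");
}
```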
<rant> How many people are running LLM CLIs instead of using their APIs? It seems so obnoxious to me that using a CLI command is cheaper than using their APIs, hence forcing them to build these kinds of workarounds.
Maybe I'm not getting it right, but it seems there are two competing paradigms which certainly with llms coding for llms, who cares. </rant>
Trying to use Gemini CLI is one of the most frustrating experiences with any tool I've had in over two decades of working with software.
It's seemingly very hard to understand how it should be configured at all if you don't have a personal Google account. Rather than just using your credentials to log in and start, you need to find some forum posts from people who have reverse engineered that you need to set a Google Cloud environment variable, even if you are operating without a "Code Assist License" on a Google Business account.
No matter what I do on my paid subscription through Google Business, with a Google Cloud project configured in the environment (which I had to explicitly set up just to test the CLI, even though I have access to the models through my subscription and AI Studio), I always get error 429 after one to five messages. The limits I actually get on Gemini seem to be just a fraction of what Google claims, with no clearly stated reason as to why, not in the cloud console and not in the tool itself, except for the HTTP error message.
These are not big prompts or anything of that nature. It's simple things like review a readme file or double check a single file for errors. It's been like this from the very beginning.
Even now, just to verify it: I haven't used Gemini for over a week, I ask it to review 3 files that are in the git diff, the files are between 50-100 lines long, and after checking the first file it's already on 429, on a PAID subscription, and it even states "99%" context left. So my paid subscription lets me use less than 1% of the context window and I get locked out for an unknown amount of time.
Contrasting this to both Codex and Claude Code, where you just log in and go, it's really a night and day difference. The user experience of the paid version of Gemini CLI is just utterly terrible.
Popping into nvim to check on something really quick seems immediately useful. I think I'll still want a dedicated tab or different terminal app to have my longer lived editor open but this might be nice for validating output with test runners or checking on a database entry in psql or something.
I'm not sure how usable neovim will be in what looks to be a 6 line high window as they show in the demo video.
Pretty sure you can expand it full-screen there.
I had problems running Emacs in no window mode (emacs -nw). I should try again, or maybe just use vim.
Well, I'm glad that's pretty accessible. My screen reader says "focused" when I hit Control + F and I can ask it about top output for example.
It was very buggy for me. You kind of have to coax it into interactive use and then some of the time it got stuck pondering once I exited the app flow and returned to the Gemini CLI (not with Ctrl-F, full exit, it closes the TUI window). It's also super laggy.
To be honest, at this point having Claude Code monitor the output of a `tmux pipe-pane` is probably going to be superior.
To be clear, the LLM is only aware of the final state of the pty when the command exits, right? It's not a TUI computer-use model at this point from what I can tell.
Look at me, you are the model now
Aside: The demo shows git commands being run in the CLI. I absolutely hate it when devs use a commit message that says "chore: my first commit from gemini cli" - I get that it's meant for the demo, but in general too, I've seen codebases that enforce these commit prefixes such as "chore", "feat", "bugfix" etc. Is there any real value to that? Besides eating into the 50-character limit on the first line of the commit message, I don't see anything being accomplished by including them. Also, non-imperative commit messages?! Come on, guys!
If you manage a product that publishes changelogs, then by tagging commits that way you can automatically group changes under headers like that when generating your changelog from your git history. It's fairly common in open source projects. If, however, you are working on internal stuff at a company and you don't generate changelogs from your commits, then doing conventional commits isn't that useful.
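As a concrete example of that changelog angle, the grouping is about ten lines of scripting. A toy sketch; the `v1.2.0..HEAD` range, the regex, and the output format are arbitrary choices, not any particular project's release tooling:

```typescript
import { execSync } from "node:child_process";

// Group commit subjects by their conventional-commit type to draft a changelog.
const subjects = execSync("git log --pretty=format:%s v1.2.0..HEAD")
  .toString()
  .split("\n")
  .filter(Boolean);

const sections = new Map<string, string[]>();
for (const subject of subjects) {
  // "feat(parser): add lookahead" -> type "feat", description "add lookahead"
  const match = subject.match(/^(\w+)(\([^)]*\))?!?:\s*(.+)$/);
  const type = match ? match[1] : "other";
  const description = match ? match[3] : subject;
  if (!sections.has(type)) sections.set(type, []);
  sections.get(type)!.push(description);
}

for (const [type, entries] of sections) {
  console.log(`## ${type}`);
  for (const entry of entries) console.log(`- ${entry}`);
}
```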
If you're looking in the commit tree for which commit fixed a certain bug but didn't fix it fully, for example, you first look at all the `fix:` commits and then, if one matches, you read the rest. You just write `fix: Thumbnail wasn't updating after upload` instead of `Fix for Thumbnail not updating after upload`, which isn't really wasting characters.
But I'm also not a fan of this being an enforced convention because somebody higher up decided it brings some value, and now it's the 101st convention a new dev has to follow, which actually reduces productivity.
> I've seen codebases that enforce these commit prefixes such as "chore", "feat", "bugfix" etc. Is there any real value to that?
It's a choice some teams make, presumably because _they_ see value in it (or at least think they will). The team I'm on has particular practices which I'm sure would not work on other teams, and might cause you to look at them with the same incredulity, but they work for us.
For what it's worth, the prefixes you use as examples do arise from a convention with an actual spec:
https://www.conventionalcommits.org/en/v1.0.0/
Just because someone put up a fancy website and named it "conventional" doesn't mean it's a convention or that it's a good idea.
The main reason this exists is because Angular was doing it to generate their changelogs from it. Which makes sense, but outside of that context it doesn't feel fully baked.
I usually see junior devs make such commits, but at the same time they leave the actual commit message body completely empty and don't even include a reference to the ticket they're working on.
i’ve had little luck getting ai systems to correctly set up networking for a set of vms. they tend to go round and round with iptables commands that don’t ultimately solve the problem. is config fundamentally harder than writing code?
Did you give them a way to check the networking rules?
If not, the model is just shooting in the dark and guessing.
i give feedback by copy-pasting output. hence the round and round. maybe if i had a sandbox that the model could run on autonomously it might go better/faster.
Gemini is comically bad, like so bad you wonder if the product managers even know what it is supposed to sound/look like when working with an LLM.
What the heck is going on in Google-land?
How many tokens does it eat up? Does the context stay concise? Who owns these "serializations" that are uploaded to Google all the time?