I have an old Django site I'm maintaining for a long-time customer of mine. They often want to make small changes - things that are only a few lines of code, but would take an hour to just spin up the system, remind myself how it works, commit, push, update the server and all that.
Last week I moved the whole infrastructure to Railway and taught the customer to use Jules. They make their own PRs now, and Railway spins up an environment with the changes, so the customer can check it themselves. It works like 75% of the time, and when it doesn't, the customer sees that it doesn't before it even reaches me. Only if they're happy with the changes do I step in to review the code and press merge. It's been such a huge time saver so far.
Do they still pay you the same amount?
I can't speak for the OP, but I have customers I still support, because they supported me many years ago when I was a teenager taking my first steps into industry.
Does it make me money? Barely a cent. But I can spare an hour or two a year for the guy who gave me a leg up and trusted a teenager who probably shouldn't have been trusted. And I like the feeling of having something I worked on still going strong 20+ years later, when so much of my later work has been thrown away by the endless corporate rewrite treadmill.
Same situation, 10+ yrs deep with my first client, project still chugging along while I tackle bigger fish.
Can't justify spending much time on it now but a DIY no/low code solution for them isn't a bad idea.
I think he meant it in the sense that the AI is making all the changes now.
How expensive are the API charges? Seems like it might be a bit too easy for a customer to rack up a big bill testing out minor changes if things weren't configured correctly.
Literally free. No API - the reason I went for Jules over, say, Claude Code or Gemini CLI is specifically its relatively polished web interface, which I assumed my customer would appreciate. They're using their own Google account, and the free daily task limit seems to be more than enough for them.
There is a free plan with 15 tasks/sessions. It doesn't count tokens, AFAIK. There would obviously be a runtime limit of some sort. But it's not the same situation as API keys and tokens.
The free tier is 15 tasks per day (of gemini-2.5-pro) which is EXTREMELY generous. I've had plenty of tasks run for 1-2 hours. I do think that after 1 or 2 hours it's told it needs to wrap up and just present what it's done; I couldn't get it to keep going longer than 2 hours. But Jules is very slow as it seems to be batch processing on spare capacity, so 15+ hours a day is not quite as absurd as it sounds.
I haven't tried Jules in a couple of weeks, but the UI/UX had a lot of issues, such as going very long stretches without any progress updates. The worst thing was not being able to see what it was doing and correct it: you only see the state of the files (without a usable diff viewer, WTF) as of the last point the agent decided to show you anything - the last time it completed a todo-list item, I think - and I couldn't get it to refresh that view when asked, though it will send a PR if you ask. On top of that, gemini-2.5-pro can try really stupid things as it debugs. That said, I've also been impressed by its debugging abilities a number of times.
Still, I found Jules far more usable than Gemini CLI (free tier), where Gemini just constantly stops for no reason and needs to be told to continue, and I exhausted the usage limit in minutes.
Aside from the unlimited free tier, probably the best part of Jules is its automated code reviews. Once, I was writing up some extensive comments on its code when a code review unexpectedly dropped into the conversation, giving exactly the same feedback I was writing. Unfortunately, if it never reaches the point of submitting for review, it doesn't get an automated review. It does often ask for feedback before it's done, which is nice. So I probably needed to prompt better.
> I've had plenty of tasks run for 1-2 hours.
I think they throttle it - they note it is an asynchronous service
I agree that it is generally a pretty useful service.
I wonder if on Google's end it's basically a low-priority job that runs whenever a region has idle GPUs.
How do you handle the customer database? Do you push this in its entirety to the VM?
I hope they don't store any user data in their app. Trusting LLMs blindly is a bad idea.
There is a human being (GP) reviewing the proposed code before merging. I wouldn't describe that as trusting the LLM blindly.
No, there is not
Yes, there is. From the OP:
"Only if they're happy with the changes, I step in to review the code and press merge."
Ok, thanks, I misunderstood that.
So presumably it spins up a review app from the PR for the customer to review, really smart actually.
Jules has access to the codebase, not the database. It doesn't see any user data.
I was talking about potential security problems introduced in the code by LLMs.
It's pretty easy to introduce something like IDOR when asking LLMs to write the code.
I review the PRs Jules makes just like I review any PR.
This is the original poster, you downvoters. I think we can assume he knows what he gave access to.
That's honestly incredibly cool. Could I perhaps encourage you to write a blog post about the details, with some examples of what the PRs from your customer look like?
> Support for workspace users is coming later in October!
I'll never understand why paying users are so often left behind. It's truly bizarre.
I believe it is to make sure that the product remains compliant with the data guarantees that Workspace provides. You aren't paying for the latest and the greatest features, you're paying for the support and compliance guarantees your business expects.
Because paying users don't want features quickly.
Companies want features gated behind controls; they want audit trails, compliance, SLAs, and integration with their admin consoles. And they want some certainty that the feature won't change too quickly.
I will never understand why people keep using Workspace accounts for personal use and are then surprised when features reach those accounts more slowly. This is how it's worked for 20 years; it's not going to change. If you want earlier access, create a Gmail account for your personal use.
I was able to build a personal MCP server that connects to the Jules API, letting me dispatch tasks to Jules from Copilot Chat in VS Code.
Video here: https://www.youtube.com/watch?v=RIjz9w77h1Q
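If anyone wants to do the same, the gist is a single MCP tool that POSTs the task to the Jules API. Here's a minimal sketch in Python using the official mcp SDK; the endpoint env var, payload shape, and "taskId" field are placeholders I made up for illustration - check the Jules API docs for the real ones:

    # Minimal MCP server exposing one tool that dispatches a task to Jules.
    # The endpoint and payload shape are hypothetical; see the Jules API docs.
    import os
    import requests
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("jules-dispatch")

    @mcp.tool()
    def dispatch_jules_task(repo: str, prompt: str) -> str:
        """Send a coding task to Jules and return the created task's ID."""
        resp = requests.post(
            os.environ["JULES_API_URL"],  # hypothetical endpoint URL
            headers={"Authorization": f"Bearer {os.environ['JULES_API_KEY']}"},
            json={"repository": repo, "instructions": prompt},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("taskId", "unknown")

    if __name__ == "__main__":
        mcp.run()  # stdio transport, so Copilot Chat can invoke the tool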
If you have Copilot already, just use the Copilot coding agent; it does the same thing and it's much better.
From my experience Jules is the worst coding agent on the market.
Do people trust these kinds of things to effectively work async and unsupervised?
My experience with coding agents leads me to believe using something like this will end up being more noise and work than ROI
I suppose it could be effectively the same loop I use in VS Code, but then why would I want an external tool over an integration?
> My experience with coding agents leads me to believe using something like this will end up being more noise and work than ROI
I think that depends on how far out your horizon is. If you're only looking one task out, or maybe a few weeks out, then it's not worth investing the time yet. On the other hand, if you're looking at how your engineering team will work in 3 years' time, it's definitely worth starting to look at it now.
An example that comes to mind: a bot that automatically spins up an environment when a library is updated, runs the tests, identifies why the codebase doesn't work with the update, fixes it, and then opens a PR that passes all the tests for humans to review. That would be incredibly useful.
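Sketching it out, the whole loop is small. Everything here (the library name, the commands, the run_agent hook) is illustrative, not a real tool:

    # Illustrative sketch of the update -> test -> fix -> PR loop described above.
    # run_agent() is a stub; wire it to whatever coding agent you use.
    import subprocess

    def sh(*cmd: str) -> subprocess.CompletedProcess:
        return subprocess.run(cmd, capture_output=True, text=True)

    def run_agent(failure_log: str) -> None:
        """Hypothetical hook: hand the failing test output to a coding agent."""
        raise NotImplementedError

    sh("git", "checkout", "-b", "bot/dep-update")
    sh("pip", "install", "--upgrade", "some-library")  # the updated dependency
    for attempt in range(3):  # bounded retries so the bot can't loop forever
        result = sh("python", "-m", "pytest", "-q")
        if result.returncode == 0:
            sh("git", "commit", "-am", "Fix breakage from some-library update")
            sh("gh", "pr", "create", "--fill")  # assumes the GitHub CLI is configured
            break
        run_agent(result.stdout + result.stderr)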
The LLMs are a crapshoot, and probably always will be, for reliable automatic fixing of anything. They save me time 50% of the time. The other 50% they just can’t put enough together to grok what the existing code does, but damn if their code doesn’t look like it should work.
Why?
In 3 years' time this won't be how these tools work. So it feels like you're saying we should invest time in something that doesn't work today and will be redundant in 6 months.
Worse, your example is one that AI agents are notoriously bad at. Give them an error like that and they're more liable to break the existing functionality to fix it, after littering the code with tons of log statements.
And that's when, most of the time, the actual fix is a silly little one-line mistake.
And in five years time all human engineers will be replaced. Why not just quit now?
Especially odd for someone working in IT. If I only learned things that won't change within a few years, I'd have to find new work.
We are unlikely to see these things progress beyond "a never-learning junior that doesn't remember what it did last hour".
It's a limitation inherent in how they are designed. Fine if you babysit them, but they quickly go off the rails and waste my time too. Hence the original question about people actually using something like Jules versus speculating about how nice it would be.
> why would I want an external tool over an integration?
I do not feel comfortable running agents on the same computer that has my photos, email, browser cookies, etc., so giving Jules access to my GitHub project was a nice experience for me. It was able to read my Gemfile and run my Rails app's test suite without me having to worry about all the private data on my machine. The code wasn't great, but it did help with coder's block to kick off some features.
The benefit I've found of external vs. integrated, at least with GitHub Copilot, is that in the cloud it auto-approves by default and works in a sandboxed environment.
Yeah, I get that. It is a bit more work to add the auto-approve config and set up agents to run in containers yourself.
Yeah, in my experience you have to babysit them
VS Code is not a coding agent as much as it is code generation and completion
Copilot in agent mode; I thought that wouldn't need to be said given the audience and surrounding context.
Can we go back to having Rust/Go-based binaries instead of Node.js CLIs? I really find them annoying to install compared to a single binary.
I find it more annoying that Discord is their preferred channel for feedback... This shit is banned at work.
Exposing my ignorance here: how is this very different from Copilot or other online coding agents?
It’s a shame Google picked the wrong system design for Jules. Claude Code’s system design is clearly superior at this point.
Jules is going to simply be another vendor-locked, walled-garden play.
I think they are doing both (in true Google fashion); there is an open-source Gemini CLI with a generous free tier that competes more directly with Claude Code. https://github.com/google-gemini/gemini-cli
It was pretty rough at launch but has gotten a lot better. So has Claude Code, though, so I've never really switched over.
I've been using AI coding agents since the very early days of Aider and I think this is not quite true. There's a place for async agents. There's a place for collaborative agents. Collaborative agents may even soon be delegating off to multiple async agents and picking best results. There's so much complexity here and we haven't even begun to explore a corner of the possible design space. We're still trying to plug AIs into human-shaped holes instead of building around their interesting/weird capabilities.
Would you be willing to point me to a primer of how I can get started with building agents?
This week I experimented with building a simple planner/reviewer “agentic” iterative pipeline to automate an analysis workflow.
It was effectively me dipping my toes into this field, and I'm eager to learn more. But I'm unsure of where to start, since everything seems so fast-paced.
I’m also unsure of how to experiment, since APIs rack up fees pretty quickly. Maybe local models?
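For context, the shape of what I built is roughly this: one call drafts, a second critiques, and it iterates until the reviewer approves. The model call is stubbed out so it works with any provider or a local model; the prompts and the APPROVED convention are just what I improvised:

    # Toy planner/reviewer loop: draft, critique, revise, repeat.
    def chat(system: str, user: str) -> str:
        """Stub: swap in your provider's client (OpenAI-compatible, Gemini, local...)."""
        raise NotImplementedError

    def run_pipeline(task: str, max_rounds: int = 3) -> str:
        draft = chat("You are a planner. Produce a draft analysis.", task)
        for _ in range(max_rounds):  # bounded so it can't revise forever
            review = chat("You are a strict reviewer. Reply APPROVED or list fixes.",
                          f"Task: {task}\n\nDraft:\n{draft}")
            if "APPROVED" in review:
                break
            draft = chat("Revise the draft to address this review.",
                         f"Draft:\n{draft}\n\nReview:\n{review}")
        return draft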
Personally I found the mini SWE-agent to be a very approachable introduction to building agents: https://github.com/SWE-agent/mini-swe-agent
There are a number of free and cheap LLM options to experiment with. Google offers a decent free plan for Gemini (get some extra Google accounts). Groq has a free tier including some good open weight models. There's also free endpoints on OpenRouter that are limited but might be useful for long running background agents. DeepSeek v3.2, Qwen3, Kimi K2, and GLM 4.6 are all good choices for cheap and capable models.
Local models are generally not a shortcut to cheap and effective AI. It's a fun thing to explore though.
You can drive Claude Code directly via its scriptable interface (flags like --verbose, --output-format json, --input-format json, --include-partial-messages) and use your existing Anthropic plan. Otherwise, yeah, you'll have to start using API tokens:
https://www.anthropic.com/engineering/building-agents-with-t...
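To make that concrete, here's a minimal example of driving the CLI headlessly from Python and parsing the JSON result. Flag behavior can vary by version, so check `claude --help` on your install:

    # Run Claude Code non-interactively and parse its JSON output.
    import json
    import subprocess

    result = subprocess.run(
        ["claude", "-p", "Summarize the TODOs in this repo",
         "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    reply = json.loads(result.stdout)
    print(reply.get("result", reply))  # the final assistant message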
Or you can pretend to be Claude Code in your API calls.
I fail to see how comparing Jules to Claude Code is relevant. They’re completely different.
A good Jules comparison would be OpenAI Codex.
For a Claude Code Google equivalent there’s Gemini Code Assist CLI
Tbf, you can install Claude Code on GitHub. Then you can mention @claude and ask it to do things.
Jules is more sophisticated, though.
Exactly. As the sibling comments point out, async and collaborative are different ways to work. Both have their place.
Jules reminds me more of https://github.com/features/spark if comparing things
You mean giving an agent access to your user space and hoping nothing goes wrong?
The default installation of Claude Code is hilariously insecure, and the only times I've used it have been in a fully sandboxed VM.
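A lighter-weight option along the same lines is a throwaway container that mounts only the repo. The image name and agent command below are placeholders, not a vetted setup:

    # Launch an agent inside a disposable container with only the repo mounted.
    import os
    import subprocess

    subprocess.run([
        "docker", "run", "--rm", "-it",
        "--network", "none",              # cut network access entirely, if the agent allows
        "-v", f"{os.getcwd()}:/work",     # mount only the project, nothing else
        "-w", "/work",
        "agent-sandbox:latest",           # placeholder image with the agent preinstalled
        "claude",                         # the agent CLI to run inside the box
    ], check=True)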
I am so sick of these anthropomorphized names that have nothing to do with anything that we’re all supposed to remember now. Why are we giving products first names? The worst offender is probably Amazon Rufus. It’s all so dumb and I hate it. At least attempt to be clever and name it something that relates to the product itself. Even Google Wave, despite its shortcomings, made sense as a product name.
I assumed the Jules name was at least partially inspired by Jenkins.
Claude is named after Claude Shannon. And Google Wave was the future we needed, even if we did not deserve it.
My friend, letting yourself be bothered by this is just pissing into the wind. Humans have been anthropomorphizing machines and other objects for as long as we've been making them, it's a fundamental aspect of human nature. Thousands upon thousands of ships and trains given human names. Tanks, guns, cars, anything that is at least moderately complex or that people find themselves relying on and forming relationships with. AIs have been getting human names since at least 1966 with Eliza, probably earlier, and certainly with many earlier examples in fiction.
There's no stopping it. Just roll with it.
Would anyone at Google be willing to tell me how many people are working on this project? I’ve been building something functionally similar for my employer, but it’s a nights and weekends project with only one contributor (me).
Why would you build something for your employer in your personal time?
You're literally putting your own money in the shareholders' pockets.
Is there any price comparison between Jules and Claude code?
Recently I moved from repl.it to Claude max to save costs.
Isn't Replit more for vibe coding and Claude Code for actual coding? They seem like very separate products.
Url with the anchor in case it moves: https://jules.google/docs/changelog/#introducing-the-jules-a...
Jules can add all it wants and I still will not use it, simply because it's a Google product and Google hasn't known how to make products for the past 20 years.
Also, why the heck are Google's offerings so fragmented?! We have `gemini` and `jules`, plus two different sets of Gemini APIs (one more limited than the other), and no API is entirely OpenAI-compatible.
Come on Google...
I really hope Google discontinues this project soon (that’s kind of their specialty). I find it frustrating when chatbots/LLMs adopt real names as their brand identities.