This looks interesting, but it is quite buggy. A few things I've found:
- error handling is weak: it just dumps the raw error JSON to the console without reading and understanding it, so it keeps retrying even when the error clearly calls for a different approach
- ESC is pretty unreliable at interrupting ongoing activity
- I'd expect to be able to use arrow keys to navigate through history
- is there a way to change the preference order for models?
It succeeded in my standard task of adding support for detecting itself to am-i-vibing, but it got stuck on some API errors before it could create a PR. It does have support now, after a little help: https://github.com/ascorbic/am-i-vibing
Ah, I actually find it useful to see the error output! But I can add a flag to hide it. Yes, you can change the ordering by going to the menu (via ESC), and going to "Settings" and then "Set default model." (You can also just edit the model order in the config file at ~/.config/octofriend/octofriend.json5).
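From memory, the relevant bit of that config looks roughly like this (field names may be slightly off, the settings menu writes the canonical format), and reordering the array reorders the preference:

    // Rough sketch of ~/.config/octofriend/octofriend.json5 (field names from memory).
    {
      models: [
        { nickname: "glm-4.5", model: "zai-org/GLM-4.5" },     // default: tried first
        { nickname: "gpt-oss", model: "openai/gpt-oss-120b" }, // next in preference
      ],
    }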
ESC should reliably interrupt the model, although it doesn't currently interrupt tool calls — I'll add that (although all tool calls have timeouts currently).
It's useful, but by default it would be best to show just the error message, not a screen full of JSON from the Vercel SDK.
This is very good feedback, thank you :D I'll ship these improvements tonight.
FYI, I just shipped a new version that addresses:
1. Verbose errors are hidden by default. You can run octofriend with OCTO_VERBOSE=1 set in your shell's env vars to see the verbose errors.
2. Pretty much everything can be interrupted via ESC now, not just model responses. Bash commands will be killed (first via SIGINT, then via SIGTERM if they don't respond quickly), the web fetch tool will kill requests, etc.
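The bash interrupt escalation in 2 is roughly this shape (a simplified sketch, not the exact code):

    // Simplified sketch of the SIGINT-then-SIGTERM escalation, not the exact code.
    import type { ChildProcess } from "node:child_process";

    function interrupt(child: ChildProcess, graceMs = 2000): void {
      child.kill("SIGINT"); // ask the command to stop gracefully
      const timer = setTimeout(() => {
        // Still running after the grace period? Escalate.
        if (child.exitCode === null && child.signalCode === null) {
          child.kill("SIGTERM");
        }
      }, graceMs);
      child.once("exit", () => clearTimeout(timer));
    }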
Thanks for all the feedback :)
After reading this comment I checked the repo's tests. There were few.
With GLM-4.5, we can disable thinking by appending /nothink to the user message. Is that how we're supposed to use Octofriend? E.g. "Plan looks good. Go ahead and implement feature X. /nothink"
Do you have any typical costs for various models, for a "typical" day of use? Say, 6 hours of sitting at a desk writing code (rather than code reviews, etc.)
Looking forward to trying this out. An early comment: I would love to be able to override tool descriptions and system prompts from a config. Especially when working with local models, context management is king and the tool descriptions can be a hidden source of uncontrollable context.
This looks very interesting. I wish it came with some guides for using it with a local LLM. I have an MBP with 128GB of RAM and I have been trying to find a local open-source coding agent. This feels like it could be the thing.
I'll add docs! TL;DR: in the onboarding (or in the Add Model menu section), you can select adding a custom LLM. It'll ask you for your API base URL, which is whatever localhost+port setup you're using, and then an env var to use as an API credential. Just put in any non-empty credential, since local models typically don't actually use authentication. Then you're good to go.
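For an LM Studio-style setup, for example, the entry ends up roughly like this (field names from memory, so treat this as a sketch and let onboarding generate the real one):

    // Rough sketch of a custom model entry in ~/.config/octofriend/octofriend.json5.
    // Field names are from memory and may differ; onboarding writes the real config.
    {
      models: [
        {
          nickname: "local-qwen3-coder",
          baseUrl: "http://localhost:1234/v1", // LM Studio's default server address
          model: "qwen3-coder-30b",
          apiEnvVar: "LOCAL_API_KEY",          // any non-empty value will do locally
        },
      ],
    }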
IMO gpt-oss-120b is actually a very competent local coding agent — and it should fit on your 128GB MacBook Pro. I've used it while testing Octo, actually; it's quite good for a local model. The best open model in my opinion is zai-org/GLM-4.5, but it probably won't fit on your machine (although it works well via APIs — my tip there is to avoid OpenRouter, since quite a few of the round-robin hosts have broken implementations).
Ok wonderful! Thanks.
I'm trying to set it up right now with LM Studio and qwen3-coder-30b. Hopefully it's going to work. Happy to take any pointers on anything y'all have tried that seemed particularly promising.
For sure! We also have a Discord server if you need any help: https://discord.gg/syntheticlab
Follow-up question: can the diff-apply and fix-json models also be run locally with octofriend, or do they have to hit your servers? Thanks!
They're just Llama 3.1 8B Instruct LoRAs, so yes — you can run them locally! Probably the easiest way is to merge the weights, since AFAIK Ollama and llama.cpp don't support LoRAs directly — although llama.cpp has utilities for doing the merge. In the settings menu or the config file you should be able to set up any API base URL + env var credential for the autofix models, just like any other model, which allows you to point to your local server :)
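The llama.cpp flow is roughly the following (script and flag names shift between llama.cpp versions, so treat this as a sketch and check your checkout):

    # Rough sketch; script and flag names vary across llama.cpp versions.
    # 1. Convert the LoRA adapter to GGUF, referencing the base model:
    python convert_lora_to_gguf.py ./diff-apply --base ./Llama-3.1-8B-Instruct --outfile diff-apply.gguf
    # 2. Bake the adapter into a standalone merged GGUF:
    llama-export-lora -m llama-3.1-8b-instruct.gguf --lora diff-apply.gguf -o diff-apply-merged.gguf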
The weights are here:
https://huggingface.co/syntheticlab/diff-apply
https://huggingface.co/syntheticlab/fix-json
And if you're curious about how they're trained (or want to train your own), the entire training pipeline is in the Octofriend repo.
I think this might be your best bet right now. GLM-4.5-Air is probably next best. I'd run them at 8-bit using MLX.
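If you go the MLX route, mlx-lm can serve an OpenAI-compatible endpoint you can point Octo at (the model repo below is a guess at the name of an 8-bit community quant, so search mlx-community on Hugging Face for the real one):

    # Assumes `pip install mlx-lm`; the repo name below is illustrative.
    mlx_lm.server --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit --port 8080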
https://deps.dev/npm/octofriend/0.0.18/dependencies - huge list of dependencies
They should hide them, like Anthropic does, to confuse dependency hawks.
https://deps.dev/npm/%40anthropic-ai%2Fclaude-code/1.0.69/de...
Lol! We're open-source, so there's no point hiding. The list of actual non-devDependencies in our package.json is small, but there are a lot of transitive dependencies — downside of the Node ecosystem.
I doubt we're particularly different in that regard from Claude Code, since we use the same frameworks (e.g. Ink for terminal rendering).
There are only 16 direct dependencies, and they all look pretty reasonable to me.
Have you worked with any Node.js projects before? I'd actually say this is a relatively sparse list of dependencies for a user-facing tool.
How does it differ from existing ones like Aider and OpenCode, besides being less mature?
OpenCode doesn't handle thinking tokens particularly well, so the LLMs are dumber (it doesn't pass the encrypted reasoning tokens back during multi-turn). Aider is very different in terms of UX.
Also, Octo has a couple of optional custom-trained ML models to autofix minor diff edit failures and JSON encoding errors. You'll notice on the Aider benchmarks that some LLMs end up failing due to edit format errors: Octo should have far fewer of those thanks to the autofix models.
Looks interesting! How would you say it compares against Claude Code, Gemini CLI, or any of the other major terminal-based coding assistants?
It's quite similar to Claude Code. The main advantages are that it's super easy to use with different models when new ones come out (like GPT-5!), and with local LLMs, and we have some optional, custom-trained small models that help auto-fix diff edit failures and minor JSON inaccuracies — they work with any model and especially help with some of the open-source coding models.
We also open-sourced the autofix models:
https://huggingface.co/syntheticlab/diff-apply
https://huggingface.co/syntheticlab/fix-json
BTW, they're truly open source, not just open-weight: the entire training pipeline is in the Octofriend repo.
Quick plug for the OpenHands CLI: https://docs.all-hands.dev/usage/how-to/cli-mode
We're working on creating an SDK that will allow other folks to build their own CLIs with OpenHands, so you can take advantage of our SOTA agent, but implement the TUI/GUI of your dreams.
The main problem for me is that afaik the only agent that works with Claude Max subscription is Claude Code.
I believe opencode also uses Claude Max somehow - https://github.com/sst/opencode
wow the fake Studio Ghibli artwork is really unsettling. Seriously creepy uncanny valley vibes on top of the stolen style. I hate it. Please never do that again.
The art is definitely creepy. I don't know how an octopus (or is that a decapus?) in a jar could be considered "cute". It reminds me of Cthulhu and undesirably branchy codebases.