I've been using Qwen3.6-Plus in my carpentry-simulator agent harness. That model performs pretty well, but not at Opus 4.7 levels. I will try 3.7 Plus when the pricing is announced and report back.
I use Qwen3.6-A3B locally on my Strix Halo for testing simple prompts like "make the largest polygonal picture frames you can, each from a single 8 foot 2x4. Start with 3 sided and go up from there, lay them all out in a 9 by 9 grid."
It's a cheap (off-grid solar power, so ~free) way to sharpen the harness: if less-than-SOTA models can succeed, then the tool surface is clear. Bumping up to 3.6 Pro has been a nice middle step toward the frontier models without spending an arm and a leg on credits.
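The arithmetic behind that picture-frame prompt is small but unforgiving. Here is a minimal Python sketch of it, under stated assumptions: "largest" means the longest outer side of a regular n-gon, the 2x4 is laid flat so the frame face is its actual 3.5" width, the blade kerf is 1/8", and the trapezoidal pieces are nested alternately so neighbors share each angled cut. The function names are hypothetical and are not the harness's API.

```python
import math

def miter_setting_deg(n_sides: int) -> float:
    """Miter-saw angle, in degrees from square, for both ends of every piece."""
    return 180.0 / n_sides

def max_frame_side(n_sides: int,
                   board_len: float = 96.0,   # one 8-foot board, in inches
                   board_width: float = 3.5,  # actual face width of a 2x4 laid flat
                   kerf: float = 0.125) -> float:
    """Longest outer side (inches) of a regular n-sided mitered frame from one board.

    Assumes the trapezoidal pieces are nested alternately (long edge up, then
    long edge down) so neighboring pieces share each angled cut.
    """
    if n_sides < 3:
        raise ValueError("a frame needs at least 3 sides")
    theta = math.pi / n_sides               # miter angle from square, in radians
    offset = board_width * math.tan(theta)  # how far one angled cut runs along the board
    kerf_along = kerf / math.cos(theta)     # an angled kerf eats more board length
    # Alternating pieces consume n*(L - offset) + offset of board,
    # plus one kerf for each of the n - 1 cuts between pieces.
    side = (board_len - offset - (n_sides - 1) * kerf_along) / n_sides + offset
    if side <= 2 * offset:
        raise ValueError("no inner opening left at this board width")
    return side

for n in range(3, 9):
    print(f"{n} sides: miter {miter_setting_deg(n):.2f} deg, "
          f"outer side {max_frame_side(n):.3f} in")
```

Nesting the pieces alternately is what limits waste to the two end triangles; cutting every piece the same way round would throw away a triangular wedge at every joint.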
Can you elaborate on what your agent harness entails?
It's a woodworking simulator. There are tools (tape measure, pencil, miter saw, table saw, jigsaw, drill, router) and wood stock (2x4s, plywood).
There is a chat window and a human UI similar to SketchUp. You can task the agent with using the tools and assembling the project, or do it yourself. It outputs real CAD files, plans, and a build guide video.
Qwen 3.6 has been great because it's multimodal and good at tool calling. As it builds, it can take screenshots and inspect them. The basic output is an operation list with real measurements, bevel/angle settings, and the like.
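To make "an operation list with real measurements and bevel/angle settings" concrete, here is a minimal Python sketch of what one such list could look like. The field names, the `Operation` type, and the helper functions are hypothetical illustrations, not the app's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

@dataclass
class Operation:
    """One step in a hypothetical operation list (field names are illustrative)."""
    step: int
    tool: str                          # e.g. "miter_saw", "table_saw", "drill"
    stock: str                         # label of the board being worked
    length_in: Optional[float] = None  # measured length in inches, if relevant
    miter_deg: float = 0.0             # blade rotation away from square
    bevel_deg: float = 0.0             # blade tilt away from vertical
    note: str = ""

def square_frame_ops(side_in: float, stock: str = "2x4 #1") -> List[Operation]:
    """Four pieces for a square frame, each mitered at 45 degrees on both ends."""
    return [
        Operation(step=i + 1, tool="miter_saw", stock=stock,
                  length_in=side_in, miter_deg=45.0,
                  note=f"piece {i + 1} of 4, long edge to the outside")
        for i in range(4)
    ]

def to_json(ops: List[Operation]) -> str:
    """Serialize the plan so it can be diffed, replayed, or rendered as a cut list."""
    return json.dumps([asdict(op) for op in ops], indent=2)

print(to_json(square_frame_ops(26.49)))
```

Keeping the plan as plain data rather than free text is what lets the same list drive a CAD export, a printed cut list, and step-by-step playback.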
Let me share a couple of demos, with the caveat that this project is still very much in beta. There's a bug report/feature request button in the app, so if you notice anything egregious, feel free to reach out there.
[0] Building a hubless buckyball dome. It's deceptively simple, because the agent only has to call the procedure with the right values. That's an intended flywheel: agents and users can save procedures to a community library, and future agents can discover and invoke the pre-existing procedures instead of re-deriving them. They can copy and edit a procedure as needed, so the collection effectively becomes a RAG library of carpentry procedures written in the project's DSL.
[1] A sawhorse, re-implemented by Opus 4.8 from plans published by Home Depot. Now saved as a procedure so you can drop a sawhorse anywhere you need one.
[2] A small shed with stick frame walls and a lean-to roof.
[3] The picture frame prompt mentioned in my first comment.
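The save / discover / copy-and-edit loop described in [0] can be sketched in a few lines of Python. This is a minimal illustration, assuming procedures are callables that return operations; the real library stores procedures in the project's own DSL, and plain word-overlap scoring here stands in for whatever retrieval it actually uses. `Procedure`, `ProcedureLibrary`, and `sawhorse_legs` are all hypothetical names.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Set

@dataclass
class Procedure:
    name: str
    description: str
    run: Callable[..., list]                 # returns a list of operations
    tags: List[str] = field(default_factory=list)

class ProcedureLibrary:
    def __init__(self) -> None:
        self._procs: Dict[str, Procedure] = {}

    def save(self, proc: Procedure) -> None:
        self._procs[proc.name] = proc

    @staticmethod
    def _score(proc: Procedure, words: Set[str]) -> int:
        text = f"{proc.name} {proc.description} {' '.join(proc.tags)}".lower()
        return len(words & set(text.split()))

    def search(self, query: str, k: int = 3) -> List[Procedure]:
        """Rank procedures by word overlap with the query (a stand-in for real retrieval)."""
        words = set(query.lower().split())
        scored = [(self._score(p, words), p) for p in self._procs.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [p for s, p in scored if s > 0][:k]

    def fork(self, name: str, new_name: str, **changes) -> Procedure:
        """Copy a saved procedure under a new name so edits never touch the original."""
        original = self._procs[name]
        changes.setdefault("tags", list(original.tags))  # don't share the tag list
        copy = replace(original, name=new_name, **changes)
        self.save(copy)
        return copy

def sawhorse_legs(height_in: float = 32.0) -> list:
    # Placeholder body: four square-cut legs, NOT real sawhorse plans.
    return [{"tool": "miter_saw", "stock": "2x4", "length_in": height_in}
            for _ in range(4)]

lib = ProcedureLibrary()
lib.save(Procedure("sawhorse", "Four-legged sawhorse for supporting stock",
                   sawhorse_legs, ["shop", "support"]))
lib.save(Procedure("hubless_dome", "Hubless geodesic dome from beveled struts",
                   lambda **kw: [], ["dome", "geodesic"]))

best = lib.search("need a sawhorse")[0]
print(best.name, len(best.run(height_in=30.0)))
```

Forking instead of editing in place keeps a shared procedure stable for every other agent that already relies on it.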
Quick note: I recently broke the smart de-duplication for step-by-step playback, so it will show every single cut, one by one. Hit "stop" and then click the final frame in the timeline to see just the finished state.
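How the app's de-duplication works isn't described here, so this is only a guess at the idea: collapse runs of identical, consecutive cuts into a single playback step with a count. A minimal Python sketch using `itertools.groupby`; the operation tuple shape and the example lengths are illustrative.

```python
from itertools import groupby
from typing import List, Tuple

Op = Tuple[str, str, float]  # (tool, part, length_in), a hypothetical shape

def collapse_repeats(ops: List[Op]) -> List[Tuple[Op, int]]:
    """Merge runs of identical, consecutive operations into one (operation, count) step."""
    return [(op, sum(1 for _ in run)) for op, run in groupby(ops)]

ops = ([("miter_saw", "stud", 92.625)] * 12
       + [("miter_saw", "rafter", 60.0)] * 2
       + [("miter_saw", "stud", 92.625)])

for (tool, part, length), count in collapse_repeats(ops):
    print(f"{tool}: cut {count} x {part} @ {length} in")
```

Only adjacent repeats are merged, so the playback order is preserved: the lone stud cut at the end stays its own step rather than being folded back into the first run of twelve.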
[0] https://sawdust.diy/share/b0de719c-0e9f-4f4f-9282-085c521163... (bucky ball dome)
[1] https://sawdust.diy/share/45557307-0f78-4b5b-bf0e-eae77a9853... (sawhorse)
[2] https://sawdust.diy/share/cbb591fc-5511-40d9-8ccd-0d9f8c10be... (shed)
[3] https://sawdust.diy/share/ba109216-8849-49e3-ae9c-5a15982d24... (polygonal frames)
Really nice application. It works well on mobile, except that the instruction text for each step covers most of the render viewport.
The interesting design question here is whether unifying GUI and CLI operation in a single agent loop actually improves performance or just makes the benchmark story cleaner.
Are they really not doing Hugging Face releases anymore? I remember not being able to find one for their latest HN front-page release either :(
Yeah, it's time for a nice 8-14B-class model again. I'd love to have one with the latest improvements, especially the tool-calling ones.
They release non-plus models to HF.
They don't release Max models either.
Unfortunately, no pricing or technical information has been released yet.
It's just so good seeing so many great models showing up, especially today, as Copilot goes pay-per-use.