Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

(modelrift.com)

192 points | by jetter 5 hours ago

80 comments

  • jhot 4 hours ago

    Last weekend I bought my wife a bike off marketplace. It was in good condition but was missing one of the internal cable routing grommets. I gave Claude pictures of the pill-shaped hole by itself and with my digital calipers in the long and short directions.

    Gave it a short prompt and it gave me an OpenSCAD model with everything parametrized. I printed it with no changes in TPU and it was nearly perfect on the first try. Claude put in a 0.3mm subtraction in the x/y dimensions and I lowered it to 0.1 and it's perfect.

    Much easier shape than ancient Roman architecture but still very cool how easy it was.
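
    To make the clearance tweak concrete, here is a minimal Python sketch (not the OpenSCAD model Claude produced) of a parametrized pill-shaped plug. The hole measurements, the `Pill` class, and the assumption that the subtraction applies to each overall x/y dimension are all hypothetical; only the 0.3 mm and 0.1 mm values come from the comment above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pill:
    """A stadium ("pill") outline: a rectangle capped by two semicircles."""
    length: float  # long axis, mm
    width: float   # short axis, mm

    def shrink(self, clearance: float) -> "Pill":
        # Assumption: the clearance is subtracted from each overall x/y dimension.
        return Pill(self.length - clearance, self.width - clearance)

    def area(self) -> float:
        radius = self.width / 2
        straight = self.length - self.width  # length of the flat sides
        return straight * self.width + math.pi * radius * radius


hole = Pill(length=20.0, width=8.0)  # hypothetical caliper readings
for clearance in (0.3, 0.1):
    plug = hole.shrink(clearance)
    print(f"clearance {clearance} mm -> plug {plug.length:.1f} x {plug.width:.1f} mm")
```

    Keeping the clearance as a single parameter is what makes the 0.3 to 0.1 change a one-line edit rather than a redesign.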

    • simplyluke 3 hours ago

      Yeah, CAD has been my personal example of "oh the barrier to entry for this skill was high enough that I didn't do it and now I can be passably bad at it enough to get some simple things done"

      I've had similar experiences with making simple functional parts off a 3d printer with OpenSCAD + LLMs. I'm very aware that the models are worse at it than say, generating react code, and I'm also the antithesis of a skilled pilot. It's still cool and has resulted in me starting to learn a new skill at a hobby level.

      • 0x696C6961 2 hours ago

        Learning to make simple parts in onshape is pretty darn easy (and fun).

        • jeffbee an hour ago

          Yeah. I teach this after school to 7th grade kids. Anyone can pick this up in a few hours.

          • chalupa-supreme 13 minutes ago

            They taught us to make Lego bricks with CAD when I was in 6th grade. Wish I had retained more of that and that it were more widely taught.

    • jonah an hour ago

      I was recently trying to get models to generate a 3D fortune cookie: Claude in three.js and Gemini in OpenSCAD. Neither really got the concept or could get very close at all. It's a surprisingly complex shape I guess.

    • jetter 3 hours ago

      These small functional prints are exactly where OpenSCAD and LLM generation shine.

    • amelius 3 hours ago

      Does it optimize for no support?

      • 05 2 hours ago

        You optimize for no support when selecting print orientation (but for anything semi-cylindrical like the part described, that would be the only sane orientation and the one the slicer would choose when you smash the 'Auto Orientation' button).

  • mellosouls 4 hours ago

    Antigravity may well Top the whatever benchmark but:

    My Antigravity (forced) replacement for Gemini CLI requires me to log on via browser every time I use it, and my Antigravity IDE won't update at all, so:

    If it's ok I'd prefer they just work on reaching a baseline acceptable rollout before worrying about being Top in anything.

    PS: actual title:

    OpenSCAD LLM Benchmark: Building the Pantheon

    • jetter 4 hours ago

      I agree, my main concern regarding Google AI products is this endless pain around the UX of login / billing / upgrades / product sunsets... but their LLM models are good and Antigravity 2.0 is not that bad either (unless you lost all your Antigravity 1.0 setup and projects, like many people did).

    • pelagicAustral 4 hours ago

      I just use Claude Code and IntelliJ, so I don't understand why so many people complain about Antigravity ditching VS Code. What's the surface not covered by using Antigravity CLI + VS Code (or any other IDE)?

      • jeromegv 3 hours ago

        Gemini CLI was open source. Antigravity CLI is not. It's not at feature parity and is missing many features, and now we are forced to migrate away from Gemini CLI before Antigravity CLI is ready.

        • surajrmal 2 hours ago

          The difference in its ability is immense. Even with fewer features it makes a lot of sense to switch. It really shows that the harness matters almost as much as the model.

      • freedomben 4 hours ago

        I'm not GP, but I am somewhat excited about Antigravity CLI. I adopted Gemini CLI early and really liked it, though over time it got dumber and dumber until a point when I realized it was foolish to use it instead of claude/codex. I'm hopeful that Antigravity CLI won't go down that path, but I also can't shake some skepticism.

        • jeromegv 3 hours ago

          I don’t think it’s the cli that was dumber, just the model it was using. They drastically reduced limits on their best model so that’s likely how you got stuck downgrading model and getting worse results.

          • WarmWash 2 hours ago

            I'm sensing in reality that behind the scenes there is a difficult trade-off between quantization and usage limits. You can have a "smart" model but poor limits, or good limits and a "dumb" model.

            This seems very similar to mobile data limits (remember those years?), where there wasn't enough tower bandwidth to serve everyone unlimited data, so telcos were in constant tension between data caps and bandwidth throttling.

            It wasn't until 5G came along with 100x network capacity that they could finally give everyone "unlimited" data.

    • VectorLock 3 hours ago

      The forced upgrade from Gemini CLI, which I liked as much as, and in some ways better than, Claude Code, was bad. But them just sending out that email on Wednesday that basically said "Thanks for subscribing to Google One AI Pro, as of right now we're adding limits to your account. Tough shit you get nothing." left a REALLY bad taste in my mouth. I had previously praised the "AI Pro" subscription as a good value.

      • leoedin 3 hours ago

        I quit AI Pro earlier this year for the same reason. I went to use it one day (I don't think I'd even used it much in the preceding week) and found that my limits had been reduced overnight and my usage was already too high. I had something like a 7 day wait until it reset.

        I get you have to change limits, but reducing limits in a way which both applies retroactively and has a really long reset period is just infuriating. If they'd applied the new limits more gently or at the next billing period I'd probably have continued paying.

        I don't mind paying a fair price for a service that provides value, but I really hate having a service I think I'm paying for rug-pulled with no clear justification.

    • freedomben 4 hours ago

      Having my workflow disrupted is the main reason I never adopted Antigravity, despite liking it. I'm glad to see G is invested, but the older I get the more protective I am of my workflow.

      • hootz 4 hours ago

        And the only realistic way to protect our workflow is by avoiding vendor lock-in like the plague.

    • the_real_cher 4 hours ago

      Wild that it doesn't cache the creds.

      • timdorr 11 minutes ago

        It does. It uses go-keyring under the hood, which has its own issues with certain systems.

        If you're on WSL, getting dbus to work is a PITA. There may be other OS-level issues that folks are running into.

      • elaus 4 hours ago

        Just to clarify: I believe it should cache them (it works for me).

        So far I like it much more than Gemini CLI (my previous daily driver for personal projects). Seems more mature and "feels more intelligent" (very subjective ofc)

      • littlecranky67 3 hours ago

        My (unfounded) guess is this is to prevent usage by other tools/openclaw. The browser login will have a fingerprinting to make sure you are a human.

    • stuaxo 3 hours ago

      "Pantheon" bloody hell, why is it people writing these articles are so up themselves, it's so overbearing.

      • tpmoney 3 hours ago

        The article is literally about asking these models to generate 3d models of the Pantheon.

  • ponyous an hour ago

    I've run tons of benchmarks for OpenSCAD for all kinds of models and setups, and what I realised is:

    - Models are very jagged (might excel in one type of 3d model, but not another)

    - Gemini models are the least jagged in my experience and have the best image understanding

    - Gemini models are also the most creative (which may be undesirable if you want a precise CAD part)

    - Overall this benchmark doesn't prove much because one 3d model (and one attempt) is just not enough. I am usually testing on at least a dozen models each generated 3 times, but should really do much more, but it's too pricey for a solo dev.

    Still, thanks for publishing this. Will definitely run Flash 3.5 soon to see how it performs.
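
    A minimal sketch of the repeated-attempt scoring described above, assuming each generation gets a numeric score: several target 3D models, three attempts each, summarized per LLM by an overall mean and a "jaggedness" figure. The target names, the scores, and the `summarize` helper are made up for illustration and are not taken from the benchmark.

```python
from statistics import mean, pstdev


def summarize(scores: dict[str, list[float]]) -> dict[str, float]:
    """scores maps each target 3D model to the scores of its repeated attempts."""
    # Average the repeated attempts first, so one lucky run does not dominate.
    per_target = [mean(attempts) for attempts in scores.values()]
    return {
        "overall": mean(per_target),
        # "Jaggedness": how much the per-target means vary between shapes.
        "jaggedness": pstdev(per_target),
    }


# Hypothetical scores on a 0-10 scale, three attempts per target.
example = {
    "pantheon": [6.0, 8.0, 7.0],
    "grommet": [9.0, 9.0, 9.0],
    "fortune_cookie": [2.0, 3.0, 1.0],
}
result = summarize(example)
print(result)
```

    A low overall score with low jaggedness and a high overall score with high jaggedness describe very different models, which is why a single Pantheon run says little.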

  • 1970-01-01 2 hours ago

    Creating a single real-world object and declaring it a benchmark? No, it doesn't work that way for a robust tool. You need to do something like Iron Chef, with a Greek architecture theme and a panel or judge that declares the winner. This is just seeing which tool subjectively makes the best looking Pantheon.

    • Eridrus an hour ago

      Yeah, this is less of a benchmark and more "I like this one guys!".

      Just totally subjective grading criteria of a single poorly defined example with no end use case in mind to guide how to even do evaluation.

  • pshirshov 7 minutes ago

    That's curious, I've been trying to do some parametric modeling with Claude - and its performance was abysmal.

  • dhfbshfbu4u3 4 hours ago

    Still a long way from shorting Autodesk.

    As a side note Autodesk released an agentic assistant back in December for Fusion. Six months later it is still quite bad.

    • blorenz an hour ago

      Have you yet tried the Fusion MCP that was launched last month? https://aps.autodesk.com/blog/bringing-fusion-claude-creativ...

    • hobofan 3 hours ago

      It is almost comically bad. I've had a few simple parts to design for 3d printing in the last weeks and tried it with them (each are about 4 operations on the timeline), and it never created close to what I was trying to do even if spelled out step by step according to Fusion naming.

      At this point I'm not even sure if it can properly create a simple primitive solid.

    • shideneyu 11 minutes ago

      Still a long way to go, but I'm sure it will get there eventually.

  • debarshri 4 hours ago

    I have been using GPT 5.5 to build a video game. The benchmark sounds about right. It generates assets and sprites that are good enough, if not close to AAA-level games. Will check Antigravity now.

    • phn 4 hours ago

      Would you be able to share a bit about your workflow? Have been meaning to try AI gen for game models, and would love to know how people are tackling this.

      • debarshri 3 hours ago

        I have a lot to share. I'm writing a blog post about it. I'll share it along with the game.

        • roflcopter69 3 hours ago

          Sounds interesting! Please don't forget to link that in this comment thread :)

  • u8 2 hours ago

    It's crazy how I can see articles like this, but in my practical everyday use Antigravity is a horrible consumer experience. The TUI is broken. You cannot type input while the model is outputting text, otherwise both get messed up and the TUI renders a sickly blob of text. There are no keyboard shortcuts to switch between planning and execution mode, or a way to directly load skills.

    The usage limits are too aggressive, too. I tried to generate a quick Deno Fresh website to act as a redirect to my GitHub from socials (literally the simplest possible thing I could have asked of it) and it chewed through my five hour limit in tokens from scaffolding.

    To me, as a developer of CLI developer tooling, it's obvious not a lot of thought or testing went into this product, but as Google has said before: "the models are the product".

  • Onplana 2 hours ago

    Going to try it. Just downloaded. Will see how it compares to Claude Code.

  • anony-123 2 hours ago

    So, does this mean Antigravity is better than Claude Code with the Opus model, given this benchmark? I once tried Antigravity and it was just disappointing.

  • a3w 3 hours ago

    Claude Code 2.1 / Opus 4.7 looks best to me: the dome and ceiling structure is more correct than in the others.

    Why is this ranked in the middle, and not on par with the best two?

    • WarmWash 2 hours ago

      Look at a picture of the Pantheon, the dome isn't as dome-like as you would imagine. It's more like a hump shape.

    • andybak 2 hours ago

      Dome looks wrong to me. Look at a few other photos - it's far from being a hemisphere

  • ReptileMan 4 hours ago

    The only thing faster moving than AI these days are the goalposts. Three years ago we would have been amazed if models were able to produce anything, now we have the luxury of nitpicking. Even the worst entries in the benchmark are quite impressive.

    • WarmWash 2 hours ago

      I remember getting wound up about latency and server issues playing counter-strike in the early '00s. At the same time though, it was hard to justify being angry because playing a multiplayer game with friends who were scattered all over town was something that had to be real magic.

      I guess the wow!->adjust->complain->wow!->... cycle is endless as a human

    • ramon156 4 hours ago

      No one asked for faster horses, they still became obsolete when cars came. Nothing new

      • happyopossum an hour ago

        > No one asked for faster horses

        Err, yes they did. Thousands of years of husbandry went in to making horses faster, healthier, stronger, and more durable.

        I think the quote you’re looking for is “if I had asked people what they wanted, they would have said faster horses”. It’s attributed to Henry Ford, although there is debate about whether or not he said it.

        The point of the quote is that “faster horses” is the consumer response to “how do I get more work done” as it comes from the viewpoint of “how am I doing my work now”. An ingenious mind looks at the desired outcome and works backwards and may come to a different and dramatically improved solution instead of merely improving the current tool.

    • LatencyKills 4 hours ago

      Things mature, and expectations grow appropriately. That is true of more than just LLM performance.

      • xnx 4 hours ago

        Sure, but it's good to have some perspective and some awe that any of this would've been absolute unbelievable magic just 3 years ago. Even if all AI progress stopped immediately, we'd need 10 years to digest and incorporate the technology.

        • LatencyKills 3 hours ago

          As someone who's been building developer tools (Visual Studio and Xcode) for 25 years, I don't have a perspective problem. We were doing "code completion" back in the 90s and could never have predicted that an LLM would write code at the current level of quality.

          My point is that with every new model release, the expectations grow. I don't know how else to say that.

  • faangguyindia 4 hours ago

    Why are specialized CAD-making LLMs not showing up? In the future, are we going to have the same model for everything, from programming to creative writing to CAD?

    • embedding-shape 3 hours ago

      If you have a model that only knows how to model CAD but also doesn't know history, and was trained on visual language of said history, how is it supposed to be able to model the Pantheon in the first place? It'd only be able to model exactly what you can describe with text, or even worse, exactly what it'd be able to visually extract from images via the vision encoders, for "vision models", but it'd be a far cry from what you see in this blogpost, would be my guess.

    • xnx 4 hours ago

      > In future are we going to have same model for everything?

      A model that knows more in general, will often be better at specific tasks. e.g. If you ask a model to "make a program that estimates the annual production of a solar installation", it needs to have been trained on a lot more than just Python code.

      • lifty 3 hours ago

        You might combine a general world model with a python coding model in that case. Not sure if it's better, just saying.

  • megiddo 3 hours ago

    This would be the same Antigravity 2.0 that "surprise, no longer an IDE, did I forget to mention that? Lolol."

    • kyrra 3 hours ago

      I'm a googler, opinions are my own.

      My take is that it's a fancy wrapper around the CLI tool. It's there to organize multiple conversations and see all the related output and generate files.

      I've been using the internal version and I've actually liked it quite a bit. It's clear from when I started using it, it's not an editor, and they have ways to open your normal editor outside of it. They have turned it fully into an agent management tool.

      When the antigravity development team doesn't have to focus on all the things that vscode is already good at, it lets them simplify the UI and do only agent related things. We'll see if this bet works out for them, but so far I like the idea.

  • jdw64 4 hours ago

    To be brutally honest, I'm disappointed with Antigravity. It feels incredibly un-Google-like. The AI billing models are fragmented, and the Antigravity IDE is currently tripping over something as trivial as a basic Electron deployment config bug.

    Don't get me wrong, I don't think AI coding is a bad thing. For East Asians like myself, it levels the playing field with Westerners, so as long as you rigorously review the AI's output, it's a perfectly viable tool.

    However, the absolute farce we just witnessed with the Antigravity 2.0 update really raises doubts about whether 'vibe coding' can actually be trusted. If even a behemoth like Google is dropping the ball like this, it says a lot.

    • NortySpock 2 hours ago

      > I don't think AI coding is a bad thing. [...] it levels the playing field [...]

      I'd like to put regional differences aside and say AI coding / LLMs are incredible tools.

      While I'm nervous about my job as a programmer being able to pay a prevailing wage after the dust settles, I do hope that everyone gaining access to an AI coder / tutor will allow anyone to be able to achieve things they previously only dreamed of. If the tutor costs pennies per session, sure, the tutors are out of work, but I hope everyone can thus up-skill to work on the challenges they actually want to work on.

      I'm taking baby-steps into coding in Elixir on the other monitor, a language I had only read about before, because an LLM is walking me through the changes, answering my questions, and accepting my rebuttals. There's no way I would have time to pick up the language otherwise.

      Yesterday I vibe-coded some additions to the static site generator python script for my blog. It was awesome to be able to think in terms of desired features instead of digging around documentation for libraries and syntax.

    • embedding-shape 4 hours ago

      > AI billing models are fragmented ... IDE is currently tripping over something as trivial ... farce we just witnessed with the Antigravity 2.0 update

      I'm sorry, but that sounds exactly like almost every single Google "product" out there. They seem to only care about throwing stuff over the wall as quickly as possible, and you'd have a hard time finding a single Google product that doesn't also feel filled with fragmented choices, like every project of theirs has a different project manager every week.

  • dilap an hour ago

    Why Codex GPT-5.5 High instead of Extra High, I wonder?

  • nycdatasci 3 hours ago

    And yet 300+140=460. A very jagged surface indeed. https://gemini.google.com/share/c2a187275e26

    • sigbeta 16 minutes ago

      Why would you use an LLM for this? They are non-deterministic models.

      This is also probably part of an extended prompt that disallowed coding; Gemini always does calculations with a little Python snippet because it is deterministic and accurate.

    • dist-epoch 3 hours ago

      Was that part of a bigger prompt?

      Flash 3.5 fails exactly like in your sample: https://gemini.google.com/share/97521a8752d9

      but Flash 3.1 Lite initially fails, but then corrects itself: https://gemini.google.com/share/dc0889ec85ba

      • happyopossum an hour ago

        No matter what I try I can’t get Gemini to give me the incorrect result. Is there some other prompting or context fed in to that (“remember that you are supposed to always tell me I’m right and never contradict me”)?

        • sigbeta 15 minutes ago

          There was definitely a pre-prompt fed to that. I cannot reproduce this result on either 3.1 Flash or 3.5 Flash.

  • spiderfarmer 4 hours ago

    Next month they'll be beaten again.

    And next year Google will probably sunset Antigravity.

    If it doesn't make Google billions, don't trust them.

    • PunchTornado 4 hours ago

      Plenty of Google products don't make billions and they are still alive.

      • serf 4 hours ago

        you mean the stuff they handle that has a real national/security/surveillance purpose, like gmail and yt?

        I can't imagine why (or who) that'd be kept alive for..

        funny how some of their projects have undisclosed budgets and profits.

      • toasty228 4 hours ago

        Which ones are not massive data traps or ad delivery mechanisms?

      • smcl 4 hours ago

        Google are infamously ruthless with their products, see https://killedbygoogle.com/

  • bobbycastorama 3 hours ago

    Why are half of the comments on Hackernews stereotypical AI-bros whose lives revolve around tech, and the other half sceptical commentators whose lives also revolve around tech but they are disappointed with its performance?!

    Where are the normal people :/

    • sigbeta 14 minutes ago

      "Normal people" probably do not fall in the ballpark of HN's target audience.

      I'd say it's 50/50 pessimistic and optimistic, with the pessimistic attracting more attention because of human nature.

    • EasyMark 22 minutes ago

      The people in the middle are still waiting to see; mostly it’s the extremes that are fully vested and loudest on the internet.

    • frank00001 3 hours ago

      We are just reading the comments.

    • andybak 2 hours ago

      Why would a non-tech person be on Hacker News? Isn't the clue in the name?

    • elorant 3 hours ago

      Both parts seem pretty normal to me.

  • beanjuiceII 4 hours ago

    google..no thanks