24 comments

  • bahmboo 5 hours ago

    I see a lot of snark in the comments. Simon is a researcher and I really like seeing his experiments! Sounds like the goal here was to delegate a discrete task to an LLM and have it solve the problem much like one would task a junior dev to do the same.

    And like a junior dev it ran into some problems and needed some nudges. Also like a junior dev it consumed energy resources while doing it.

    In the end I like that the chunk size of work that we can delegate to LLMs is getting larger.

    • Upvoter33 5 hours ago

      No offense, but I hate all the comparisons to a "junior dev" that I see out there. This process is just like any dev! I mean, who wouldn't have to tinker around a bit to get some piece of software to work? Is there a human out there who would just magically type all the right things - no errors - first try?

      • solumos 5 hours ago

        > And like a junior dev it ran into some problems and needed some nudges.

        There are people who don't get blocked waiting for external input in order to get tasks like this done, which I think is the intended comparison. There's a level of intuition that junior devs and LLMs don't have that senior devs do.

        • the-grump 3 hours ago

          To offer a counterpoint, I had much better intuition as a junior than I do now, and it was also better than the seniors on my team.

          Sometimes looking at the same type of code and the same infra day in and day out makes you rusty. In my olden days, I did something different every week, and I had more free time to experiment.

          • fastball 2 hours ago

            So you are a worse dev now than you were before? Have you asked for a pay cut from your employer?

            • arthurcolle an hour ago

              pay increase - with better tools, I'd imagine

      • conradev an hour ago

        Codex is actually pretty good at getting things working and unblocking itself.

        It’s just that when I review the code, I would do things differently because the agent doesn’t have experience with our codebase. Although it is getting better at in-context learning from the existing code, it is still seeing all of it for the “first time”.

        It’s not a junior dev, it’s just a dev perpetually in their first week at a new job. A pretty skilled one, at that!

        And a lot of things translate. How well do you onboard new engineers? Well-written code is easier to read and modify, tests help maintain correctness while showing examples, etc.

      • bahmboo 4 hours ago

        Point taken and I should have known better. I fully agree with you. I suppose I should say inexperienced dev or something more accurate. Having worked with many inexperienced devs there was quite a spread in capabilities. Using terms that are dismissive to individuals is not helpful.

  • qingcharles 4 hours ago

    I did the opposite yesterday. I used GPT5 to brute-force dotnet into Claude Code for Web, which eventually involved it writing an entire HTTP proxy in Python to download nuget packages.
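
    A minimal sketch of the kind of forwarding proxy it ended up writing (not the actual code it generated; the upstream feed URL and port here are placeholders):

        # Relay GET requests to the nuget feed (placeholder upstream/port).
        import urllib.request
        from http.server import BaseHTTPRequestHandler, HTTPServer

        UPSTREAM = "https://api.nuget.org"  # assumed upstream feed

        class NugetProxy(BaseHTTPRequestHandler):
            def do_GET(self):
                # Forward the incoming path upstream and echo the body back.
                with urllib.request.urlopen(UPSTREAM + self.path) as resp:
                    body = resp.read()
                    status = resp.status
                self.send_response(status)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

        HTTPServer(("127.0.0.1", 8080), NugetProxy).serve_forever()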

  • BoredPositron 8 hours ago

    Compute well spent... figuring out how to download a version- and hardware-appropriate wheel.

    • prodigycorp 43 minutes ago

      Don't ask how many human compute hours are spent figuring this out.

    • Zopieux 6 hours ago

      Gotta keep the hype up!

  • cat_plus_plus 5 hours ago

    No idea why Nvidia has such crusty torch prebuilds on their own hardware. I just finished installing unsloth on a Thor box for some finetuning; it's a lengthy build marathon, thankfully aided by Grok supplying commands/environment variables for the most part (one finishing touch is to install the latest CUDA from the nvidia website and then replace the compiler executables in the triton package with newer ones from CUDA).
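
    The compiler swap is roughly this in Python (triton's bundled-binary layout varies by version, so the paths here are assumptions):

        # Replace the CUDA binaries bundled inside the installed triton
        # package with the newer ones from a system CUDA install.
        import shutil
        import sysconfig
        from pathlib import Path

        site = Path(sysconfig.get_paths()["purelib"])
        triton_bin = site / "triton" / "backends" / "nvidia" / "bin"  # assumed layout
        cuda_bin = Path("/usr/local/cuda/bin")  # CUDA from the nvidia website

        for tool in ("ptxas", "cuobjdump", "nvdisasm"):
            src, dst = cuda_bin / tool, triton_bin / tool
            if src.exists() and dst.exists():
                shutil.copy2(src, dst)
                print("replaced", dst)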

    • htrp 5 hours ago

      serious q, why grok vs another frontier model?

      • cat_plus_plus an hour ago

        Grok browses a large number of websites for queries that need recent information, which is super handy for new hardware like Thor.

  • varispeed 5 hours ago

    Am I the only one seeing this Nvidia Spark as meh?

    I had it in my cart, but then watched a few videos from influencers, and it looks like the power of this thing doesn't match the hype.

    • dumbmrblah 5 hours ago

      For inference might as well get a strix halo for half the price.

    • throwaway48476 4 hours ago

      It's also going to be unsupported after a few years.

  • syntaxing a day ago

    Ehh, is it cool and a time saver that it figured it out? Yes. But the solution was to get a “better” prebuilt wheel of PyTorch. This is a relatively “easy” problem to solve (though figuring out that this was the problem does take time). But it’s (probably, I can’t afford one) going to be painful when you want to upgrade the CUDA version or pin a specific one. Unlike a typical PC, you’re going to need to build a new image and flash it. I would be more impressed when an LLM can do this end to end for you.
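
    Even when an agent claims success, a sanity check like this after swapping wheels is worth running yourself:

        # Confirm the installed torch build actually matches the hardware.
        import torch

        print(torch.__version__)          # wheel build, e.g. with a cuXXX suffix
        print(torch.version.cuda)         # CUDA version torch was compiled against
        print(torch.cuda.is_available())  # False means the mismatch persists
        if torch.cuda.is_available():
            print(torch.cuda.get_device_name(0))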

    • sh3rl0ck a day ago

      PyTorch + CUDA is a headache I've seen a lot of people have at my uni, and one I've never had to deal with thanks to uv. Good tooling really does go a long way in these things.

      Although I must say that for certain Docker passthrough cases, the debugging logs just aren't as detailed.

      • ComputerGuru 7 hours ago

        uv doesn’t fundamentally solve the issues. It didn’t invent venv or pip.

        What fundamentally solves the issue is to use an ONNX version of the model.

        • simonw 7 hours ago

          Do you know if it's possible to run ONNX versions of models on a Mac?

          I should try those on the NVIDIA Spark; it would be interesting to see if they are easy to work with on ARM64.

          • ComputerGuru 2 hours ago

            Yup. The beauty of it is that the underlying AI accelerator/hardware is completely abstracted away. There’s a CoreML ONNX execution provider, though I haven’t used it.

            No more fighting with hardcoded cuda:0 everywhere.

            The only pain point is that you’ll often have to manually convert a PyTorch model from huggingface to ONNX unless it’s very popular.
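
            For the run side, a minimal sketch with onnxruntime, assuming a local model.onnx and a made-up input shape:

                # Prefer CoreML on a Mac, fall back to CPU if it's unavailable.
                import numpy as np
                import onnxruntime as ort

                wanted = ("CoreMLExecutionProvider", "CPUExecutionProvider")
                providers = [p for p in wanted if p in ort.get_available_providers()]

                session = ort.InferenceSession("model.onnx", providers=providers)
                name = session.get_inputs()[0].name  # input name depends on the model
                out = session.run(None, {name: np.zeros((1, 3, 224, 224), np.float32)})
                print(session.get_providers(), out[0].shape)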

    • cat_plus_plus 5 hours ago

      You can still upgrade CUDA within the forward compatibility range and install new packages without reflashing.
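
      A rough way to check where you stand, assuming pynvml (pip package nvidia-ml-py) is available:

          # Compare the CUDA version the driver supports against what torch
          # was built with; forward compatibility only stretches so far.
          import pynvml
          import torch

          pynvml.nvmlInit()
          v = pynvml.nvmlSystemGetCudaDriverVersion()  # e.g. 13000 for 13.0
          print("driver supports CUDA", f"{v // 1000}.{(v % 1000) // 10}")
          print("torch built against CUDA", torch.version.cuda)
          pynvml.nvmlShutdown()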