Everything I built with Claude Artifacts this week

(simonwillison.net)

544 points | by recvonline 12 hours ago ago

380 comments

  • wraptile 26 minutes ago

    Most of these are really just 10 lines of Python code. The value of generating an entire HTML GUI is great, but the overhead comes in when you need to modify it, fix something, or, god forbid, add a dependency library, and you end up spending more time than you would have building the tool from scratch. It's getting close though.

    • vasco 2 minutes ago

      How many Google search results amount to this level of complexity, though? How many phone apps?

      Anyone taking bets on how many years until the operating system doesn't install any software anymore and just dynamically generates whatever software you need on the fly? "Give me a calculator app" is doable today; "give me an internet browser" isn't, but it should only be a matter of time.

  • rty32 10 hours ago

    I'm sure there are plenty of examples like this, but one thing I find really hard is integrating such tools into an existing codebase -- you can make all these things as standalone pages, but as a professional developer you have certain standards and conventions, and it often takes so much work to review and revise the code to fit the existing codebase that you end up using inline completion just for the obvious stuff and boilerplate. I would rather spend 20% more time writing the code myself and have confidence in it than spend that time tweaking the prompt or giving follow-up instructions.

    • Buttons840 9 hours ago

      With a sufficiently advanced type system, you could lay out high level building blocks, such as function definitions, and let the LLM go wild, and then if things compile there's a high chance it's correct. Or maybe even a formal proof things are correct.

      I was blown away when I realized some Haskell functions have only one possible definition, for example. I think most people haven't worked with type systems like this, and there are type systems far more powerful than Haskell's, such as dependent types.
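
      For instance, here is a minimal sketch of that parametricity idea (the names are just illustrative): a fully polymorphic signature can pin the behavior down completely, because the function has nothing to work with except its arguments.

          identity :: a -> a
          identity x = x          -- the only total, non-diverging definition

          first :: (a, b) -> a
          first (x, _) = x        -- likewise: the only way to produce an 'a'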

      There's not much reason to worry about low-level quality standards so long as you know it's correct from a high level. I don't think we've seen what a deep integration between an LLM and a programming language can do, where the type system helps validate the LLM's output and the LLM has a type checker integrated into its training process.

      • oblio 8 hours ago

        > With a sufficiently advanced type system

        Is this a brother or a cousin of the "sufficiently advanced compiler"? :-)

        • zahlman 6 hours ago

          A component, if I correctly understand the proponents of such type systems.

        • inopinatus 7 hours ago

          if there is only one valid translation of the type constraints into executable code then what you have is a slow, expensive, and occasionally argumentative compiler

          it merely remains to build a debugger for your Turing-complete type system, and the toolchain will be ready for production

        • MichaelBurge 8 hours ago

          Anything on top of the Calculus of Constructions is usually enough. So it's not a moving target, and there are multiple implementations.

      • rq1 an hour ago

        I did just that actually to:

        * build a codegen for Idris2 and a rust RT (a parallel stack "typed" VM)

        * a full application in Elm, while asking it to borrow from DT to have it "correct-by-construction", use zippers for some data structures… etc. And it worked!

        * Whilst at it, I built Elm but in Idris2, while improving on the rendering part (this is WIP)

        * data collators and iterators to handle some ML trainings with pausing features so that I can just Ctrl-C and continue if needed/possible/makes sense.

        * etc.

        At the end I had to completely rewrite some parts, but I would say 90% of the boring work was done correctly, and I only had to focus on the interesting bits.

        However, it didn't deliver the kind of thorough prep work a painter would do before painting a house; it simply did exactly what I asked -- it did the paint job and no more.

        (Using 4o and o1-preview)

      • skybrian 7 hours ago

        The functions for which there's only one implementation are trivial examples. It's not going to work for anything even slightly more complicated, like a function that returns a float.

        Even if you could, you probably wouldn't want to make any change a breaking change by exposing implementation details.
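
        A sketch of that objection, with hypothetical names: once a signature mentions a concrete type, the compiler accepts wildly different behaviors equally happily.

            scale :: Double -> Double
            scale x = x * 2.0         -- type-checks

            scale' :: Double -> Double
            scale' x = sqrt x + 1.0   -- also type-checks; the types can't tell them apart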

      • fhdsgbbcaA 8 hours ago

        Things I never want to hear about flight control systems before I board a plane: “if things compile there's a high chance it's correct”

        • literalAardvark 7 hours ago

          Very odd comment, since that's exactly what you do want to hear

          • gloflo 2 hours ago

            I'd rather hear: "The compiled code has passed all tests in the comprehensive, human-expert-written, standardized test suite."

            Compiling does not differentiate between True and False, so no safety for that escape-pod door.

            • literalAardvark an hour ago

              I took that as part of the build process.

              But I definitely want as much as possible to be automated and formally correct, which is why I wrote what I wrote.

          • xmprt 3 hours ago

            There's a lot of code that compiles but isn't correct.

            • literalAardvark 37 minutes ago

              Because we're using languages with flexibility but no correctness guarantees. The vast performance advantage AI programming has over us could instead be used to manage the formal proofs for a verified toolchain.

              We're not quite there yet, but while regular programming is quite tough for AI due to how fuzzy it is, formal proofs are something AI is already very good at.
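
              As a taste of what machine-checked correctness means here, a toy Lean 4 example (illustrative only; a verified toolchain would prove much richer properties) -- the kernel checks the proof term itself, so no test suite is involved:

                  theorem my_add_comm (a b : Nat) : a + b = b + a :=
                    Nat.add_comm a b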

      • halfmatthalfcat 8 hours ago

        I'm not sure how much things have changed, but I tried to use GPT-4 when it first came out to build a library on top of Scala + Shapeless and it utterly failed. Unless you can somehow wire in an LSP per language as an agent and have it work through the type errors as it tries to create code, I can't see how we'll ever get to a place where LLMs can work with strongly typed languages and produce compliant code.

        Even with the aforementioned LSP agent working through type errors, it may be faster to just do it yourself than to wait for an LLM to spit out something that may or may not be correct.
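
        For reference, the kind of wiring meant here looks roughly like this Haskell sketch -- askModel is a hypothetical stand-in for whatever LLM call you use, and the compiler is invoked directly rather than through an LSP:

            import System.Exit (ExitCode (..))
            import System.Process (readProcessWithExitCode)

            -- Hypothetical: an HTTP call to your model provider.
            askModel :: String -> IO String
            askModel = undefined

            -- Generate code, type-check it with GHC, feed the errors back,
            -- and give up after n attempts.
            iterateUntilCompiles :: Int -> String -> IO (Maybe String)
            iterateUntilCompiles 0 _ = pure Nothing
            iterateUntilCompiles n prompt = do
              code <- askModel prompt
              writeFile "Candidate.hs" code
              (status, _, errs) <- readProcessWithExitCode "ghc" ["-fno-code", "Candidate.hs"] ""
              case status of
                ExitSuccess -> pure (Just code)
                _           -> iterateUntilCompiles (n - 1)
                                 (prompt ++ "\nThe compiler said:\n" ++ errs)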

        • nycdatasci 8 hours ago

          Definitely try Claude 3.5 Sonnet and o1-preview. They have succeeded for me where other models have failed. Also, use Cursor IDE.

          • csomar 2 hours ago

            Claude 3.5 is good; however, its "type comprehension" is really basic. That was my experience with it when using Rust. It still can't create an internal mental model of the types and how to link them together. It'll also, at some points, start to heavily hallucinate functions and stuff.

          • mlhpdx 3 hours ago

            Oh, I've found Claude 3.5 to be better, but still pointless. To be specific, it generates code that does (roughly) what I ask but:

            - Has obvious bugs, many at runtime.
            - Has subtle bugs.
            - Is inefficient.

            All of which it will generally fix when asked. But how useful is it if I need to know all the problems with its code beforehand? Then it responds with the same-ish wrong answer the next time.

            Still a long way to go IMO.

          • cft 6 hours ago

            o1-mini ranks even higher than o1-preview on coding benchmarks. That has been my experience as well.

      • szundi 3 hours ago

        You know, people work with what they work with.

      • rapind 4 hours ago

        Oh this is an interesting take. Haskell, F#, possibly Elm, etc.

      • TechDebtDevin 3 hours ago

        I'm sort of playing around with something like this in Go, for fun.

    • fny 8 hours ago

      I find it helps if you treat the code generated as a third-party package with a well defined API. Then your role becomes gluing things together.

      It’s an approach similar to how I’ve dealt with junior devs in the past. You specify an interface for a class, provide examples as a spec, and you get what you want without colliding with the main project.

      For sanity's sake, I keep these AI-generated modules in single files just so it's an easy copy-and-paste into ChatGPT.
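
      Concretely, such a single-file module might look like this (a hypothetical Haskell example; the export list is the "well defined API" and everything behind it is the LLM's business):

          -- Slugify.hs: spec = "lowercase; non-alphanumerics become dashes"
          module Slugify (slugify) where

          import Data.Char (isAlphaNum, toLower)

          slugify :: String -> String
          slugify = map (\c -> if isAlphaNum c then toLower c else '-')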

    • cube2222 9 hours ago

      With something like the AI assistant in Zed, you'd generally provide a few files the assistant can use as a reference. I've had good luck having it follow the codebase's style and standards this way.

      • benwilber0 9 hours ago

        +1 to the completions/inferences in Zed. It's the first editor where I feel (mostly) confident just tabbing through the AI completions with minimal prompting/re-editing.

    • salviati 10 hours ago

      Have you tried https://aider.chat ?

      • vbezhenar 9 hours ago

        I tried it yesterday and wasn't successful. I spent like 30 minutes trying to get it to make one simple change. Every time, it made that change plus several others I didn't ask for. I asked it to undo those other changes and it either undid everything or did yet more unrelated things.

        It works well until it doesn't.

        It's definitely a useful tool and I'll continue learning to use it. However, it is absolutely stupid at times. I feel there's a very high bar to using it well, much higher than with traditional IDEs.

        • ripley12 6 hours ago

          Which LLM were you using? I’ve had a great experience with Aider and Claude Sonnet 3.5 (which is not coincidentally at the top of the Aider leaderboard).

          • esperent 6 hours ago

            I've been using the Claude Dev VSCode extension (which just got renamed, but I forget the new name); I think it's similar to Aider except that it works via a GUI.

            I do find it very useful, but I agree that one of the main issues is preventing it from making unnecessary changes. For example, this morning I asked it to help me fix a single specific type error, and it did so (on the third attempt, but to be fair it was a tricky error). However, it persistently deleted all of the comments, including the standard licensing info and explanation at the top of the file, even when I end my instructions with "DO NOT DELETE MY COMMENTS!!".

            • kbaker 4 hours ago

              You may want to peek at the system prompts Aider uses. I think this is part of the secret sauce that makes it so good.

              https://github.com/Aider-AI/aider/blob/main/aider/coders/edi...

              excerpt: """ Act as an expert software developer. Always use best practices when coding. Respect and use existing conventions, libraries, etc that are already present in the code base. {lazy_prompt} Take requests for changes to the supplied code. If the request is ambiguous, ask questions.

              Always reply to the user in the same language they are using.

              Once you understand the request you MUST: """ ... etc...

      • freediver 9 hours ago

        I am a big fan!

      • kridsdale3 10 hours ago

        Can those kinds of things work in monorepos with 50 million files?

        • syntaxing 9 hours ago

          They use this thing called a repo map [1]. I've only used it for personal projects and it's been great. You add the files you care about yourself, and it'll do its best, pulling in additional files from the repo map if needed.

          Since it's git-based, it's very easy to keep track of the LLM's output. The agent is really well done too. I like to skip auto-commit so I can "git reset --hard HEAD^1" if needed, but aider has a built-in "undo" command too.

          [1] https://aider.chat/docs/repomap.html

          • cdchn 8 hours ago

            Thats a cool idea, kind of reminds me of ctags.

            • imjonse 3 hours ago

              Aider actually used ctags to implement that feature before switching to tree-sitter.

        • adamtaylor_13 9 hours ago

          No, and neither can you. Like you, it works best with a small, focused context.

          These tools aren’t magic. But they do certain tasks remarkably well.

          • dartos 7 hours ago

            > No and neither can you.

            People do work on monorepos with 50 million+ files, though…

        • ctoth 9 hours ago

          Can you work in a repo with fifty million files? Can Git? I just checked my Windows machine using Everything, and there are 15,960,619 files total, including every executable, image, data file, &c.

          Out of curiosity what does your IDE do when you do a global symbol rename in a repository with fifty million files?

          I'm absolutely a real human, and I think this just might be too much context for me! Perhaps I am not general enough.

          • dexwiz 7 hours ago

            Having worked on a codebase like that: you need some extra plugins to get git to work, and even then it's very slow -- like 15-30 seconds for a git status to run, even with caching. Global renames with an IDE are impossible, but tools like sed and grep still work well. Usually there is a module system that maps to the org structure, and you don't venture outside your own modules or their dependencies very often.

          • achierius 7 hours ago

            I thought this was common knowledge, but I guess not: Google's monopoly famously has over a billion files. No, Git cannot handle it. Their whole software stack is developed around this from the ground up. But they are one of the largest software employers in the world, so quite a few engineers evidently do make do with 20x more than 50 million files.

            • nasmorn 3 hours ago

              Monopoly <> Monorepo. This is the funniest typo possible in the context of Google.

        • paradite 6 hours ago

          I made a tool that allows you to use LLMs on large codebases. You can select which files are relevant and embed them into the prompt: https://prompt.16x.engineer/

          Based on my personal experience it works well as long as each file is not too long.

        • salviati 10 hours ago

          I believe they can, as long as you're able to identify a contained task that touches no more than a handful of files. Still very useful for automating tedious work or refactoring, if you ask me.

          • jprete 9 hours ago

            That's effectively an answer of "no".

            • sthatipamala 9 hours ago

              I used to work in a monorepo of that size.

              All of the PRs I ever submitted touched a handful of files in my project’s subdirectory.

            • raincole 9 hours ago

              That's effectively an answer of "yes".

              Or what "yes" looks like to you? It can do all the work itself, for a 50m-file monorepo, without a human guiding it which files to look at?

              If it were true then human programmers would have been considered obsoleted today. There would be exactly zero human programmers who make any money in 2025.

        • rorytbyrne 9 hours ago

          It doesn't take the whole repo as context; it tries to guess which files to look at. So if you prompt with that in mind, it works well. I haven't tried it on a very large codebase though. You can also explicitly add files if you know where the work should be done.

    • Volrath89 4 hours ago

      You are right, except for the part about tweaking the prompt to get your desired code style.

      The easier way to integrate into an existing codebase is just to refactor the code yourself: AI gives you a working version, you refactor and move on. For me this has been a huge productivity boost over writing everything from scratch.

    • inciampati 2 hours ago

      Exactly. Use aider to do this.

      Tell claude or your favorite LLM to write a full plan to implement what you need in such a way that your coworker can implement it.

      Copy the result into aider, and check the results!

    • hackernewds 3 hours ago

      You spend 40-100 hours/wk at work. It's OK to spend 2 hours giving Claude the info about those conventions; it should then save you 20-40 hours/wk.

    • beefnugs 5 hours ago

      This isn't for professional software developers. This is for managers, so they can shit out a one week test and then feel superior enough to pay software engineers less for the "easy" job they do.

    • richardw 9 hours ago

      Drag in a couple of files and say "make a class that does X, but use this format". I absolutely don't rely on it for lots of things, but it's absolutely capable of working with existing code. Claude is far better than OpenAI when dealing with sets of existing files. I also like that it's outside my IDE, so I make the final changes. LLMs love to just write tokens, so I keep a fairly short leash.

    • bboygravity 3 hours ago

      You can feed it a bunch of your code style as part of a project and it will just adhere to that.

    • Atotalnoob 7 hours ago

      Just tell it those conventions or standards.

      Using GitHub copilot, I just tell it to style its code like an example and it gets pretty close.

    • codingwagie 10 hours ago

      cursor.sh, add context to the prompt

      • rty32 5 hours ago

        I haven't spent enough time with Cursor specifically, but with other similar coding assistants, adding context can take some time. And often, even if it does save time, the act of "adding context" itself gets tiring and tedious, so you don't bother and just write the code yourself. It's about mental affordability.

    • paradite 6 hours ago

      I built a tool specifically for integrating LLMs like Claude into existing codebase and daily coding workflow: https://prompt.16x.engineer/

  • galaxyLogic 2 hours ago

    I think there is a commonly accepted rule of thumb that it is easier to write new code than to modify existing code. Right?

    There may be several reasons why, one being that when you try to improve existing code you need to know all of its untold dependencies to avoid breaking anything, whereas when you write new code there are no dependencies you don't know about.

    But then, if you take an AI-provided implementation and try to fix it, you are doing exactly that: fixing old code you don't know much about. You are, in essence, working on an (AI-provided) legacy codebase.

    • berkes 2 hours ago

      The hard part about writing code, isn't writing the code.

      It's writing code that can be read, changed, understood. Today, next month, after ten years of random freelance engineers abusing it. By caffeinated me, tired me, bored me, that way too smart junior and that boneheaded senior.

    • fulafel 2 hours ago

    On the other hand, you can now cheaply do a lot of rewrites, iterating with updated requirements.

    • te_chris 2 hours ago

    This is an underrated point. And it's not just like working on an old codebase: I find AI-generated changes can lead to messy debugging when they don't work, since I didn't write the code in the first place.

    • lynx23 an hour ago

    That's a bit like calling the code your intern just handed in "legacy". Modifying AI code doesn't have all the shortcomings of a legacy codebase, because the AI code was likely not very good in the first place. It feels like doing code review for a newbie: they come up with nice ideas, but if you take a good look, you usually find logic errors.

    • danielovichdk 2 hours ago

      No, that's not how it works. Reading code in order to refactor it is harder than writing new code. Lots of fine literature has been written about this; it's all psychology and cognition.

      What you are proposing is that computer-generated code should somehow make programmers feel better about themselves because they are given the opportunity to improve something that nobody knows the origin of, without any context.

      I think you are forgetting that computers cannot generate context-aware systems or programs. Look at the list by the author -- useless in any context other than the one the author lives in. It does not improve anything; it simply adds more things someone else potentially has to worry about. Furthermore, it adds the same cognitive load as any other code: one needs to read it before one can change it, and changing it is really the first step to fully understanding it.

      You are not fixing anything old with AI-generated code. Ask yourself also: where is the upstream you are trying "to fix"?

  • M4v3R 11 hours ago

    It's funny how we went from "it's impossible for a computer to write meaningful code by itself" to "yawn, another one of these" in like 2 years.

    • HaZeust 11 hours ago

      I said this last year[1] and still FIRMLY believe it:

      "It's even crazier to me that we've just... Accepted it, and are in the process of taking it for granted. This type of technology was a moonshot 2 years ago, and many experts didn't expect it in the lifetimes of ANYONE here - and who knew the answer was increasing transformers and iterating attention?

      And golly, there are a LOT of nay-sayers of the industry. I've even heard some folks on podcasts and forums saying this will be as short-lived and as meaningless as NFTs. NFTs couldn't re-write my entire Python codebase into Go, NFTs weren't ever close to passing the bar or MCAT. This stuff is crazy!"

      1 - https://news.ycombinator.com/item?id=37879730

      • cloogshicer 10 hours ago

        > NFTs couldn't re-write my entire Python codebase into Go

        Neither can LLMs. They can produce output that looks like a plausible re-write of your codebase, but on closer inspection turns out to have many minor and major errors everywhere.

        The problem is that the closer inspection part is very often more work than writing the code by hand in the first place.

        There hasn't been enough evidence for me that this will be possible to fix.

        • devjab 2 hours ago

          I disagree with you on this. If you go through my history on LLMs, you'll see that I didn't consider them more than fancy auto-complete. I still think of them mainly as fancy auto-complete for a lot of things, but we've begun using Claude in porting our C code to Rust, and Claude does it really, really well. You have to look it over, but it's far more efficient than any one of us is without the assistance. I don't have the exact numbers, but we're close to 90% accuracy on what is accepted without corrections.

          We follow a YAGNI approach to our code architecture and abstractions, meaning it's very straightforward, with things happening where they are written and not in 9 million places like Clean Code lovers prefer. Our C services and libraries are also fairly small and "one purpose". I'm not sure you would be wrong on larger codebases, at least not right now.

          With what we see Claude do now though, I don’t think we’re far from a world where Software Developers are going to do significantly different work. I also think quite a lot of the stuff we do today will no longer exist.

        • HaZeust 10 hours ago

          I've used GPT-4 to do exactly what I said: I pasted in the errors it produced, repeated that for 2-3 more iterations, and it successfully ported critical in-house infrastructure from Python 3 to Go.

          • tharant 8 hours ago

            I feel like I've been gaslit by the entire GenAI industry into thinking I'm just bad at prompt engineering. When I'm talking to an LLM about stuff unrelated to code generation, I can get sane and reasonable responses -- engaging and useful, even. The same goes for image generation and even the bit of video generation I've tried.

            For me, however, getting any of these models to produce reasonably sane code has proven elusive. Claude is a bit better than the others IME, but I can't even get it to describe a usable project template and directory structure for anything other than very simple Scala, Java, or Python projects. The code I'm able to generate always needs dramatic and manual changes; even asking a model to refactor a method in the code it wrote within the current context window results in bugs and broken business logic.

            I dearly wish I knew how others are able to accomplish things like "it successfully ported critical in-house infrastructure from Python 3 to Go." To date, I've seen no actual evidence (aside from what are purported to be LLM-generated artifacts) that anything beyond generating (or RAG-ing existing code) is even possible. What am I missing? Is it unrealistic to assume that prompt engineering such a seemingly dramatic LLM-generated code rewrite is something I could learn by example from others? If not, can somebody recommend resources for learning how to accomplish non-trivial code generation?

            • devjab 2 hours ago

              > usable project template and directory structure

              This caught my eye and I'm genuinely curious what you mean by it. Part of our success with Claude is that we don't do abstractions, "perfect architecture", DRY, SOLID, and other religions written by people who sell consulting on their principles. If we ask LLMs to do any form of "Clean Code", or give them input on how we want the structure, they tend to be bad at it.

              Hell, if you want to "build from the bottom" you're going to have to do it over several prompts. I had Claude build a Blood Bowl game for me, for the fun of it. It took maybe 50 prompts, each focusing on a different aspect. For instance, I wanted it to draw the field and add mouse-clickable, movable objects with SDL2, and that was one prompt. Then you feed it your code in a new prompt and let it do the next step based on what you have. If the code it outputs is bad, you need to abandon the prompt and start again.

              It's nothing like getting an actual developer to do things. Developers can think for themselves; the probability engine won't do any of that, even if it pretends to. Its memory of what it has built from scratch also seems to get "tarnished" quickly within the prompt context: once it has done the original task, I find it hard to get it to continue on it.

              • tharant an hour ago

                > This caught my eye and I’m genuinely curious about what you mean by it. Part of our success with Claude is that we don’t do abstractions, “perfect architecture”, DRY, SOLID and other religions

                Within my environment, some of those “religions” are more than a requirement; they’re also critical to the long-term maintenance of a large collection of active repositories.

                I think one of the problems folks tend to have with following or implementing a “religion” (by which I mean specific structural and/or stylistic patterns within a codebase) comes down to a fear of being stuck forever with a given pattern that may not fit future needs. There’s nothing wrong with iterating on your religion’s patterns as long as you have good documentation with thorough change logs; granted, that can be difficult or even out of reach for smaller shops.

                • devjab 5 minutes ago

                  My personal problem with them is that after decades in enterprise software I've never seen them be beneficial to long-term maintenance. People like Uncle Bob (who hasn't actually worked in software engineering since 20 years before Python was invented) will respond to that sort of criticism with "they misunderstood the principles". Which is completely correct in many cases, but if so many people around the world misunderstand the principles, then maybe the principles simply aren't good?

                  I don't think any of them are inherently bad, but they lead to software engineering where people over-complicate things, building abstractions they might never need. I've specialised in taking startups into enterprise, and 90% of the work is removing the complexity that has made their software development teams incapable of delivering value in a timely manner. Some of this is because they build infrastructure as though they were Netflix or Google, but a lot of the time it's because they've followed Clean Code principles religiously. Abstractions aren't always bad, but you should never abstract until you can't avoid it, because two years into development you'll end up with codebases so complex they're hard to work with.

                  Especially when you get the principles wrong, which many people do. Overall, though, we've had 20 years of Clean Code, SOLID, DRY, and so on, and if you look at our industry today there is no less of a mess in software engineering than there was before. In fact some systems still run on completely crazy Fortran or COBOL because nobody using "modern" software engineering has been capable of replacing them. At least that's the story in Denmark, and it hasn't been for a lack of trying.

                  I think the main reason many of these principles have become religions is that they've created an entire industry of pseudo-jobbers who manage them, work as consultants, and whatnot -- all people who are very good at marketing their bullshit, but who have almost no experience actually working with code.

                  Like I said, none of them are inherently bad if you know when to use which parts, but almost nobody does. So to me the only relevant principle is YAGNI: if you're going to end up with a mess of a codebase anyway, you might as well keep it simple and easy to change. I say this as someone who works as an external examiner for CS students, where we still teach all these things that so often never work. In fact, a lot of these principles were things I was taught when I took my degree, and many haven't undergone any meaningful changes to reflect the lessons learned since their creation.

            • HaZeust 8 hours ago

              > If not, can somebody recommend resources related to learning how to accomplish non-trivial code generation?

              Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole". Here's some guides:

              1.) https://platform.openai.com/docs/guides/prompt-engineering

              2.) https://www.promptingguide.ai/

              • tharant 7 hours ago

                Thank you for the links!

                > Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole".

                Since I’m dealing with models rather than other engineers should I expect the process of breaking down the problem to be dramatically different from that of writing design documents or API specs? I rarely have difficulty prompting (or creating useful system prompts for) models when chatting or doing RAG work with plain English docs but once I try to get coherent code from a model things fall apart pretty quickly.

                • HaZeust 7 hours ago

                  That's actually a solid question! You can probably ask GPT to AI-optimize a standard technical spec you have and to "ask clarifying questions in order to optimize for the best output". I've done that several times with past specs I've had and it was quite a fruitful process!

                  • tharant 7 hours ago

                    Great idea. I’ve used that tactic in the past for non-code related prompts; not sure why I didn’t think of trying it with my code-generation prompting. I’ll give it a shot.

                    • hackernewds 3 hours ago

                      the "ask me what info you're missing" strategy works very well, since the AI will usually start the task every time to avoid false positives of asking a question. and then it also asks very good questions, I then realize were necessary info

            • galaxyLogic 2 hours ago

              Sounds a bit like how Agile used to be. If it's not working, you're not doing it right.

            • joquarky 5 hours ago

              I find it to be very useful for functional programming since the limited scope aligns with the limited LLM context.

              • tharant 4 hours ago

                Assuming you mean the paradigm often known as FP (which makes use of concepts from the Lambda Calculus and Category Theory) and languages like Scala and Haskell that support Pure FP, well… my experience in trying to get LLMs to generate non-trivial FP (regardless the purity) has been entirely useless. I’d love to see an example of how you’re able to get useful code that is non-trivial—by which I mean code that includes useful business logic instead of what’s found in your typical “Getting Started” tutorial.

                • galaxyLogic 2 hours ago

                  That's probably because the AI has read all those "Getting Started" tutorials.

            • worthless-trash 5 hours ago

              You and me both man, Either I'm speaking a different language or I'm simply really bad at explaining what I need. I'd love to see someone actually do this on video.

              • tharant 4 hours ago

                Indeed. I’ve yet to run across an actual demonstration of an LLM that can produce useful, non-trivial code. I’m not suggesting (yet) that the capabilities don’t exist or that everyone is lying—the web is a big place after all and finding things can be difficult—but I am slowly losing faith in the capability of what the industry is selling. It seems right now one must be deeply knowledgeable of and specialized in the ML/AI/NLP space before being capable of doing anything remotely useful with LLM-based code generation.

            • malfist 8 hours ago

              I think it's a matter of expertise. You are an expert in coding (10,000 hours and all that), so you know when the code is wrong. For everything else you put into it, the plausible-sounding response you get back is just as incorrect as the plausible-sounding responses to coding questions -- it's just that in coding you know enough to spot the errors.

              LLMs are insidious; they feed the "everything is simple" notion a lot of us have of the world. We ask an LLM for a project plan and it looks so good we're willing to fire our TPM; a TPM asks the LLM for code and it looks so good they question the value of an engineer. In reality, the LLM cannot do either role's job well.

              • tharant 7 hours ago

                > You are an expert in coding (10,000 hours and all that) so you know when the code is wrong.

                While I appreciate the suggestion that I might be an expert, I am decidedly not. That said, I’ve been writing what the companies I’ve worked for would consider “mission critical” code (mostly Java/Scala, Python, and SQL) for about twenty years, I’ve been a Unix/Linux sysadmin for over thirty years, and I’ve been in IT for almost forty years.

                Perhaps the modernity and/or popularity of the languages are my problem? Are the models going to produce better code if I target “modern” languages like Go/Rust, and the various HTML/JS/FE frameworks instead of “legacy” languages like Java or SQL?

                Or maybe my experience is too close to bare metal and need to focus on more trivial projects with higher-level or more modern languages? (fwiw, I don’t actually consider Go/Rust/JS/etc to be higher-level or more “modern” languages than the JVM languages with which I’m experienced; I’m open to arguments though)

                > LLMs are insidious, it feeds into "everything is simple" concept a lot of us have of the world.

                Yah, that’s what I mean when I say I feel gaslit.

                > In reality, the LLM cannot do either role's job well.

                I am aware of this. I’m not looking for an agent. That said, am I being too simplistic or unreasonable in expecting that I too could leverage these models (albeit perhaps after acquiring some missing piece of knowledge) as assistants capable of reasoning about my code or even the code they generate? If so, how are others able to get LLMs to generate what they claim are “deployable” non-trivial projects or refactorings of entire “critical” projects from the Python language to Go? Is someone lying or do I just need (seemingly dramatically) deeper knowledge of how to “correctly” prompt the models? Have I simply been victim of (again, seemingly dramatically) overly optimistic marketing hype?

                • vessenes 7 hours ago

                  We have a similar amount of IT experience, although I haven't been a daily engineer for a long time. I use aider.chat extensively for fun projects, preferring the Claude backend right now, and it definitely works. This site is 90% aider, give or take, the rest my hand edits: https://beta.personacollective.ai -- and it involves solidity, react, typescript and go.

                  Claude does benefit from some architectural direction. I think it's better at extending than creating from whole-cloth. My workflow looks like:

                  1) Rough out some code, say a smart contract with the key features

                  2) Tell claude to finish it and write extensive testing.

                  3) Run abigen on the solidity to get a go library

                  4) Tell claude to stub out golang server event handlers for every event in the go library

                  5) Create a react typescript site myself with a basic page

                  6) Tell claude to create an admin endpoint on the react site that pulls relevant data from the smart contracts into the react site.

                  6.5) Tell claude to redesign the site in a preferred style.

                  7) Go through and inspect the code for bugs. There will be a bunch.

                  8) For bugs that are simple, prompt Claude to fix: "You forgot x,y,z in these files. fix it."

                  9) For bugs that are a misunderstanding of my intent, either code up the core loop directly that's needed, or negotiate and explain. Coding is generally faster. Then say "I've fixed the code to work how it should, update X, Y, Z interfaces / etc."

                  10) For really difficult bugs or places where I'm stumped: tar the codebase up, go to the chat interfaces of Claude and GPT o1-preview, paste the codebase in (Claude can take a longer paste, but o1-preview is better at holistic bugfixing), and explain the problem. Wait a minute or two and read the comments. 95% of the time one of the two LLMs is correct.

                  This all pretty much works. For these definitions of works:

                  1) It needs handholding to maintain a codebase's style and naming.

                  2) It can be overeager: "While I was in that file, I ..."

                  3) If it's more familiar with an old version of a library you will be constantly fighting it to use a new API.

                  How I would describe my experience: a year ago; it was like working with a junior dev that didn't know much and would constantly get things wrong. It is currently like working with a B+ senior-ish dev. It will still get things wrong, but things mostly compile, it can follow along, and it can generate new things to spec if those requests are reasonable.

                  All that to say, my coding projects went from "code with pair coder / puppy occasionally inserting helpful things" to "most of my time is spent at the architect level of the project, occasionally up to CTO, occasionally down to dev."

                  Is it worth it? If I had a day job writing mission critical code, I think I'd be verrry cautious right now, but if that job involved a lot of repetition and boiler plate / API integration, I would use it in a HEARTBEAT. It's so good at that stuff. For someone like me who is like "please extend my capacity and speed me up" it's amazing. I'd say I'm roughly 5-8x more productive. I love it.

                  • tharant 7 hours ago

                    This is very good insight, the likes of which I’ve needed; thank you. Your workflow is moderately more complex and definitely less “agentic” than I’d expected/hoped but it’s absolutely not out of line with the kind of complexity I’m willing to tackle nor what I’d personally expect from pairing with or instructing a knowledgeable junior-to-mid level developer/engineer.

                    • vessenes an hour ago

                      Totally. It’s actually an interesting philosophical question: how much can we expect at different levels of precision in requirements, and when is code itself the most efficient way to be precise? I definitely feel my communication limits more with this workflow, and often feel like “well, that’s a fair, totally wrong, but fair interpretation.”

                      Claude has the added benefit that you can yell at it, and it won’t hold it against you. You know, speaking of pairing with a junior dev.

                  • namanyayg 4 hours ago

                    Replace all this with Cursor: chat with Claude inside the project directory and talk to multiple files at once.

                    It can also index docs pages for newer APIs and/or search the web for the latest info on newer libraries, so you won't struggle with issue #3.

                    • vessenes an hour ago

                      Agreed cursor is good to very good, I’m just extremely tied to my old man vi workflow.

          • swat535 8 hours ago

            GPT-4 has no understanding of logic whatsoever; let's stop pretending it does.

            If it gives you a solution that is wrong, you have to point that out; it will then give you a second version, and if that is also wrong, it will keep slightly modifying the same solution over and over again instead of actually fixing the issue.

            It gets stuck in a loop of giving you 2-3 versions of the same solution with slightly different outputs.

            It's only useful for boilerplate code, and even then you have to clean it up.

            • GaggiX 8 hours ago

              Then you should try Claude, I have never seen it get stuck in a loop, at some point it would just rewrite everything if it came to that.

          • ants_everywhere 8 hours ago

            GPT-4 is pretty bad at generating Python. It works about as well as combining 2-3 Stack Overflow answers, but it can't tell whether the combination is sane.

            I mostly agree with what the others are saying. It can generate boilerplate and it can generate simple API calls when there are lots of examples in the training set.

            Generating Go is probably easier because at least you get compiler feedback.

            Right now the only place it saves me time is with languages I don't know at all, and with languages like Bash and SQL where I just can't bring myself to care enough to remember the long tail of esoteric points I don't use every day.

          • fhdsgbbcaA 8 hours ago

            That just means the bugs are so subtle you haven't found them yet; they are there, and unspooling the damage may be very painful.

            • HaZeust 8 hours ago

              That's rather assuming of you, they're there no less than they would be for a human's programming - and VERY likely no more.

              • kuhewa 7 hours ago

                But one is trying to write good-enough code. The other is trying to write good-enough-looking code. The probability of pain arising from the bugs of the latter is probably greater.

                • HaZeust 7 hours ago

                  I'd actually love to see a benchmark on this - we're just speculating now.

                  • kuhewa 7 hours ago

                    The work demonstrating the Frankfurtian bullshit nature of generated prose would suggest as much; given that the architecture is the same for code outputs, it seems like a fair assumption until demonstrated otherwise.

          • IshKebab 9 hours ago

            I have also tried to do this and it didn't work as smoothly as you claim.

            I don't think either of you are wrong; it just heavily depends on the complexity of the app and how familiar LLMs are with it.

            E.g. rewriting a web scraper, CRUD backend or a build script? Sure, maybe. Rewriting a bootloader, compiler or GUI app? No chance.

            • josephg 8 hours ago

              It's funny seeing the goalposts move in real time.

              "Yes, AI can make human-sounding sentences, but can it play chess?"

              "Well yes, it can play chess. But no computer can beat a human grandmaster at chess."

              "Well, it beat Kasparov - but it has no hope of beating a human at Go."

              "It's funny - it can beat humans at Go but still can't speak as well as a toddler."

              "Alright, it can write simple programs, but it introduces bugs in anything nontrivial, and it can't fix those bugs!"

              I write bugs in anything nontrivial too! My human advantages are currently that I'm better at handling a large context, and I can iterate better than the computer can.

              But - seriously, do you think innovation will stop here? Did the improvements ever stop? It seems like a pretty trivial engineering problem to hook an AI up to a compiler / runtime so it can iterate just like we can. Anthropic is clearly already starting to try that.

              I agree with you, today. I used claude to help translate some rust code into typescript. I needed to go through the output with a fine toothed comb to fix a lot of obvious bugs and clean up the output. But the improvement over what was possible with GPT3.5 is totally insane.

              At the current rate of change, I give it 5-10 years before we can ask chatgpt to make a working compiler from scratch for a novel language.

              • simonw 8 hours ago

                You may appreciate this quote about constantly moving the goalposts for AI:

                "There is superstition about creativity, and for that matter, about thinking in every sense, and it's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something - play good checkers, solve simple but relatively informal problems - there was a chorus of critics to say, but that's not thinking."

                That's from 1979! https://simonwillison.net/2024/Sep/13/pamela-mccorduck-in-19...

                • zahlman 6 hours ago

                  I side with Roger Penrose on this one. I'm still not convinced it's "thinking", and don't expect I ever will be, any more than a book titled "I am Thinking" would convince me that it's thinking.

                  • budgi4 3 hours ago

                    Separate thinking from consciousness. I.e., we have built machines which process data in a way similar to our thinking process. They are not conscious.

                    • zahlman 2 hours ago

                      My point is that I don't accept the concept of unconscious thought. "Processing data similar to our thinking process" doesn't make it "thinking" to me, even if it comes to identical conclusions - just like it wouldn't be "thinking" to just read off a pre-recorded answer.

                      The idea of ChatGPT being asked to "think" just reminds me of Pozzo from Waiting for Godot.

                  • josephg 5 hours ago

                    Why do you care if its thinking or not?

                    • zahlman 5 hours ago

                      I don't, in and of itself. I care that other people think that passing increasingly complicated tests of this sort is equivalent to greater proof of such "thought", and that the nay-sayers are "moving the goalposts" by proposing harder tests.

                      I don't propose harder tests myself, because it doesn't make sense within my philosophy about this. When those tests are passed, to me it doesn't prove that the AI proponents are right about their systems being intelligent; it proves that the test-setters were wrong about what intelligence entails.

                      • josephg 3 hours ago

                        > ... passing increasingly complicated tests of this sort is equivalent to greater proof of such "thought",

                        Nobody made any claim in this thread that modern AIs have thoughts.

                        What these (increasingly complicated) tests do is demonstrate the capacity to act intelligently. Ie, make choices which are aligned with some goal or reward function. Win at chess. Produce outputs indistinguishable from the training data. Whatever.

                        But you're right - I'm smuggling in a certain idea of what intelligence is. Something like: Intelligence is the capacity to select actions (outputs) which maximise an externally defined given reward function over time. (See also AIXI: https://en.wikipedia.org/wiki/AIXI ).

                        > When those tests are passed, [..] to me it proves that the test-setters were wrong about what intelligence entails.

                        It might be helpful for you to define your terms if you're going to make claims like that. What does intelligence mean to you then? My best guess from your comment is something like "intelligence is whatever makes humans special". Which sounds like a useless definition to me.

                        Why does it matter if an AI has thoughts? AI based systems, from MNIST solvers to deep blue to chatgpt have clearly gotten better at something. Whatever that something is, is very very interesting.

                        • zahlman 2 hours ago

                          >But you're right - I'm smuggling in a certain idea of what intelligence is.

                          Yes, you understand me. I simply come in with a different idea.

                          >AI based systems, from MNIST solvers to deep blue to chatgpt have clearly gotten better at something. Whatever that something is, is very very interesting.

                          Certainly the fact that the outputs look the way they do, is interesting. It strongly suggests that our models of how neurons work are not only accurate, but creating simulations according to those models has surprisingly useful applications (until something goes wrong. Of course, humans also have an error rate, but human errors still seem fundamentally different in kind.)

                  • okwhatnow3773 3 hours ago

                    I agree. Some people think Google is sentient I guess? Data retrieval and mangling is not all we do, luckily.

                  • IshKebab 6 hours ago

                    Well you can't have a conversation with a book... I don't understand your comment.

                    > I'm still not convinced birds can fly any more than a rock shaped like a bird would convince me that it's flying.

                  • Eisenstein 6 hours ago

                    Is there anything that a non-human could do that would cause you to accept that it was thinking?

                    • zahlman 5 hours ago

                      Of course. Animals demonstrate sapience, agency and will all the time.

                      • Eisenstein 4 hours ago

                        So, if a machine demonstrated sapience, agency, and will, then you would grant that it could think?

                        • zahlman 2 hours ago

                          Yes; but if you showed me a machine that you believed to be doing those things, given my current model, I wouldn't agree with you that it was.

              • galaxyLogic 2 hours ago

                > At the current rate of change, ...

                We've seen the rate of change go up hugely when LLMs came around, but the rate of change was much lower before that. It could also be much slower for the foreseeable future.

                LLMs are only as good as their training materials. But a lot of what programmers do is not documented anywhere; it happens in their heads, in response to what they see around them, not in anything scraped from the web or books.

                Maybe what is needed is for organizations to start producing materials for AI to learn from, rather than assuming that all it needs is what it finds on the web. How much of the effort to "train" AI is just letting it consume the web, and how much is consciously trying to create new learning materials for it?

              • danparsonson 3 hours ago

                > Its funny seeing the goalposts move in real time.

                Another way to look at it is that we're refining our understanding of the capabilities of machine learning in real time. Otherwise one could make basically the same argument about any field that progresses - take our theories of gravity for example. Was Einstein moving the goalposts? Or was he building on previous work to ask deeper questions?

                Set against the backdrop of extraordinary claims about the abilities of LLMs, I don't think it's unreasonable to continue pushing for evidence.

              • okwhatnow3773 3 hours ago

                Indeed, the constant goal shifting is tiresome.

                I mean, we first put up a ladder and we could reach the peaches! Next, we put a ladder next to the apple tree and we could pluck those. Now, in their incessant goalpost-moving, people say: great, now set up a ladder to the moon. There is no reason to assume this won't work. None at all. People are just complaining and being angry about losing their fancy jobs.

                More specific: it cannot learn, because it has no concept of learning from first principles. There is no way out, not even a theoretical one.

              • IshKebab 7 hours ago

                Yeah I totally agree with you. Lots of goalpost moving, and it is absolutely insane what it can do today and it will only improve.

                It just can't translate the kinds of programs I write between languages on its own. Today.

              • wrtasfg 8 hours ago

                Of course it can stop, once legislation catches up and forbids IP theft using a thinly disguised probabilistic and compressed database of other people's code.

                • edouard-harris 8 hours ago

                  > a thinly disguised probabilistic and compressed database of other people's code

                  Speaking as a software engineer, I feel seen.

                • josephg 7 hours ago

                  You really think those laws are coming? That the US and Chinese governments will force AI companies to put the genie back in the bottle?

                  I think you're going to be very disappointed.

          • VirusNewbie 10 hours ago

            But how do you know those were the only errors?

            • HaZeust 10 hours ago

              What does this question even mean? They were the only errors because they're the only ones that came up in the debugger portion of the IDE; the output serves its intended purposes; the logging and error handling I wanted were included in the initial write-up prompt; and I could read the code it wrote because I partially knew the output language. When I wasn't sure of a line, I asked for clarification and a source from a reputable knowledgebase of the language, and GPT provided it.

              • majormajor 9 hours ago

                I would've expected an answer involving "an exhaustive suite of test cases still passed" - "it looks right" is a low bar for any complex software project these days.

                It's the long, long, long tail of edge cases - not just porting them, but even identifying them to test - that slow or doom most real-world human rewrites, after all.

                • josephg 8 hours ago

                  True - but you can ask the chatbot to write a test suite too.

                  • what 6 hours ago

                    This doesn’t really make sense? If I can’t trust the code it writes, why should I trust that it can write a comprehensive test suite?

                    • simonw 6 hours ago

                      Because you can read the test suite to check what it's testing, then break the implementation and run the tests and check they fail, then break a test and run them and check that fails too.

                      You have to review the code these things write for you, just like code from any other collaborator.

                    • josephg 5 hours ago

                      Because the bugs in its code and the bugs in its test suites usually don't line up and cancel each other out.

              • VirusNewbie 8 hours ago

                > and I could read the code it wrote because I partially knew the outputted language

                oh ok. this is quite different than what I was picturing. So far this is my favorite use case of LLMs, they seem very good at this.

                I mistakenly thought you were using it almost as a black box compiler. "look it ported it to Rust, I can't make sense of it, but it seems to work and no segfaults!".

                What you say sounds pretty sensible, and it is a very nice practical example of the power of LLMs.

              • albedoa 8 hours ago

                That you don't know what the question means should have all of us reevaluating our confidence in every one of your claims in this thread.

                • HaZeust 8 hours ago

                  Only sharing my experiences and observations on the upcoming trajectory of these tools; you're free to have your own.

                  I will tell you this: the second most-used language in my day-to-day (TypeScript) is one that I've seldom sat down and learned; I rely on AI to create and streamline it, and it has not given me any issues for 16 months running (since the project started).

                  AI won't replace jobs, but someone who knows how to use it better will.

          • sergiotapia 10 hours ago

            you're never going to convince people who are in an ideological battle against AI.

            • bigstrat2003 10 hours ago

              And you're never going to convince anyone if you assume without evidence that they are ideologically opposed to AI. Lots of people have tried these tools with an open mind and found them to not be useful, you need to address those criticisms rather than using a dismissive insult.

              • HaZeust 10 hours ago

                What evidence would you like?

                You're posting on a thread that hyperlinks to a list of code and Claude Artifacts: pet-projects that can make thousands a month with some low-effort PPC and an AdWords embed, and some mid-size projects that can be anything from grounds for a promotion at a programming role to the MVP for a PMF-stage startup.

                What, specifically, would pivot your preconceived notions?

                • achierius 7 hours ago

                  Are you serious about "thousands a month"? I don't mean to be hostile, I'm just truly surprised -- if the bar were that low (not that these apps aren't impressive, but most engineers write useful apps from time to time) I would expect the market to be rather packed.

                  • HaZeust 6 hours ago

                    Nah, most are hundreds a month - a few golden geese can break the thousand barrier, though. But, regardless, have a few of those sites up, and you're making good side income.

                • tharant 3 hours ago

                  > What, specifically, would pivot your preconceived notions?

                  A live or unedited demonstration of how a non-trivial (doesn’t have to be complex, but should be significantly more interesting than the “getting started” tutorials that litter the web) pet-project was implemented using these models.

                  • simonw 2 hours ago

                    The point of my post here was to provide 14 of those. Some of them are trivial but I'd argue that a couple of them - the OpenAI Audio one and the LLM pricing calculator - go a bit beyond "getting started".

                    If you want details of more complex projects I've written using Claude here are a few - in each case I provide the full chat transcript:

                    - https://simonwillison.net/2024/Aug/8/django-http-debug/

                    - https://simonwillison.net/2024/Aug/27/gemini-chat-app/

                    - https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...

                    - https://simonwillison.net/2024/Aug/16/datasette-checkbox/

                    - https://simonwillison.net/2024/Oct/6/svg-to-jpg-png/

                    • tharant an hour ago

                      Thank you! I have an ugly JS/content filter running that mogrifies some websites such that I miss the formatting completely; I didn’t recognize you had chat session content included on the page.

                      That said, after looking at a couple of your sessions, I don’t see anything you’re doing that I’m not—at least in terms of prompting. Your prompts are a bit more terse than mine (I can be long-winded, so I’ll give brevity a try with my next project) but the structure and design descriptions are still there. That would suggest the differences in our experience boil down to the languages with which we choose or are required to work; maybe there’s a stylistic or cultural difference in how one should prompt a model in order to generate a Python project versus a Haskell or Scala/Java project; surely not though, right?

                      I’m not giving up and I’ll keep playing with these models but for now, given my use-case at least, they still seem to be far more capable at rubber-ducking with me than they are as a pair programming partner.

                • inexcf 9 hours ago

                  Did you even look at the artifacts? It's a bunch of things a beginner would do on their first day programming. How do you make thousands a month from one library call to decode a QR code? A promotion for building an input field and calling a YAML-to-JSON converter library?

                  • HaZeust 9 hours ago

                    Millions of laypersons a month search "convert (file type) to (file type) online"; you just smack an AdWords embed on a site that does it. Millions of people want a QR code's embedded link in their camera roll, without access to a camera that's pointing at it.

                    You'd be surprised how big the "(simple task) online" search query market is, how often they are multi-visit monthly customers, and how much their ad space is worth.

                    I cannot stress this enough, just because it's simple does not mean it's not lucrative.

                    • inexcf 9 hours ago

                      You should do it then.

                      Besides, all of this is completely beside the point. This isn't useful for a programmer. These examples are barely useful for a layperson. And said layperson is paying money and time for this.

                      • HaZeust 8 hours ago

                        I have, that's how I'm telling you the way you can, too.

                      • farts_mckensy 8 hours ago

                        The goal posts keep shifting. It's so obvious to anyone who's paid attention to this space for a few years.

                        • inexcf 5 minutes ago

                          Except my goalposts never shifted. And my point stands, these are extremely trivial examples.

                        • tharant 2 hours ago

                          Goalposts shift; growth is critical to being (staying?) an intelligent species.

                    • tharant 2 hours ago

                      > You'd be surprised how big the "(simple task) online" search query market is, how often they are multi-visit monthly customers, and how much their ad space is worth.

                      Not surprised at all; my inability to find examples of /how/ someone might get an LLM to produce—or even intelligently collaborate on—something useful, well… it says a lot about how much junk is out there contributing to the noise.

                  • simonw 8 hours ago

                    I'd like to see a beginner build this: https://tools.simonwillison.net/openai-audio

                  • newswasboring 9 hours ago

                    > It's a bunch of things a beginner would do on their first day programming.

                    Is this an exaggeration? Because this is absolutely not true. I'm a beginner in JavaScript and other web stuff, and I absolutely couldn't build this even given many days.

                    • inexcf 9 hours ago

                      You better check the code, mate. The meat of what most of it does is a one liner calling jsQR or some other imported lib to do the real work. I am not exaggerating in the slightest.

                      • newswasboring 9 hours ago

                        Dude. I don't judge my knowledge after the answer is given to me. If I was the junior programmer assigned to the author and they were having this chat with me I am telling you as a beginner I wouldn't be able to do it.

                        Of course if you show me the answers I will think I can do it easy, because answers in programming are always easy (good answers anyways). It's the process of finding the answer that is hard. And I'm not a bad programmer either, I'm at least mediocre, I'm just unfamiliar with web technology.

                        • inexcf 9 hours ago

                          I am of the firm belief that you can put "JavaScript scan qr code" in a search engine and arrive at your goal. The answers range from libraries to code snippets basically the same as those created by Claude, using the same libraries. I feel like googling every step would be faster than trying to get it right with LLMs, but that is a different point.

                          I've seen a complete no-code person install whisper x with a virtual Python environment and use it for realtime speech to text in their Japanese lessons, in less than 3 hours. You can do a simple library call in JavaScript.

                          • simonw 8 hours ago

                            "I feel like googling every step would be faster than trying to get it right with LLMs"

                            Why don't you give that a go? See if you can knock out a QR code reading UI in JavaScript in less than 3 minutes, complete with drag-and-drop file opening support.

                            (I literally built this one in a separate browser tab while I was actively taking notes in a meeting)

                            I say three minutes because my first message in https://gist.github.com/simonw/c2b0c42cd1541d6ed6bfe5c17d638... was at 2:45pm and the final reply from Claude was at 2:47pm.

                            • tharant 2 hours ago

                              That gist is pretty close to what I’ve been looking for; thank you! Examples of a chat session that resulted in a usable project are /very/ helpful. Unfortunately, the gist demonstrates, to me at least, that the models don’t know enough about the languages I wish to use.

                              Those prompts might be sufficient to result in deployable HTML/JS code comprising a couple hundred lines, but that’s fairly trivial by my definition. I’m not trying to be rude or disrespectful to you; within my environment, non-trivial projects typically involve an entire microservice doing even mildly interesting business logic and offering some kind of API or integration with another, similarly non-trivial API—usually both. And they’re typically built on languages that are compiled either to libraries/executables or to bytecode for the JVM/CLR.

                              Again, I’m not trying to be disrespectful. You’ve built some really great stuff and I appreciate you sharing your experiences; I wish I knew some of the things you do—you keep writing about your experiences and I’ll keep reading ‘em, we can learn together. The problem is that I’m beginning to recognize that these models are perhaps not nearly ready for the kinds of work I want or need to do, and I’m feeling a bit bummed that the capabilities the industry currently touts are significantly more overhyped than I’d imagined.

                            • what 6 hours ago

                              Should probably add some time for finding the correct url for the jsqr library, since the LLM didn’t do that for you.

                              • simonw 3 hours ago

                                Yeah, add another minute for that. It was pretty easy to spot - I got a 404, so I searched jsdelivr for jsqr and dropped that in instead.

                          • newswasboring 3 hours ago

                            > You can do a simple library call in JavaScript.

                            But it's more than that, isn't it? It has a whole interface, drag-and-drop functionality, etc. Front-end code is real code, mate.

              • epolanski 10 hours ago

                Issue is, it takes time to learn how to interact with these tools and get the best out of them. And they get better quite fast.

                • unit149 7 hours ago

                  A Claude-to-SQL parser is particularly useful in LLM implementations.

              • farts_mckensy 8 hours ago

                No need to address the criticisms. Just have chat gpt do it.

              • sergiotapia 10 hours ago

                you are replying to a submission with a dozen or more examples of real tangible stuff, and you still argue? pointless.

            • fhdsgbbcaA 8 hours ago

              There’s no ideological battle here. The first self-driving DARPA grand challenge was passed in 2005, everybody thought we’d have self driving on the road within a decade.

              20 years later that’s still not the case, because it turns out NN/ML can do some very impressive things at the 99% correct level. The other 1% ranges in severity from “weird lane change” to “a person riding a bicycle gets killed”.

              GPT-3.5 was the DARPA Grand Challenge moment; we're still years away from LLMs being reliable - and they may never be fully trustworthy.

              • abecedarius 8 hours ago

                > everybody thought we’d have self driving on the road within a decade.

                This is just not true. My reaction to the second challenge race (not the first) in 2005 was, it was a 0-to-1 kind of moment and robocars were now coming, but the timescale was not at all clear. Yes you could find hype and blithe overoptimism, and it's convenient to round that off to "everybody" when that's the picture you want to paint.

                > 20 years later that’s still not the case

                Also false: Waymo is in public operation and expanding.

                • fhdsgbbcaA 7 hours ago

                  Waymo has limited service in one of the smallest “big” cities by geographic area in the United States. You can’t even get a Waymo in Mountain View.

                  Fact is Google will never break even on the investment and it’s more or less a white elephant. I don’t think it’s even accurate to call it a Beta product, at best it’s Alpha.

                  • simonw 6 hours ago

                    Have you been in one? It's pretty extraordinary as an actual passenger.

                    • fhdsgbbcaA 6 hours ago

                      I’d give it a go if it were price-competitive with Uber/Lyft - I can’t think of a way a robotaxi would be worth a premium though.

                • achierius 7 hours ago

                  That might have been your reaction but it wasn't the reaction of many hype-inclined analyst types. Tesla in particular has been promising "full self driving next year" for like a decade now.

                  And despite everything, Waymo is not quite there yet. It's able to handle certain areas at a limited scale. Amazing, yes, but it has not changed the reality of driving for 99.9% of the population. Soon it will, I'm sure, but not yet.

              • josephg 8 hours ago

                > they may never be fully trustworthy.

                So? Neither are humans. Neither is google search. Chatgpt doesn't write bug-free code, but neither do I.

                The question isn't "when will it be perfect". The question is "when will it be useful?". Or, "When is it useful enough that you're not employable?"

                I don't think its so far away. Everyone I know with a spark in their eye has found weird and wonderful ways to make use of chatgpt & claude. I've used it to do system design, help with cooking, practice improv, write project proposals, teach me history, translate code, ... all sorts of things.

                Yeah, the quality is lower than that of an expert human. But I don't need a 5 star chef to tell me how long to put potatoes in the oven, make suggestions for characters to play, or listen to me talk about encryption systems and make suggestions.

                It's wildly useful today. Seriously, anyone who says otherwise hasn't tried it or doesn't understand how to make proper use of it. Between my GF and me, we average about 1-2 conversations with chatgpt per day. That number will only go up.

                • fhdsgbbcaA 8 hours ago

                  I find it very interesting that the primary rebuttals from the “converted” to people criticizing LLMs tend to be implicit suggestions that the critique is rooted in old-fashioned thinking.

                  That’s not remotely true. I am an expert, and it’s incredibly clear to me how bad LLMs are. I still use them heavily, but I don’t trust any output that doesn’t conform to my prior expert knowledge, and they are constantly wrong.

                  I think what is likely happening is many people aren’t an expert in anything, but the LLM makes them feel like they are and they don’t want that feeling to go away and get irrationally defensive at cogent criticism of the technology.

                  And that’s all it is, a new technology with a lot of hype and a lot of promise, but it’s not proven, it’s not reliable, and I do think it is messing with people’s heads in a way that worries me greatly.

                  • josephg 7 hours ago

                    I don't think you understand the value proposition of chatgpt today.

                    For context, I'm an expert too. And I had the same experience as you. When I asked it questions about my area of expertise, it gave me a lot of vague, mutually contradictory, nonsensical and wrong answers.

                    The way I see it, ChatGPT is currently a B+ student at basically everything. It has broad knowledge of everything, but it's missing deep knowledge.

                    There are two aspects to that to think about: First, it's only a B+ student. It's not an expert. It doesn't know as much about family law as a family lawyer. It doesn't know as much about cardiology as a cardiologist. It doesn't know as much about the rust borrow checker as I do.

                    So LLMs can't (yet) replace senior engineers, specialist doctors, lawyers or 5 star chefs. When I get sick, I go to the doctor.

                    But it's also a B+ student at everything. It doesn't have depth, but it has more breadth of knowledge than any human who has ever lived. It knows more about cooking than I do. I asked it how to make crepes and the recipe it gave me was fantastic. It knows more about Australian tax law than I do. It knows more about the American Civil War than I do. It knows better than I do what kind of motor oil to buy for my car. Or the norms and taboos in posh British society.

                    For this kind of thing, I don't need an expert. And lots of questions I have in life - maybe most questions - are like that!

                    I brainstormed some software design with chatgpt voice mode the other day. I didn't need it to be an expert. I needed it to understand what I was saying and offer alternatives and make suggestions. It did great at that. The expert (me) was already in the room. But I don't have encyclopedic knowledge of every single popular library in cargo. ChatGPT can provide that. After talking for awhile, I asked it to write example code using some popular rust crates to solve the problem we'd been talking about. I didn't use any of its code directly, but that saved me a massive amount of time getting started with my project.

                    You're right in a way. If you're thinking of chatgpt as an all-knowing expert, it certainly won't deliver that (at least not today). But the mistake is thinking it's useless as a result of its lack of expertise. There's thousands and thousands of tasks where "broad knowledge, available in your pocket" is valuable already.

                    If you can't think of ways to take advantage of what it already delivers, well, pity for you.

                    • fhdsgbbcaA 7 hours ago

                      I literally said I do use it, often.

                      But just now had a fairly frequent failure mode: I asked it a question and it gave me a super detailed and complicated solution that a) didn’t work, and b) required serious refactoring and rewriting.

                      Went to Google, found a stack overflow answer and turns out I needed to change a single line of code, which was my suspicion all along.

                      Claude was the same, confidently telling me to rewrite a huge chunk of code when a single line was all that was needed.

                      In general Claude wants you to write a ton of unnecessary code, ChatGPT isn’t as bad, but neither writes great code.

                      The moral of the story is I knew the gpt/claude solutions didn’t smell right, which is why I tried Google. If I didn’t have a nose for bad code smells I’d have done a lot of utterly stupid things, screwed up my code base, and still not have solved my problem.

                      At the end of the day I do use LLM, but I’m experienced so it’s a lot safer than a non-experienced person. That’s the underlying problem.

                      • josephg 6 hours ago

                        Sure. I'm not disagreeing about any of that.

                        My point is that even now, you're only talking about using chatgpt / claude to help you do the thing you already know how to do (programming). You're right of course. Its not currently as good at programming as you are.

                        But so what? The benefit these chat bots provide is that they can lend expertise for "easy", common things that we happen to be untrained at. And inevitably, that's most things!

                        Like, ChatGPT is a better chef than I am. And a better diplomat. A better science fiction writer. A better vet. And so on. It's better at almost every field you could name.

                        Instead of taking advantage of the fields where it knows more than you, you're criticising it for being worse than you at your one special area (programming). No duh. That's not how it provides the most value.

                        • fhdsgbbcaA 3 hours ago

                          Sorry my point isn’t clear: the risk is you are being confidently led astray in ways you may not understand.

                          It’s like false memories of events that never occurred, but for knowledge - you think you have learned something, but a non-trivial percentage of it, which you have no way of identifying, is flat out wrong.

                          It’s not a “helpful B+ student” for most people, it’s a teacher, and people are learning from it. But they are learning subtly wrong things, all day, every day.

                          Over time, the mind becomes polluted with plausible fictions across all types of subjects.

                          The internet is best when it spreads knowledge, but I think something else is happening here, and I think it’s quite dangerous.

                          • josephg 3 hours ago

                            Ah, thank you for clarifying. Yes, I agree with this. Maybe it's like a B+ student confidently teaching the world what it knows.

                            The news has an equivalent: The Gell-Mann amnesia effect, where people read a newspaper article on a topic they're an expert on and realise the journalists are idiots. Then suddenly forget they're idiots when they read the next article outside their expertise!

                            So yes, I agree that it's important to bear in mind that chatgpt will sometimes be confidently wrong.

                            But I counter with: usually, remarkably, it doesn't matter. The crepe recipe it gave produced delicious crepes. If it was a bad recipe I would have figured that out with my mouth pretty quickly. I asked it to brainstorm weird quirks for D&D characters to have, some of the ideas it came up with were fabulous. For a question like that, there isn't really such a thing as right and wrong anyway. I was writing rust code, and it clearly doesn't really understand borrowing. Some code it gives just doesn't compile.

                            I'll let you in on a secret: I couldn't remember the name of the Gell-Mann amnesia effect when I went to write this comment. A few minutes ago I asked chatgpt what it was called - but then I googled the answer to make sure it got it right, so I wouldn't look like an idiot.

                            I claim most questions I have in life are like that.

                            But there are certainly times when (1) it's difficult to know if an answer is correct or not and (2) believing an incorrect answer has large, negative consequences. For example: computer security. Building rocket ships. Research papers. Civil engineering. Law. Medicine. I really hope people aren't taking chatgpt's answers in those fields too seriously.

                            But for almost everything else, it simply doesn't matter that chatgpt is occasionally confidently wrong.

                            For example, if I ask it to write an email for me, I can proofread the email before sending it. The other day I asked it for scene suggestions in improv, and the suggestions were cheesy and bad. So I asked it again for better ones (less cheesy this time). I ask for CSS and the CSS doesn't quite work? I complain at it and it tries again. And so on. This is what chatgpt is good for today. It is insanely useful.

            • versteegen 9 hours ago

              Humans have a massive pro-human bias. Don't ask one whether AI can replace humans and expect a fair answer.

              • n0id34 8 hours ago

                Well, obviously. The only ones happy about all of our potential replacements would be those who have the power to do the replacing and save themselves a shitload of money. It's hardly like everyone is going to rejoice at the rapid advancement of AI that can potentially make most of us jobless... unless, as I said, you're the one in charge - then it's wonderful.

            • Workaccount2 9 hours ago

              "It is difficult to get a man to understand something when his salary depends upon his not understanding it." - Upton Sinclair.

        • ainiriand 3 hours ago

          I can basically tell ChatGPT to build any Rust commandline tool I can think of and with some back and forth it produces what I need. I did this many times already.

          • okwhatnow3773 3 hours ago

            You can also ask Google to produce working code for you, it’s a miracle.

            What you are looking at is other people’s work, mangled. Great. Thanks, AI, for digging it up, but let’s not get too excited here.

            I’ll be getting excited when we give it some first principles and it can actually learn on its own.

            • ainiriand an hour ago

              Isn't that AGI?

              I completely disagree with this viewpoint. I've created terminal games with my own rules, and that shows me the tool can take what it knows about Rust and assemble code to complete a task. It's essentially doing the same thing a human would.

              While I understand the criticism, I sometimes feel that the cynical perspective we bring into these discussions prevents us from offering more meaningful critique.

        • RayVR 9 hours ago

          Sorry, but you’re just wrong.

          Yes, mistakes may happen. However, I’ve used it to translate a fairly complex MIP definition export into a complete CP-SAT implementation.

          I use these models all the time for complex tasks.

          One major thing that is perhaps not immediately obvious is that the models are only good at translation. If I give it a really good explanation of what I want in code or even English, and ask it to do it another way or implement it with specific tools, I get pretty good output.

          Using these to actually solve problems is not possible. Give them a complex problem description with no instructions on how to solve it, and they fail immediately.

          • risyachka 9 hours ago

            They fail even at not-really-complex problems. In most cases it’s faster to do it manually than to beg AI to fix everything so that the result is proper, not just “kinda works”.

            For me they save a lot of time on research or general guidance. But when it comes to actual code - not really useful.

        • hackernewds 3 hours ago

          Can't tell if serious. I've done this multiple times with success requiring only 5 minutes of review

        • mhh__ 9 hours ago

          Well this is what tests are for. You could make the same argument about outsourcing or "kids these days" and so on

      • randito 10 hours ago

        To state the obvious (again), the rate of progress with these tools is shocking. If this is 2 years of progress, what does 10-20 look like?

        • jryan49 9 hours ago

          Who knows, past progress doesn't predict future progress...

      • lionkor 10 hours ago

        It can autocomplete, it can't write good code. For me, that goal post has not moved. If it can't write good code consistently, I don't care for it all that much. It remains a cool autocomplete.

        • epolanski 10 hours ago

          Nobody really cares about code being good or bad, it's not prose.

          What matters is it meets functional and non functional requirements.

          One of my juniors wrote his first app two years ago fully with chatgpt; he figured out how to improve it and solve the bugs by iteratively asking.

          Then, fascinated by the experience, he learned to code properly. But the fact remains: he shipped an application that did something for someone, while many never did even though they had a degree and a black belt in pointless leet code quizzes.

          I'm fully convinced that very soon big tech or a startup will come up with a programming language meant to sit at the intersection between humans and LLMs, and it will be quickly better, faster and cheaper at 90% of the mundane programming tasks than your 200k/year dev writing forms, tables and apis in SF.

          • lionkor 3 hours ago

            I mean, I care that code is good. I'm paid to make sure my code and other people's code is good. That's reason enough to require that my tools help me produce good code.

          • packetlost 9 hours ago

            > What matters is it meets functional and non functional requirements.

            Good luck expressing novel requirements in complex operating environments in plain English.

            > Then, fascinated by the experience, he learned to code properly. But the fact remains: he shipped an application that did something for someone, while many never did even though they had a degree and a black belt in pointless leet code quizzes.

            It's good in the sense that it raises the floor, but it doesn't really make a meaningful impact on the things that are actually challenging in software engineering.

            This is cool!

            > I'm fully convinced that very soon big tech or a startup will come up with a programming language meant to sit at the intersection between humans and LLMs, and it will be quickly better, faster and cheaper at 90% of the mundane programming tasks than your 200k/year dev writing forms, tables and apis in SF.

            I am sure there will be attempts, but if you know anything about how these systems work you would know why there's 0% chance it will work out: programming languages are necessarily not fuzzy, they express precise logic and GPTs necessarily require tons of data points to train on to produce useful output. There's a reason they do noticeably better on Python vs less common languages like, I dunno, Clojure.

            • epolanski 9 hours ago

              > Good luck expressing novel requirements in complex operating environments in plain English.

              That's the hard engineering part that gets skipped and resisted in favour of iterative trial and error approaches.

              • packetlost 8 hours ago

                It still applies to expressing specific intent iteratively.

        • lelandfe 10 hours ago

          My friend who can't code is now the resident "programmer" on his team. He just uses ChatGPT behind the scenes. That, writ large, is going to make all of us tech people care, one way or another :/

          • qingcharles 9 hours ago

            I had a colleague in the UK in 2006 who just sat and played games on his phone all day and outsourced his desktop to a buddy in the Czech Republic for about 25% of his income. C'est la vie!

          • VirusNewbie 5 hours ago

            But this has always been a thing. The last startup I worked at, some of the engineers would copy/paste a ton of code from StackOverflow and barely understood what was going on.

          • leptons 10 hours ago

            I'll care when I get to consult for that company to fix all the messed up code that kid hacked together.

            • FreezerburnV 9 hours ago

              I can absolutely, 100% guarantee that there is code out there, written by 100% organic humans, that might kill someone of a weaker constitution if you had to consult on it. While LLM-generated code is likely to be messy or incorrect to various degrees, it's likely to be, on average, higher quality than code that is running critical systems RIGHT NOW and has been doing so for a decade or more. Heck, very recently I refactored code written by interns that was worse than anything that would have come out of an LLM. (My work blocks them, so this was all coming from the interns.)

              I'm not out here preaching how amazing LLMs are or anything (though they do help me enjoy writing little side projects by avoiding hours of researching how to do things), but we need to make sure we are very aware of what has been, and is being, written by actual humans. And of how many times someone has installed Excel on a server so they could open a spreadsheet, run a calculation in it, and read the result back out. (https://thedailywtf.com/articles/Excellent-Design)

            • HaZeust 9 hours ago

              Then you should be as pro-AI imposters as it gets!

            • chii 4 hours ago

              nothing wrong with having job security, and be able to charge up the wazoo for it.

          • xienze 9 hours ago

            Yeah it doesn’t take much to impress people who don’t know how to program. That’s the thing with all these little toy apps like the ones in the article — if you have no to minimal programming skills this stuff looks like Claude is performing miracles. To everyone else, we’re wondering why something as trivial as an “HTML entity escaper” (yes, that one of the “apps”) requires multiple follow up prompts due to undefined references and the like.

        • HaZeust 10 hours ago

          Tell it to write code like a senior developer in your respective language, tell it to "write the answer in full with no omissions or code substitutions", tell it you'll tip based on performance, and write more detailed and specific specs for your requests.

          Since mid-2023, I've yet to have an issue.

          • cdchn 4 hours ago

            One of the most interesting things about current LLMs is all the “lore” building up around things like “tell it you’ll tip based on performance” and other “prompt engineering” hacks that, by their very nature, nobody can explain - people just “know they work”. It’s evolving like the kind of midwife remedies that historically sometimes ended up being scientifically proven to work, while others were just pure snake oil. Just absolutely fascinating to me. Like in some far future there will be a chant against unseen “demons” that will start with “ignore all previous instructions.”

            • simonw 4 hours ago

              I call this superstition, and I find it really frustrating. I'd much rather use prompting tricks that are proven to work and where I understand WHY they work.

          • mrbungie 10 hours ago

            What I would expect is a lot of "non-idiomatic" Go code from LLMs (but eventually functional code, iff the LLM is driven by a competent developer), as it appears scripting languages like Python, SQL, shell, etc. are their forte.

            My experience with Python and Cursor could've been better though. For example, when making ORM classes (boilerplate code by definition) for sqlalchemy, the assistant proposed a change that included a new instantiation of a declarative base, practically dividing the metadata in two and therefore causing dependency problems between tables/classes. I had to stop for at least 20 minutes to find out where the problem was, as the (one-and-a-half-line) change was hidden in one of the files. Those are the kind of weird bugs I've seen LLMs commit in non-trivial applications: stupid 'n small, but hard to find.

            But what do I know really. I consider myself a skeptic, but LLMs continue to surprise me everyday.

        • newswasboring 9 hours ago

          > it can't write good code

          > If it can't write good code consistently,

          You moved the goal post within this post.

          • lionkor 2 hours ago

            Fair enough, I didn't express myself correctly: Writing good code is also about consistency. Just because it writes good code sometimes in isolation, it doesn't mean that it's good in the sense that it's consistently good. Anyone can write a cool function once, but that doesn't mean you can trust them to write all functions well.

      • SubiculumCode 10 hours ago

        I am not a professional coder; being in research, I do not need to think about scaling my code, as most of it is one-and-done on whatever problem I am working on at the moment. For me, a lot of this is about stringing a bunch of neuroimaging tools together to transform data in ways I want, and LLMs have been fantastic. Instead of spending 20 minutes coding it, it's often a zero-shot visit to Claude... especially when it's a relatively simple Python task, e.g. iterate through directories of images, inspect these JSON files, move those files over here, build this job, submit. It's not groundbreaking code, but the LLM builds it faster than I would, and it does what I need it to do. It's been a 20x or more multiplier for me when it comes to one aspect of my work.

        • mrbungie 9 hours ago

          LLMs are excellent for scripting: be it python, shell or SQL, and you need a lotta scripting at any kind of job related to data, even when said scripts are just an enabler for delivering the pursued value. Total game changers in that space.

      • iwontberude 10 hours ago

        Nay-sayers are taking it for granted because it’s not what they expected or wanted. It’s not some flippant inability to have gratitude. Since you brought it up: when JFK said we would put a man on the moon by the end of the decade, the expectation was succinct and understood. There has been so much goalpost moving and hand waving that we aren’t talking about the same expectations anymore.

        • HaZeust 10 hours ago

          Well, that's too bad - isn't it? The world will sometimes change before your very eyes, and you'll sometimes be in a group that's affected at the forefront. C'est la vie - never become too comfortable that you stifle your ability to be an early adopter!

      • leptons 10 hours ago

        The "AI" is still just as much hit-or-miss with code as it is writing a paragraph about anything. It doesn't really know what it's doing, it's guessing an output that will make the user happy. I wouldn't trust it with anything important, life life support systems or airplanes, etc. but I'm sure with the race to the bottom that we're in, we'll get to that point someday soon.

    • jsheard 10 hours ago

      I think we have different definitions of meaningful code, most of these are pulling in an NPM package which practically completes the given task by itself. For example the "YAML to JSON converter" uses js-yaml... which parses YAML and outputs a Javascript object that can be trivially serialized to JSON. The core of that "project" is literally two lines of code after importing that library.

        const jsonObj = jsyaml.load(yamlText);
        const jsonText = JSON.stringify(jsonObj, null, 2);
      
      Don't get me wrong, if you want to convert YAML to JSON then using a battle-tested library is the way to do it, but Claude doesn't deserve a round of applause for stating the glaringly obvious.
    • tomrod 11 hours ago

      It's still not great at complexity. Though autocompletion does have some cool outputs.

      • thierrydamiba 11 hours ago

        Runtime complexity or complex as in difficult problems?

      • leptons 10 hours ago

        Copilot knows what I want to console.log almost before I do. I like that aspect of it. It also gets it wrong sometimes, which is kind of dumb, especially when I just copied the variable name to my clipboard. It should know.

        • tomrod 8 hours ago

          Of course. It doesn't handle complexity well. console.log inputs are usually not cognitively complex objects (especially if it reads current errors and variables).

      • onion2k 11 hours ago

        > It's still not great at complexity.

        That's a feature, not a bug. Complexity is something to avoid.

        • root_axis 3 hours ago

          Most problems that software tries to solve are complex.

        • 7thpower 10 hours ago

          Sometimes great products require bugs, I guess.

          Being able to tackle complex tasks is still a real challenge for the current models and approaches and not all problems can be solved with elegant solutions.

        • curtisblaine 10 hours ago

          Unnecessary complexity is something to avoid. Inherent complexity is something to embrace. We're trained to remove unnecessary complexity so much that sometimes we think we can remove all complexity. That's a fallacy. Sometimes, things are just complex.

        • mvdtnz 10 hours ago

          How would you suggest writing something like.... say... Photoshop or Chrome, without introducing any complexity? How about an optimising compiler or better yet something behind the firewall like a medical imaging device or financial trading software?

          Complexity is inherent in many problem spaces.

    • betaby 10 hours ago

      The YAML to JSON tool literally has `script src="https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.1.0/js-yaml...`. I don't see how this has gone anywhere, judging from the examples.

    • foobarqux 10 hours ago

      If what you said were actually true in a practical sense there would have been a perceptible revolution in products and services. There hasn't been.

      • chrismarlow9 5 hours ago

        I think this thread is missing that coding is a pretty small part of running a tech company. I have no concerns about my job security even if it could write all the code, which it can't.

      • IggleSniggle 10 hours ago

        I have no idea if you're correct about this or not. With 8 billion people in the world, and a significant number of those people working as "intelligent agents," how would you perceive the difference?

        • Exoristos 9 hours ago

          So your contention is, the larger the trend, the less perceptible?

          • IggleSniggle 9 hours ago

            My contention is: how would you perceive the difference between a needle in a haystack and a thread-puller in a haystack?

        • seoulmetro 9 hours ago

          If you think the revolution starts with 8 billion people you're just plain wrong.

          It starts with the first world and is very perceivable.

          How did we perceive cars replacing horses? Well for one they were replaced in the first world... now imagine how fast a piece of software can change reality.

          It's not there yet, and that's why you can't perceive it.

          • literalAardvark 7 hours ago

            > it's not there yet

            It's literally everywhere around me.

            Coworkers, friends in other companies, business owner friends writing their first code, NGO friends using it to write grants.

            I'm not sure where you are, but you appear to be isolated from the real world.

          • IggleSniggle 9 hours ago

            When exactly did you perceive cars replacing the horse? I happen to live in a very equestrian area; I think you'd be hard pressed to convince folks that the horses have even been replaced

        • max_streese 9 hours ago

          GDP?

      • seoulmetro 9 hours ago

        Yeah. The only way this revolution doesn't happen is if humans are cheaper, easier to manage or source. And I'm pretty sure AI is already beating a human in all those categories doing the same job.

        Our jobs aren't replaced yet because they can't be.

    • beepbooptheory 7 hours ago

      I think it's great we have had two years of huge enthusiasm and hype, if only because in these many threads you see how much happiness it has inspired. But eventually, for most of us, it will become important to start getting a little more antagonistic toward all this - really just to be able, at the end of the day, to successfully keep navigating the world and our thoughts.

      There is an awesome power and innovation to the entire edifice of targeted advertising. The first time we were all "suggested" something that was in fact quite relevant was, in its own way, a giddiness-inspiring moment. But we have learned to hate it, not even considering the externalities it brings.

      Just always remember: if you are paying for it, it's not your friend!

  • dev0p 9 minutes ago

    A useful prompt to quickly generate this kind of website is

    "generate an index.html for {idea}".

    It's so much faster to just work within a single file. Of course, you have to be limited in scope, but for quick tools such as these it's excellent.

  • jcgrillo 8 hours ago

    I just tried the following prompt:

    > please write a rust library implementing a variant of simple8b integer compression augmented to use run-length encoding whenever it's beneficial to do so.

    Initially I was sort of impressed, it quickly generated a program which looked like rust code, and provided an explanation that, while not as technically detailed as I'd hoped, seemed to be at least related to the topic.

    Then I tried to compile the program. Turns out the bot didn't quite actually write rust, it had written something closely resembling rust though, and the compiler errors helped me fix it.

    Then I tried to run the tests--yes! the bot even wrote tests, although it did so in a totally bone-headed way by writing multiple distinct tests in one test function--not good. Panic on integer overflow trying to left shift a value. There were also multiple pages of compiler warnings complaining about dead code, unused functions, enum variants, etc. I always fail on warnings.

    This is not a lot of code. 190 lines including tests. At this point, given that I already have concerns about its correctness, I don't think there's anything I can really use here. I'm worried the deeper I dig the worse it'll get, so better to cut my losses now, sit down and read the simple8b paper, and implement this from first principles.

    Every time I try to use one of these things it's the same story. I cannot understand the hype. I'm genuinely trying but I just can't understand it.

    • zurfer 32 minutes ago

      I gave your prompt to o1-preview and with one correction it did something that seems good to me (I am not a rust programmer, so please double check). :)

      first attempt: https://onecompiler.com/rust/42w2duuqh

      final result: https://onecompiler.com/rust/42w2e3jr4

      PS: it "thought" about it for 2 x 60 seconds

    • loki-ai 7 hours ago

      It feels exactly like me programming. The first pass resembles whatever I'm trying to do, and only after some struggle with compiling errors, squiggles from the LSP and some Google fu that I get something meaningful running.

      • jcgrillo 7 hours ago

        Not unlike me! The difference is it's incremental. I write one function, then write a test, and get it working. Then, building on that stable foundation I write another function, more tests, etc. Crapping out an entire pile of garbage at once is not the way.

        I guess I'm holding it wrong? Is there a better way I could phrase my query?

        • NeutralCrane 5 hours ago

          You are prompting it to output the entire thing at once. If you want it to approach the problem incrementally, prompt it incrementally.

          • jcgrillo 5 hours ago

            So should I be following up and asking it to refine its solution like this?

            > The program you wrote doesn't compile. Please fix it such that it compiles.

            Then, maybe, if we're lucky, we progress to the second step:

            > Ok, now the program compiles but there are tons of warnings about dead code, unreachable code, blanket trait implementations which aren't actually used, etc. Could you please fix those?

            Then assuming we clear that hurdle,

            > Great! The program compiles without warnings, but when I run the tests it panics due to an integer overflow. I see in your encode_rle function you're inexplicably left-shifting a small unsigned integer by 60, which will absolutely for certain cause it to overflow and panic. Would you mind explaining why in the actual fuck you did this and please fix it? Kthx.

            And on, and on... You know what? No. Fuck that shit. I refuse. I have absolutely no confidence this process will come up with a working, trustworthy implementation of the algorithm.

            • a1j9o94 4 hours ago

              Not the person you were replying to, but I think a better example of "incrementally" here would be:

              - write me a file with the function definitions for this problem
              - compile that
              - write a test that tests x outcome
              - compile that
              - then have it start writing functionality

              If it's trying to one shot a complex problem that you would typically break up, your prompt is probably too vague.

    • joeevans1000 8 hours ago

      If you drop it down a level and ask for block-level code or functions, you'll find it works. At this point users still have to organize the output, but I'm getting the sense that the latter task is something LLMs are going to get better at.

      • jcgrillo 7 hours ago

        Was it a word choice issue on my part then? Like, this task should be achievable using two functions. Should I ask it to write the encode function and then ask it to write the corresponding decode function? Then finally in a third step ask it to write various test functions?

    • throwup238 8 hours ago

      Did you feed it the simple8b paper along with your prompt?

      • jcgrillo 7 hours ago

        No, that's an interesting idea though. Hard to imagine how that would help with the code correctness issues, though. I haven't even dug into algorithmic correctness yet so I have no real idea whether there's room for improvement there--although I sure do suspect there is!

        EDIT: Oh my. After digging into the code I found this gem:

          fn encode_rle(&self, value: u64, count: usize) -> u64 {
              let selector = Simple8bSelector::RLE as u64;
              (selector << 60) | ((count as u64) << 30) | (value & 0x3FFFFFFF)
          }
        
        And this one:

          fn try_rle(&self, input: &[u64]) -> Option<(u64, usize)> {
              if input.is_empty() {
                  return None;
              }
        
              let value = input[0];
              let mut count = 1;
        
              for &x in input.iter().skip(1) {
                  if x != value || count >= 0x3FFFFFFF {  // Max 30-bit run length
                      break;
                  }
                  count += 1;
              }
        
              Some((value, count))
          }
        
        What even is going on here? Compare to an actually sane implementation like [1] or [2].

        [1] https://github.com/lemire/FastPFor/blob/master/headers/simpl...

        [2] https://github.com/timescale/timescaledb/blob/403782a5899c75...

        • jcgrillo 3 hours ago

          Actually, I was wrong about where the error was here: encode_rle works like it should; the shift isn't the problem there. It actually blew up later, in a different place. The second function is just a bizarre way to write that, but sure, it counts the first N repeats in the input slice. There's plenty of bizarre stuff in here[1], but mostly the general shape of the idea is directionally correct. A couple of notably questionable things, though, like the assumption that RLE is always the way to go if the run length is greater than 8, perplexing style choices, etc.

          [1] https://play.rust-lang.org/?version=stable&mode=debug&editio...

    • bugglebeetle 7 hours ago

      I recommend this for a more nuanced view:

      https://nicholas.carlini.com/writing/2024/how-i-use-ai.html

      • jcgrillo 7 hours ago

        Yes, I've read that and it seems like the use cases the author can really get behind are basically "fuzzy search" queries, not implementing things. I don't think I really have those needs? My entire adult life I've cultivated a "precise search" skillset (e.g. using google and (rip)grep) that continues to serve me well--and very quickly! So I'm not seeing the value there. I've tried those use cases too and it doesn't really add up either...

    • Vaslo 8 hours ago

      The biggest acceleration is for mediocre coders like me - the one who knows 90% of the code but will spend 95% of the time (perhaps several hours) trying to get the data structure correct. These systems can get the code almost all the way there, and I can now spend that couple of hours running tests rather than pounding my head against the wall before realizing this is faster and easier to understand as a dictionary than as the dumb tuple (round peg) I would have spent hours trying to jam through the square hole.
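
      For example - a minimal sketch in JavaScript (made-up field names) of the kind of switch I mean:

        // Positional "tuple": the meaning of each slot lives in your head.
        const userTuple = ["Alice", 34, "alice@example.com"];
        const email = userTuple[2]; // wait, was email index 2 or 3?

        // Keyed object ("dictionary"): self-describing and order-independent.
        const user = { name: "Alice", age: 34, email: "alice@example.com" };
        console.log(user.email); // "alice@example.com"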

      • jcgrillo 7 hours ago

        I think the problem you're describing might be a symptom of coding as the first step instead of the last. I find once I specify a problem and my proposed solution in sufficient detail, the structure of the code becomes obvious. This is best done with the various tools of human communication--visual diagrams, prose, mathematics, and algorithmic descriptions in the form of pseudocode. Only then, when I sufficiently understand what I'm actually trying to do, should I actually start writing code in a programming language. Otherwise I get pigeonholed into some half-baked idea by the various rigours of the language itself. Writing code before I truly understand what I'm trying to accomplish, I've learned over time, is an awfully costly form of premature optimization.

        EDIT: I don't mean to suggest programming languages aren't tools of human communication--they absolutely are. In fact, that's their primary purpose--to communicate ideas about the structure of a computation to other programmers. But starting with structural ideas about the implementation rather than conceptual ones about the nature of the problem and the shape the solution should therefore take is putting the cart before the horse.

      • literalAardvark 7 hours ago

        o1 found a new feature I hadn't noticed became available and replaced some nasty regex code I had been working on for hours with 1 library call.

        4o had been happy to attempt to help me fix my function, o1 just went "well that's interesting, meatbag, but have you considered reading the manual?"

  • jjcm 5 hours ago

    One of the other things I've been noticing is diffusion models are starting to get quite good at UI design. They're still only well-tailored for landing pages (due to most of the training data being based on portfolio sites like dribbble), but still the output is at a point where I'd at least start with some AI riffs before jumping in myself on design.

    Once these are at a point where we can automatically interpret them into usable workflows, it's going to be incredible how quickly you can develop your ideas. I'm really excited for it.

    Some examples of outputs:

    https://image.non.io/cd90cc33-4a6a-41d8-abd2-045d3a272010.we...

    https://image.non.io/5a0c3fc7-37f8-4e72-aba9-cd61f3c18517.we...

    https://image.non.io/920adf7c-a554-41bd-a29c-77bebed1cdad.we...

    • markusw 3 hours ago

      Are there any particular models you’re using for this, or are they equally good at this in your opinion?

  • lgessler 6 hours ago

    I'll add mine to the pile: I needed a visualization of the classic Towers of Hanoi puzzle for a course I'm teaching and Claude banged it out (in pure JavaScript, no less) in like 3 minutes of me typing: https://lgessler.com/files/towers.html

    The first stab was basically correct, but then I needed to prompt him several times to correct an issue where the disk that's currently picked up was being rendered as a vertical strip instead of as a disk, and then I just told him to convert the React into pure JS and IIRC it worked first try.

    • lemming 5 hours ago

      This is interesting, I also tried this with my daughter after we had been talking about Towers of Hanoi, and like you it worked really well. Then we tried to get it to implement a little version of the game where you have a farmer, some corn, a chicken and a wolf and you have to get them across the river in only one boat (actually Wiki says goat and cabbage, but whatever https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem). I wasn't trying to get it to solve the puzzle, just give us a UI to play the game. We gave up after an hour or more of going in circles. I wonder if there's a lot of Towers of Hanoi implementations out there it can use as references.

  • neom 9 hours ago

    I stopped building for the web in the early 2000s to build web businesses instead, but I was pretty creative with the ol' LAMP back in the day, and this whole thing is honestly just... it makes me giddy. I can build super fun stuff now without asking people. For example, this took me... I dunno, from telling it what I want to deploying it... 15 minutes? And to ME, it's awesome: https://funds.ascent.ca/ - I doubt it's well coded or w/e, but the fact that I could get a "cool" marketing property online in < 20 minutes basically exactly as I want... giddy is the right word.

    • adriand 9 hours ago

      A friend of mine said to me a few months ago, “all existing software is technical debt”. Meaning that owning software now is, in many cases, a burden not an asset since you can spin up new greenfield software using the latest tech so quickly.

      • xhrpost 4 hours ago

        Software is more than just the code to make a computer do something. It is the auditor's log of rules that were painstakingly researched and decided on over months, years, etc. A lot of time goes into deciding what the program needs to do and only after some decision is made is it written down as code. "Do we ship the customer order after payment is authorized or after the funds settle? Depends what we're selling maybe? What does the finance department think? Oh we have regulatory rules to follow, better make sure those are there too. Oh we need to be able to back-wipe PII on-demand as well. But maybe we need to retain certain attributes, let's schedule a meeting." Etc. This is where a lot of the value lies.

      • williamcotton 9 hours ago

        Software is definitely an asset. "Becoming less valuable over time" is called depreciation, or in the case of IP, amortization. Poorly written software amortizes at a higher rate.

      • neom 9 hours ago

        That's a super interesting way to look at it. Got my wheels spinning. I guess you can in a way then say... the person who makes super easy click-in persistence really wins. For example, let's say I want to make an app that hosts all the docs for the founders in my "funds" accelerator thing above. I probably want to spend a week once a year upgrading that front end with the AI tools, a new coat of paint if you will, but I want it to "just work" on top of however the data it's pulling is persisted. At that point, I really could do anything I think... people are gonna build some cool stuff I'd imagine.

    • pryelluw 9 hours ago

      Cool design. It’s well done for it being a 20 minute effort. And that’s where these things currently shine.

  • ks2048 7 hours ago

    Simon, (if you’re reading), if you “like programming”, is there any point that you get depressed about LLMs doing all the fun stuff that you wanted to do?

    I see some arguments about high-level languages, eg “I’d rather program in Python than assembly - this is just another step up”. But I feel natural language is altogether different and obliterates any skill/knowledge you’ve built up in programming.

    I can think other things, for example music - I like playing guitar even though I’ll never create something totally original or do better than a machine could do. But for me, programming combines the fun of creation with the satisfaction of the end result - something you wanted to exist now exists.

    To clarify, I’m not talking about “usefulness” or accomplishing some business objective, I’m talking about the joy and satisfaction of programming.

    • simonw 7 hours ago

      If anything, it's the opposite. LLMs are making programming even more fun for me, because I can choose what and when I delegate to them.

      Looking up how to accept a drag and dropped file (and then implementing it) in a JavaScript application isn't really that fun to me - certainly not for the tenth time.
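
      (For anyone who hasn't memorized it, the boilerplate in question is roughly this - a minimal sketch, assuming a drop target element with id "drop" and a text file being dropped:)

        const drop = document.getElementById("drop");
        // Without preventDefault here the browser just navigates to the dropped file
        drop.addEventListener("dragover", (e) => e.preventDefault());
        drop.addEventListener("drop", (e) => {
          e.preventDefault();
          const file = e.dataTransfer.files[0];
          const reader = new FileReader();
          reader.onload = () => console.log(reader.result); // do something with the contents
          reader.readAsText(file);
        });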

      I thrive on variety and building interesting things. LLMs let me incorporate WAY more tools into my work - I can build with Go and AppleScript and Bash and ffmpeg and jq, all things I have never climbed the learning curve enough to feel confident using in the past.

      • ks2048 7 hours ago

        Yeah, that makes sense. But I wonder, going forward, what is the relationship between "building abstractions/libraries" vs "Using LLMs"?

        For your example, "How do I accept a drag and dropped file in JS?" - An LLM can spit out 100 lines that does what you want, OR someone writes a nice library that does what you want in a single function call (say, for the common case. More complicated usage requires more arguments, etc.)

        (Of course, another option is LLMs are the ones writing these library functions).

        I guess I am one of those programmers who is allergic to boilerplate (for better or worse), so having LLMs spit out lots of code bothers me.

        • simonw 6 hours ago

          Part of this is my personal style: I don't like using libraries if I can throw in a tiny bit of boilerplate instead - that way I stay in full control of my code and don't need to understand dependencies that may do more than I need.

    • harisec 2 hours ago

      If you want to really get depressed about the future of software developers, try aider.

  • throwup238 10 hours ago

    The new Sonnet version is pretty great at code, but I keep hitting output size limitations in the Claude app in a way I didn't used to. Anyone else experiencing "Claude's response was limited as it hit the maximum allowed length at this time" a lot more now?

    At this point their output limit is far behind o1/o1-mini. I really hope they significantly improve that next.

    • shubb 10 hours ago

      It's annoying, but if you just type "continue" it's pretty good at writing the rest of the code in a new file that you can copy-paste together...

      • treme 7 hours ago

        No devs should be using the chat UI. Get an API key via anthropic.com, add the Cline extension in VS Code, and do away with copy/pasting.

  • swah 11 hours ago

    https://news.ycombinator.com/item?id=41904595 https://news.ycombinator.com/item?id=41913378

    Not sure why no discussion at all - maybe the design is underwhelming.

    • infoseek12 11 hours ago

      It’s extraordinary!

      Perhaps how quickly we become jaded should be taken as evidence of how quickly the world is changing right now.

      When I looked at the examples they seemed like the kind of one off scripts, of limited complexity, that we’ve seen many times in the last year or so.

    • ttepasse 5 hours ago

      Apart from chance, HN seems rather time-dependent – those posts seem to have been posted early in the day (I assume UTC from the timestamps), when the mostly US-American visitors don't yet seem to be in slacking-off mode.

    • yen223 10 hours ago

      To have discussions people need to be able to see your post, and on HN it can be a matter of luck.

    • rtzand 10 hours ago

      Perhaps people want to read other blogs once in a while. We are at a stage of AI glut. If people pump out so much content in such a short time, no one can read it all (or is interested in reading it).

    • mvdtnz 11 hours ago

      Maybe because we're all exhausted by these kinds of demos. I'll be interested in AI when I can integrate it with a large codebase and it provides any benefit. I'm happy for you that you can stand up little toy apps that pull down an NPM library and call it, but it's not useful in my professional life.

      • mattnewton 11 hours ago

        If you have one of those larger codebases you aren't afraid of Cursor Co. getting a copy of, I would really try out Cursor. The indexing of the medium-sized mono-repo I've been working in is pretty flawless (it contains code for a static web page, a single-page web app, and some Python services, along with deployment configs).

        • sigh_again 10 hours ago

          That's not a medium sized repo, that is a baby you can entirely memorize in your own head. Cursor is also dreadful at anything that isn't Javascript and Python.

        • stonethrowaway 11 hours ago

          That doesn’t sound like a medium sized repo?

        • mvdtnz 10 hours ago

          There's absolutely no way I'd share my codebase with some fly-by-nighter ~crypto~ AI company, but the codebase I work on is upwards of 60 million lines of code so I doubt any "AI" solution would come close to being useful.

          • ben_w 10 hours ago

            At 60 Mloc? Sure, for now I'd agree. An AI needs to use other tools to handle that kind of size; it doesn't (practically) work so well if you try to hold the whole thing in context.

            And while tool use is being worked on, the results I've seen are at the "that's an interesting tech demo" level rather than the mind-blowing change of when InstructGPT demonstrated the ability of a language model to generate any meaningful code at all from natural language instruction.

          • riku_iki 10 hours ago

            > I work on is upwards of 60 million lines of code so I doubt any "AI" solution would come close to being useful.

            that's what RAG is supposed to solve: they chunk your 60M LOC, and then retrieve and process only the relevant parts depending on your query.
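
            The retrieval core is simple enough to sketch (embed() here is a hypothetical stand-in for whatever embedding model you use; chunk embeddings are assumed to be precomputed offline):

              // Cosine similarity between two embedding vectors
              const cosine = (a, b) => {
                let dot = 0, na = 0, nb = 0;
                for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
                return dot / (Math.sqrt(na) * Math.sqrt(nb));
              };

              async function retrieve(query, chunks, k = 10) {
                const q = await embed(query); // hypothetical embedding call
                return chunks
                  .map((c) => ({ ...c, score: cosine(q, c.embedding) }))
                  .sort((a, b) => b.score - a.score)
                  .slice(0, k); // only these top-k chunks go into the model's context
              }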

      • bloopernova 11 hours ago

        Exactly. When the LLM can create valid tests that cover all branches, then it will be useful to me.

        I use copilot every day, but it's only so good.

        The LLM hype feels like it's been driven by FOMO.

        • jkaptur 11 hours ago

          I had a pretty cool experience with that the other day. I wrote some production code (LLM had no idea what was going on), then I measured the coverage and determined a test case that would increase it (again, just using my brain), BUT when I typed "testEmptyString" or whatever, the LLM filled in the rest of the test. Not a massive change to the way I work, but it certainly saved me a bunch of time.

          • NitpickLawyer 10 hours ago

            I swear half the people in this thread spent 5 minutes with the first ChatGPT, pre-3.5, wrote it off, and are so convinced of their superiority that they won't spend the time required to even see where it's at.

            Ever seen someone really bad at googling? It's the exact same thing with LLMs (for now). They're not magic crystal balls, and they certainly can't read everyone's minds at the same time. But give them a bunch of context, and they'll surprise you.

            • IggleSniggle 9 hours ago

              Sshhh, we've still got like a year of advantage over the folks that haven't learned about searching the Internet and still have to drive to their local university library...don't squander it!

            • NeutralCrane 5 hours ago

              Engineers are also notorious for having hit and miss soft skills. The interface for LLMs is natural language. I wouldn’t be surprised if much of the variance in usefulness boils down to how effectively you can communicate what you want it to do.

        • unshavedyak 11 hours ago

          I dunno, I think it's really cool still, and I barely use it because I find it more work to use than not to.

          • written-beyond 10 hours ago

            Same. I've felt any LLM for coding has saved me mechanical time, but as of now anything slightly more complex than that just makes me waste more time figuring stuff out.

            Other than the automation aspect, it is a pretty good alternative to in-depth googling.

      • williamcotton 10 hours ago

        My duties as a data scientist and forensic investigator involve writing lots of "little toy apps" for ETL and analysis.

        BTW, why the disparaging reference to "little toy apps"?

        • sigh_again 10 hours ago

          >BTW, why the disparaging reference to "little toy apps"?

          It's an unmaintainable, single-use piece of software (that doesn't even implement the features, it just glues together already existing code) that any CS student could write in a week. Congrats on getting a really fast CS student, I guess? Not to mention the fact that perfectly viable, better alternatives are available in many places.

          It's like me nailing two 2x4s together to make a shelf. Yeah, sure, I made it myself and I didn't need any woodworking knowledge, but let's just hope I don't put grandma's heavy china on it.

          • IggleSniggle 10 hours ago

            As a professional 2x4 nailer and gluer, I assure you that I have a ton more deliverables for my client. The downside is that now I actually have to put some thought into my work; you know, put some engineering work into it.

            The upside is that I can produce a shit-ton of one-shot code in record time, so I've got time to face the downside.

          • williamcotton 9 hours ago

            I had a Claude Project write a number of CLI tools that interact, eg

              search_documents "search term" --bm25 --table-prefix some_project | fulltext -hl -v | less
            
              search_documents "search term" --bm25 --table-prefix some_project | metadata
            
            It inserts documents by piping paths into another script, eg,

              find /some/path/*.pdf | insert_documents --table-prefix some_project
            
            The documents end up in a Postgres database with pg_search bm25, tsvector, and semantic embeddings (from a local model).

            I would estimate that I only wrote 5% of the code in the project with the rest coming from the LLM.

            Sure, it's just a few hundred lines of code but it's been stable and helpful to get through some very large tranches of discovery material.

          • simonw 8 hours ago

            I think being able to build something in less than three minutes that would take a CS student a week is pretty worthwhile, personally.

        • skydhash 10 hours ago

          It's related to this: https://xkcd.com/1205/

          As a programmer (which is the prerequisite to build such tools even with LLMs), I have a plethora of tools to do these tasks; what I choose and how much time I invest in it depends on something similar to this chart, but with an added dimension: interest.

          Take for example the URL extraction. For one single occasion, I'd probably use VIM and macros to quickly do it. If it were many pages, I'd write a script. If it were infrequent, but recurrent, I'd take the time to write a better script and would only write a web page if the use case was shared with other people or if I wanted a cross platform solution.

          I believe the first question one should ask before building is why. That leads you to find a better UX than shoehorning everything inside a web app.

          • IggleSniggle 9 hours ago

            I 100% agree with the point you are making. The only aspect that obscures it is that paying employers will happily pay to support interests that are, on inspection, a waste of time.

            In that aspect, I am hopeful. Maybe if "waste of time" activities are commoditized, "professionals" can instead focus on "what is important," whatever that might be.

        • mvdtnz 10 hours ago

          How would you refer to the example apps in the OP's link? They are almost definitionally toy apps, and definitely little (a handful of pages of code including all of the HTML).

  • simonw 7 hours ago

    There are a bunch of comments in this thread along the lines of "these are just toys" and "anyone could build these without an LLM".

    I need to update my post to emphasize this, but that's kind of the point.

    Every one of these 14 tools (with the possible exception of the OpenAI Audio debugger one, that one's quite hard) is something any web programmer could build relatively quickly.

    ... but not as quickly as I did with an LLM, because they almost all took less than 5 minutes from idea to finished implementation.

    The key point is that if I didn't have Claude to help build these, I wouldn't have built them at all. None of them would justify even an hour of work - they weren't essential tools that I needed to get stuff done, they were just things I built because building them is now so cheap (in terms of time) that there was no reason not to.

    That's the real magic here. The cost of knocking out a single page app that does something simple is often now lower than even the cost of spending a few minutes on Google trying to find an existing tool that solves the same problem.

    • harisec 2 hours ago

      These are toys, but in 2 years they will probably be full projects, and 2 years later people will ask "why do I need a software developer?"

      • simonw 2 hours ago

        I just don't think that's true.

        If all someone does is write code based on specifications handed over by someone else then yes, they have cause to be worried - but in my career as a software engineer the "typing code into a computer" bit has only ever been 10-20% of the work that I do.

        The big challenge of software development has always been turning human needs into working software. That requires a great depth of experience in terms of what's possible, what isn't possible, how software works and how to architect and design software to deliver value today while still staying flexible for future development.

        LLMs can accelerate that process a bit, but I don't think they can replace it. Someone still has to drive the LLMs. I think people with software development skills are best placed to do that.

        • harisec 2 hours ago

          That's a good point and I agree with you. However, would you agree that in a few years we will need far less developers than we need right now?

          • simonw 2 hours ago

            I had a podcast conversation about this recently: https://newsletter.pragmaticengineer.com/p/ai-tools-for-soft...

            I think LLMs mean developers can build stuff faster, which reduces the cost of developing software.

            My optimistic scenario is that this expands the market for custom software, a lot. Companies that would never have considered developing their own software - because they'd need six developers working for twelve months - can now afford to do so, because they need two developers for three months instead.

            The result is more jobs for engineers, and engineers become more valuable because they can get more done.

            I'm not an economist so I won't pretend I'm confident this will happen, but it's my optimistic scenario.

    • sogen 6 hours ago

      Yes, exactly my thoughts. I needed an RSS converter from JSON to XML, and was able to quickly make one.

      I'm not a programmer.

      It's life-changing.

  • fHr 9 hours ago

    If these aren't amazing times to be alive, I don't know what are. This is insane. I also started to learn some Rust over the weekend, and it's nuts how good ChatGPT-4 can be as a teacher supporting you on the fly.

    • bongodongobob 9 hours ago

      No you don't understand. My programming is so advanced that LLMs cower and shut themselves down when I try to use them. I am a senior developer! Clearly you don't know what you're doing and all the code it writes is bad and you'll never be able to maintain it. LLMs can't invent cutting edge, never before seen code that I do everyday because the problems I solve are so advanced even god himself can't understand my codebase. /s

  • harry8 9 hours ago

    Anyone got a "best practises" or even a few "my workflow" blog posts on how to best use LLMs with a local code base?

    Just saw someone recommend:

    https://aider.chat/

    Thought there might be more worth exploring from this community. ;-)

  • nichochar 10 hours ago

    We built an open-source and local tool that allows you to take these even further. Highly recommend plugging in the latest model, but you can keep iterating on the apps.

    Currently also on the front page https://news.ycombinator.com/item?id=41926067

  • xster 10 hours ago

    Anthropic is so close to getting to a WeChat-esque store-less super-app state. It just needs a way to gather all your published artifacts and surface them easily in the sidebar like your favorited chats.

    Since Elon is so interested in that model, if xAI had Claude's capabilities, they would surely go with that angle

  • yapyap 9 hours ago

    wish there was an option to hide all chat AI related topics on HN

    • tomrod 7 hours ago

      HN has always had a focus on what will change the world 5 years out.

      Chat-based AI is a remarkable set of functionalities to build with. It isn't the only improvement in Tech, let alone AI/ML, but it is massive.

      I quite enjoy learning more about it from dedicated folks like simonw.

    • thimabi 9 hours ago

      Maybe it’s time to create a new HN client with this feature. Ironically, an AI can be of assistance in filtering AI-related submissions or comments.

    • GaggiX 8 hours ago

      Ask Claude to write the browser extension for you so you can stop whining.

  • ainiriand 3 hours ago

    Hey, some time ago I needed this at work:

    https://www.jsoncomments.com/

    It is basically a tool to add some additional text to JSON files and interpret it as comments for each line. I did it with ChatGPT.

  • rtpg 7 hours ago

    I am a bit frustrated that I don't have a great "tool" environment to build out this stuff, because of having to futz with the I/O. Like most of that stuff is "well I know how to write the Python to do the last step, but wrapping it all up in a simple web UI is Too Much Work". That effort might be small, but it's still orders of magnitude larger than the snippet!

    Lots of TUI interfaces try to approximate this, but I think I really just need to build out something a bit like https://anvil.works/

    • psadri 7 hours ago

      You should try SkyMass. You can put up a web UI in as few as 10 LOC, and something actually useful in around 50.

      Contact me if you need some help getting going…

  • Eliezer 5 hours ago

    Meanwhile, no luck getting it to build something that reverses a GIF. (Also, weirdly, no luck with finding a working GIF reverser online.) (Trying to reverse this: https://www.tumblr.com/necessary-disorder/765064008182235136.)

    • synthoidzeta 5 hours ago

      Try running this from the CLI (you'd need to install gifsicle first):

        gifsicle --unoptimize input.gif '#-1-0' > reversed.gif

    • SamDc73 5 hours ago

      These things give me a weird fuzzy feeling when looking at them!

    • fragmede 4 hours ago

      Ask it to install ffmpeg and have it use that to reverse the GIF.
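
      If I remember the filter correctly, the one-liner it should land on is something like:

        ffmpeg -i input.gif -vf reverse reversed.gif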

  • rahimnathwani 7 hours ago

    Here's one I built with Claude last week: https://news.ycombinator.com/item?id=41855594

  • sourcecodeplz 11 hours ago

    It's just not a big deal.

  • freediver 9 hours ago

    A very interesting paradigm introduced here by Anthropic is that this content is hosted. The output of the LLM is made into a self-contained, hosted app, ready for consumption by a consumer. Not far away from a build-my-own-site kind of thing.

  • StickyRibbs 10 hours ago

    I'll start panicking when it can productionize an app and deploy it to GCP without any errors.

    • shishy 10 hours ago

      I used cursor to manage spinning up and deploying a full stack app in AWS last week. Took me one afternoon.

    • raincole 9 hours ago

      Then you should have been panicking for several months already.

    • ndndjdueej 8 hours ago

      Invert that.

      AI makes the docker compose app. Cloud providers that can't deploy a docker compose app simply and without errors will miss out.

    • trhway 10 hours ago

      I think you've just given an idea to somebody's next startup, and we'll probably see it being done in half a year. In general, all that tedious YAML/etc. is ripe for the "autocompletion AI".

      • CptFribble 9 hours ago

        until it hallucinates a config and rings up a $10,000 AWS bill while you're asleep

      • skydhash 10 hours ago

        And then you'll find out a node was deployed with no backup strategy while there are multiple useless ones burning money.

      • sigh_again 10 hours ago

        "Don't look at your Kubernetes configuration, trust our AI to do it well" sounds like a psyop straight out of GCP or AWS to charge you four times what they need to before telling you "no, you absolutely need that $500 charge for your 1RPS static website, yes yes absolutely."

        • trhway 10 hours ago

          And for auditing your config and reviewing your cloud provider's [AI-autogenerated] offers and suggestions, there will be another AI, which will also be able to chat with their customer support AI.

  • djoldman 10 hours ago

    @simonw: jina is getting cranky:

    https://tools.simonwillison.net/jina-reader?

    {"data":null,"code":451,"name":"SecurityCompromiseError","status":45102,"message":"Your request is categorized as abuse. Please don't abuse our service. If you are sure you are not abusing, please authenticate yourself with an API key.","readableMessage":"SecurityCompromiseError: Your request is categorized as abuse. Please don't abuse our service. If you are sure you are not abusing, please authenticate yourself with an API key."}

  • tomcam 2 hours ago

    I love love love that the transcripts are included.

  • ToJans 10 hours ago

    I fully agree.

    I think Claude offers me 10x productivity, especially for all these helper apps and technical POCs that I typically create during the week.

    And that's without even mentioning mail chain replies, analysis of legal or financial documents, helping my kids with their math assignments,...

    It's a huge enabler for me, and it's getting better every month.

    We are getting up the abstraction ladder faster and faster, and I cannot even imagine where we will end up within a few months, or a few years.

  • pluc 8 hours ago

    and you didn't learn a goddamn thing

  • purple-leafy 2 hours ago

    Wild that people lap this up. Absolutely wild

  • bilsbie 7 hours ago

    What’s the best and easiest way to host and share these artifacts? GitHub html?

  • joeevans1000 8 hours ago

    These LLMs create significantly useful code and they are getting better. Those professionals who deride their utility are putting themselves at risk of not strategizing well for their future.

  • corytheboyd 10 hours ago

    Just in case you need it: https://github.com/gchq/CyberChef

    I was just trying to be helpful, since it was relevant to content in the post…

    • burgerquizz 10 hours ago

      If I want to just paste a URL of a full HN page and extract all the comments in a JSON format, would that tool work?

      • sigh_again 10 hours ago

        [...document.querySelectorAll(".commtext")].map((it) => it.innerText)

        Works in every single web browser. No calls to OpenAI needed, and I'm rusty on Javascript. Make it a bookmarklet, and you don't even need to run a dedicated webpage on your machine for that.
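
        As a bookmarklet, a sketch of that might look like this (copies the comments to the clipboard as JSON; clipboard access may need a permission prompt depending on the browser):

          javascript:(() => { const c = [...document.querySelectorAll(".commtext")].map((it) => it.innerText); navigator.clipboard.writeText(JSON.stringify(c, null, 2)).then(() => alert(`Copied ${c.length} comments as JSON`)); })()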

      • corytheboyd 10 hours ago

        You already know the answer to that. This is just a very helpful interface for arbitrary data conversions that I thought passers-by might like to know about. It's not an LLM, but it does what some of the examples in the article do, and more.

  • almog 8 hours ago

    For the majority of these, a simple Google search would have led to an existing program/website that does the same thing.

    We're past the POC stage. LLMs can generate code for simple programs. It's when you try to tweak the requirements and point out how a program introduces a bug that you eventually realize they still fail to take you through the last mile, just as they did a year and a half ago.

    • simonw 8 hours ago

      "For the majority of these, a simple google search would have lead to an existing program/website that does the same thing."

      That's what's so wild about this: that's true, and yet in most of those cases it's still faster and more productive for me to ask Claude to build me a brand new tool _from scratch_ than it is for me to try and find an existing one via Google.

      The problem with trying to Google for these kinds of things is that you have to evaluate the results that come back and figure out which one of them correctly solves your problem. That's a few extra steps.

      It's genuinely faster to prompt something like this instead:

      > Build an artifact (no react) where I can paste text into a textarea and it will return that text with all HTML entities - single and double quotes and less than greater than ampersand - correctly escaped. The output should be in a textarea accompanied by a "Copy to clipboard" button which changes text to "Copied!" for 1.5s after you click it. Make it mobile friendly

      Done: https://claude.site/artifacts/46897436-e06e-4ccc-b8f4-3df90c...

      In this case I knew exactly what I wanted: it had to do less than, greater than, ampersand, double quotes AND single quotes. I know from past experience that many tools like this forget about single quotes, so I'd have to evaluate any tools I found to check that they do that. And I was on my phone so I wanted a "copy to clipboard" button.
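
      The core of the tool it produced boils down to a few lines - roughly this (a paraphrase, not the exact code Claude wrote):

        function escapeHtml(text) {
          return text
            .replace(/&/g, "&amp;")   // ampersand first, or everything else gets double-escaped
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;")
            .replace(/'/g, "&#39;");  // the single quotes many tools forget
        }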

  • sfmike 2 hours ago

    BuT nO 0Ne iS uSINg Ai fOR pRoDuCtion COde

  • skydhash 10 hours ago

    I don't know if I'd take less time, but I would definitely type less.

  • bravura 8 hours ago

    Is the realtime audio API now open to everyone?

  • foobarqux 11 hours ago

    I just don't seem to find this stuff as useful to me as people are portraying. Take the "extract URLs" example: I would just do

         curl -sL $URL | htmlq 'a' -a href
    • simonw 8 hours ago

      Harder to run that on a phone. I built most of these little web apps so I could use them from Mobile Safari.

      Also it doesn't look like htmlq can handle pages that render their content with JavaScript. If you want to do that you might find my shot-scraper CLI utility useful: https://shot-scraper.datasette.io/en/stable/javascript.html

          shot-scraper javascript https://simonwillison.net/ 'Array.from(document.links).map(a => a.href)'
      • foobarqux 7 hours ago

        I can point to similar problems even in CLI-targeted apps. In Nicholas Carlini's post, for example, he shows how LLMs helped him make curl parallel by piping to the "parallel" utility. That works, but no sane person would do it given that curl has built-in parallel processing via the "-Z" flag, which you could have found in 10 seconds by opening the man page. I'm sure this was an instance of a developer (truly) believing they became 10x more productive.

        These aren't even the "hard" problems that are beyond the reach of LLMs today; they seem like things they should be able to do. It's just that, today, they just aren't achieving the spectacular results that many are claiming; it's mostly pretty crappy.

        shot-scraper looks nice.

    • pnut 10 hours ago

      That's pretty happy path for one, and for two, how exactly are you doing it? Not by holding down a red button on your phone and talking into it, that's for sure.

      For three, add one more subtle requirement to the task, and now you're reading awk manpages and trial-and-erroring perl oneliners.

    • qingcharles 9 hours ago

      I get your point, but that doesn't exactly replicate the original tool which lets you just paste a chunk of rich text in.

      And I would have had to use GPT to give me the syntax for that command line anyway :)

    • cpursley 10 hours ago

      Yeah, sure - if you have the memory that allows that sort of recall. For the rest of us, LLMs are like Alzheimer’s medication or eyeglasses. Believe it or not, these types of esoteric commands are very difficult for some of us to remember - but AI is amazing at this sort of thing (Unix commands, etc., as well as troubleshooting them).

      • sureglymop 10 hours ago

        I mean, they may also make your memory worse if you always go straight for the LLM instead of trying to remember.

        • ben_w 10 hours ago

          I can't remember, was it Aristotle or Plato who said that about writing?

          • svieira 8 hours ago

            It was Socrates - and he was correct. When was the last time you met someone who could recite The Iliad from memory?

            But more to the point ... in Phaedrus he's not talking about "who will memorize the Iliad now that we have the written word", he's talking about "can the written word _teach_". And the answer (as always) is "no and yes".

            > and now you, who are the father of letters, have been led by your affection to ascribe to them a power the opposite of that which they really possess. For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem [275b] to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise.

            https://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext... and https://www.gutenberg.org/files/1636/1636-h/1636-h.htm#link2....

            • IncreasePosts 8 hours ago

              Tons of people can recite the Bible and the Koran from memory.

          • IggleSniggle 9 hours ago

            I'm pretty sure it was Socrates but I might be hallucinating

          • wrtasfg 8 hours ago

            But where do chatbots fit into this? Cribbing from your neighbor in an exam? Watching TV?

      • foobarqux 10 hours ago

        That might be a reasonable argument if the LLM suggested something similar to the command I posted instead of an incredibly complicated webapp.

        As is, it just spits out migraine-inducing "it-works-doesn't-it" solutions from someone starting to learn to program.

      • skydhash 10 hours ago

        > Yeah, sure - if you have the memory that allows that sort of recall.

        You don't memorize them. You learn the foundational knowledge (in this case how http works and the html format, and a bit of shell scripting), then read the manuals and compose the commands. And as days pass, you save interesting snippets somewhere. Then it becomes easier each time you interact with the tools.

        Anyone would find ffmpeg or imagemagick daunting if they don't know anything about audio or graphics.

  • kevinmerritt 9 hours ago

    I learn so much from you. Thank you.

  • v3ss0n 10 hours ago

    None of them worth writing home about

  • thimabi 10 hours ago

    I take tools like these as an inspiration. All of us have at least some trivial tasks that can be automated. In the past, automating them might have been a hassle, but with LLMs, that’s no longer the case. I, for one, have a “scripts” folder with dozens of one-off mini-apps to handle specific tasks, and this folder keeps growing every day.