Everything I built with Claude Artifacts this week

(simonwillison.net)

637 points | by recvonline 9 months ago ago

483 comments

rty32 9 months ago

I'm sure there are plenty of examples like this, but one thing that I find really hard to deal with is to integrate such tools into existing codebase -- you can make all these things as standalone pages, but for a professional developer, you have certain standards and conventions, and often it takes a lot of work to review/revise the code to make it work with existing codebase, so much that you end up using inline completion just to help with obvious stuff or boilerplate. I woule rather spend 20% extra amount of time to write the code myself yet have confidence, than spend time tweaking the prompt or giving follow up instructions.

[-]

Buttons840 9 months ago

With a sufficiently advanced type system, you could lay out high level building blocks, such as function definitions, and let the LLM go wild, and then if things compile there's a high chance it's correct. Or maybe even a formal proof things are correct.

I was blown away when I realized some Haskell functions have only one possible definition, for example. I think most people haven't worked with type systems like this, and there are type systems far more powerful than Haskell's, such as dependant types.

There's not much reason to worry about low level quality standards so long as you know it's correct from a high level. I don't think we've seen what a deep integration between a LLM and a programming language can do, where the type system helps validate the LLM output, and the LLM has a type checker integrated into its training process.

[-]

oblio 9 months ago

> With a sufficiently advanced type system

Is this a brother or a cousin of the "sufficiently advanced compiler"? :-)

[-]

zahlman 9 months ago

A component, if I correctly understand the proponents of such type systems.

MichaelBurge 9 months ago

Anything on top of the Calculus of Constructions is usually enough. So it's not a moving target, and there are multiple implementations.

inopinatus 9 months ago

if there is only one valid translation of the type constraints into executable code then what you have is a slow, expensive, and occasionally argumentative compiler

it merely remains to build a debugger for your Turing-complete type system, and the toolchain will be ready for production

Buttons840 9 months ago

> Is this a brother or a cousin of the "sufficiently advanced compiler"? :-)

I believe the claim was that a sufficiently advanced compiler could do a lot of optimization and greatly improve performance. Maybe my claim here will turn out the same.

tessierashpool9 9 months ago

it certainly is a sister or niece of the "sufficiently advanced technology".

fhdsgbbcaA 9 months ago

Things I never want to hear about flight control systems before I board a plane: “if things compile there's a high chance it's correct”

[-]

literalAardvark 9 months ago

Very odd comment, since that's exactly what you do want to hear

[-]

gloflo 9 months ago

I'd rather hear: "The compiled code has gone through all tests in the comprehensive, human-expert-written, standardized test suite correctly"

Compiling does not differentiate between True and False, so no safety for that escape pod door.

[-]

literalAardvark 9 months ago

I took that as part of the build process.

But I definitely want as much as possible to be automated and formally correct, which is why I wrote what I wrote.

xmprt 9 months ago

There's a lot of code that compiles but isn't correct.

[-]

literalAardvark 9 months ago

Because we're using languages with flexibility but no correctness, but the vast performance advantage AI programming has over us could be used to manage the formal proofs for a verified toolchain.

We're not quite there yet, but while regular programming is quite tough for AI due to how fuzzy it is, formal proofs are something AI is already very good at.

ziggyzecat 9 months ago

if-then-high-chance logic is for sex education & prototypes, not for airplanes carrying insured passengers

skybrian 9 months ago

The functions for which there's only one implementation are trivial examples. It's not going to work for anything even slightly more complicated, like a function that returns a float.

Even if you could, you probably wouldn't want to make any change a breaking change by exposing implementation details.

rq1 9 months ago

I did just that actually to:

* build a codegen for Idris2 and a rust RT (a parallel stack "typed" VM)

* a full application in Elm, while asking it to borrow from DT to have it "correct-by-construction", use zippers for some data structures… etc. And it worked!

* Whilst at it, I built Elm but in Idris2, while improving on the rendering part (this is WIP)

* data collators and iterators to handle some ML trainings with pausing features so that I can just Ctrl-C and continue if needed/possible/makes sense.

* etc.

At the end I had to rewrite completely some parts, but I would say 90% of the boring work was correctly done and I only had to focus on the interesting bits.

However it didn’t deliver the kind of thorough prep work a painter would do before painting a house when asked for. It simply did exactly what I asked, meaning, it did the paint and no more.

(Using 4o and o1-preview)

[-]

tharant 9 months ago

I keep seeing folks who say they’ve built a “full application” using or deeply collaborating with an LLM but, aside from code that is only purported to be LLM-generated, I’ve yet to see any evidence that I can consider non-trivial. Show me the chat sessions that produced these “full applications”.

[-]

simonw 9 months ago

How about this one? https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/

That's for this: https://tools.simonwillison.net/ocr

[-]

tharant 9 months ago

An application powered by a single source file comprised of only 400 lines of code is, by my definition, trivial. My needs are more complex than that, and I’d expect that the vast majority of folks who are trying to build or maintain production quality revenue generating code have the same or similar needs.

Again, please don’t be offended; what you’re doing is great and I dearly appreciate you sharing your experience! Just be aware that the stuff you’re demonstrating isn’t (hasn’t been, for me at least) capable of producing the kind of complexity I need while using the languages and tooling required in my environment. In other words, while everything of yours I’ve seen has intellectual and perhaps even monetary value, that doesn’t mean your examples or strategies work for all use-cases.

[-]

simonw 9 months ago

LLMs are restricted by their output limits. Most LLMs can output 4096 tokens - the new Claude 3.5 Sonnet release this week brings that up to 8196.

As such, for one-shot apps like these there's a strict limit to how much you can get done purely though prompting in a single session.

I work on plenty of larger projects with lots of LLM assistance, but for those I'm using LLMs to write individual functions or classes or templates - not for larger chunks of functionality.

[-]

tharant 9 months ago

> As such, for one-shot apps like these there's a strict limit to how much you can get done purely though prompting in a single session.

That’s an important detail that is (intentionally?) overlooked by the marketing of these tools. With a human collaborator, I don’t have to worry much about keeping collab sessions short—and humans are dramatically better at remembering the context of our previous sessions.

> I work on plenty of larger projects with lots of LLM assistance, but for those I'm using LLMs to write individual functions or classes or templates - not for larger chunks of functionality.

Good to know. For the larger projects where you use the models as an assistant only, do the models “know” about the rest of the project’s code/design through some sort of RAG or do you just ask a model to write a given function and then manually (or through continued prompting in a given session) modify the resulting code to fit correctly within the project?

[-]

simonw 9 months ago

There are systems that can do RAG for you - GitHub Copilot and Cursor for example - but I mostly just paste exactly what I want the model to know into a prompt.

In my experience most of effective LLM usage comes down to carefully designing the contents of the context.

[-]

tharant 9 months ago

> There are systems that can do RAG for you

My experience with Copilot (which is admittedly a few months outdated; I never tried Cursor but will soon) shows that it’s really good at inline completion and producing boilerplate for me but pretty bad at understanding or even recognizing the existence of scaffolding and business logic already present in my projects.

> but I mostly just paste exactly what I want the model to know into a prompt.

Does this include the work you do on your larger projects? Do those larger projects fit entirely within the context window? If not, without RAG, how do you effectively prompt a model to recognize or know about the relevant context of larger projects?

For example, say I have a class file that includes dozens of imports from other parts of the project. If I ask the model to add a method that should rely upon other components of the project, how does the model know what’s important without RAG? Do I just enumerate every possible relevant import and include a summary of their purpose? That seems excessively burdensome given the purported capabilities of these models. It also seems unlikely to result in reasonable code unless I explicitly document each callable method’s signature and purpose.

For what it’s worth, I know I’ve been pretty skeptical during our conversations but I really appreciate your feedback and the work you’ve been doing; it’s helping me recognize both the limitations of my own knowledge and the limitations of what I should reasonably expect from the models. Thank you, again.

[-]

simonw 9 months ago

Yes, I paste stuff in from larger projects all the time.

I'm very selective about what I give them. For example, if I'm working on a Django project I'll paste in just the Django ORM models for the part of the codebase I'm working on - that's enough for it to spit out forms and views and templates, it doesn't need to know about other parts of the codebase.

Another trick I sometimes use is Claude Projects, which allow you to paste up to 200,000 tokens into persistent context for a model. That's enough to fit a LOT of code, so I've occasionally dumped my entire codebase (using my https://github.com/simonw/files-to-prompt/ tool) in there, or selected pieces that are important like the model and URL definitions.

anonzzzies 9 months ago

any code to see? I like Idris so this would be interesting.

gabigrin 8 months ago

That's one take I'm trying to take with https://www.flyde.dev. Build that higher-level abstraction that allows LLM to shine and us humans to enjoy a more effective way to view and control its outcome.

I know it's not what you meant, but tbh I don't think going deeper than Haskell on type-system power is the route to achieve mass adoption.

runeks 9 months ago

> With a sufficiently advanced type system, you could lay out high level building blocks, such as function definitions, and let the LLM go wild, and then if things compile there's a high chance it's correct. Or maybe even a formal proof things are correct.

A formal proof is simply a type that allows only the correct implementation at the term level. And specifying this takes ages, as in hundreds of times longer than just writing it out. So you want to use an LLM to save this 1% of time after you've done all the hard work of specifying what the only correct implementation is?

halfmatthalfcat 9 months ago

I'm not sure how much things have changed but I tried to use GPT-4 when it first came out to build a library on top of Scala + Shapeless and it utterly failed. Unless you can somehow wire in an LSP per language as an agent and have it work through the type errors as it tries to create code, I can't see where we'll ever get to a place where LLMs can work with strongly typed languages and produce compliant code.

Even with, the aforementioned "have an LSP agent work through type errors", it may be faster to just do it yourself than wait for an LLM to spit out what may be correct.

[-]

nycdatasci 9 months ago

Definitely try Claude 3.5 Sonnet and o1-preview. They have succeeded for me where other models have failed. Also, use Cursor IDE.

[-]

mlhpdx 9 months ago

Oh, I’ve found Claude 3.5 to be better but still pointless. To be specific, it generates code that does what I ask (roughly) but…

- Has obvious bugs, many at runtime - Has subtle bugs. - Is inefficient.

All of which it will generally fix when asked. But how useful is it if I need to know all the problems with its code beforehand? Then it responds with the same-ish wrong answer the next time.

Still a long way to go IMO.

csomar 9 months ago

Claude 3.5 is good, however, it's "Type comprehension" is really basic. That was my experience with it when using Rust. It still can't create an internal mental model of the types and how to link them together. It'll also, at some points, start to heavily hallucinate functions and stuff.

cft 9 months ago

o1-mini is even higher in coding benchmarks than o1-preview. That has been my experience also.

tharant 9 months ago

Glad to know I’m not the only one who has had difficulties getting reasonable Scala code from an LLM. While I agree that LSP integration would likely help with generating or debugging production-quality code, I’d also suggest that the need for an LSP demonstrates these models’ inability to reason—at least about code. Perhaps they were trained more deeply and carefully on the marketing or “getting started” pages for the libraries and languages I need to use and not as much on examples of production-quality code repositories that make use of those languages/libs/tools.

rapind 9 months ago

Oh this is an interesting take. Haskell, F#, possibly Elm, etc.

szundi 9 months ago

You know people work with what they work with.

TechDebtDevin 9 months ago

I'm sort of playing around with something like this in go for fun.

fny 9 months ago

I find it helps if you treat the code generated as a third-party package with a well defined API. Then your role becomes gluing things together.

It’s an approach similar to how I’ve dealt with junior devs in the past. You specify an interface for a class, provide examples as a spec, and you get what you want without colliding with the main project.

For sanity’s sake, I keep these AI generated modules in single files just so it’s an easy copy and paste into ChatGPT.

[-]

tharant 9 months ago

> It’s an approach similar to how I’ve dealt with junior devs in the past.

Your experience with and approach to juniors is different than my own. I don’t ask juniors to write a single class file; I give them a design document, an API spec, and documentation for standard practices, then I work as closely with them as needed to get the results I need and the experience they need. This approach works well for me and the vast majority of juniors with whom I’ve worked because we can pretty quickly identify gaps in their knowledge so we can then provide experience and education that benefits both parties. The same approach has failed miserably for me when pairing with an LLM for anything other than trivial code-generation tasks. The models don’t learn (sure, some providers offer “memory” but the limits of those features are pretty obvious once you try to use ‘em in practice for anything other than “don’t forget that I like tacos, the color blue, and sci-fi”)

> For sanity’s sake, I keep these AI generated modules in single files just so it’s an easy copy and paste into ChatGPT.

That’s not acceptable for production-quality code—at least not in my environment.

[-]

fny 9 months ago

I ask juniors to do just as you do. There's a contract specified in the API spec and relevant documents. In kind, I build up an "AI" package one module at a time by specifying contracts for each module. The principle remains the same but at a smaller scale because, as you say, LLMs don't learn.

I know the organizational style doesn't fit a typical "production" set up, but the reality is the code produced is very good. I only set it up this way so I guarantee I can continue iterating on a module without too much pain.

Also, who cares if I have a way more files if I'm still building features for my customers?

cube2222 9 months ago

With something like the AI assistant in Zed you’d generally provide a few files the assistant can use as a reference. I’ve had good luck in having it follow the codebase’s style and standards this way.

[-]

benwilber0 9 months ago

+1 to the completions/inferences in Zed. It's the first editor that I feel (mostly) confident about just tabbing-through the AI completions with minimal prompting/re-editing.

salviati 9 months ago

Have you tried https://aider.chat ?

[-]

vbezhenar 9 months ago

I tried it yesterday and wasn't successful. I spent like 30 minutes trying to explain to it to make a simple change. Every time it made this change and several others as well which I didn't ask for. I asked to undo those several other changes and it undoes everything or does other unrelated things.

It works good until it doesn't.

It's definitely a useful tool and I'll continue to learn to use it. However it is absolutely stupid at times. I feel there's very high bar to use it, much higher than traditional IDEs.

[-]

ripley12 9 months ago

Which LLM were you using? I’ve had a great experience with Aider and Claude Sonnet 3.5 (which is not coincidentally at the top of the Aider leaderboard).

[-]

esperent 9 months ago

I've been using Claude dev VSCode extension (which just got renamed but I forget the new name), I think it's similar to Aider except that it works via a gui.

I do find it very useful, but I agree that one of the main issues is preventing it from making unnecessary changes. For example, this morning I asked it to help me fix a single specific type error, and it did so (on the third attempt, but to be fair it was a tricky error). However, it persistently deleted all of the comments, including the standard licensing info and explanation at the top of the file, even when I end my instructions with "DO NOT DELETE MY COMMENTS!!".

[-]

kbaker 9 months ago

You may want to peek at the system prompts Aider uses. I think this is part of the secret sauce that makes it so good.

https://github.com/Aider-AI/aider/blob/main/aider/coders/edi...

excerpt: """ Act as an expert software developer. Always use best practices when coding. Respect and use existing conventions, libraries, etc that are already present in the code base. {lazy_prompt} Take requests for changes to the supplied code. If the request is ambiguous, ask questions.

Always reply to the user in the same language they are using.

Once you understand the request you MUST: """ ... etc...

freediver 9 months ago

I am a big fan!

kridsdale3 9 months ago

Can those kinds of things work in monorepos with 50 million files?

[-]

syntaxing 9 months ago

They use this thing called repo map[1]. I only used it for personal projects and it’s been great. You need to add the files you care about yourself, it’ll do its best and add additional files from the repo map if needed.

Since it’s git based, it makes it very easy to keep track of the LLMs output. The agents is really well done too. I like to skip auto commit so I can “git reset —hard HEAD^1” if needed but aider has built in “undo” command too.

[1] https://aider.chat/docs/repomap.html

[-]

cdchn 9 months ago

Thats a cool idea, kind of reminds me of ctags.

[-]

imjonse 9 months ago

Aider had actually used ctags to implement that feature before they switched to tree-sitter.

ctoth 9 months ago

Can you work in a repo with fifty million files? Can Git? I just checked on my Windows machine using Everything and there are 15,960,619 files total including every executable, image, datafile, &c.

Out of curiosity what does your IDE do when you do a global symbol rename in a repository with fifty million files?

I'm absolutely a real human, and I think this just might be too much context for me! Perhaps I am not general enough.

[-]

achierius 9 months ago

I thought this was common knowledge but I guess not: Google's monopoly famously has over a billion files. No, Git cannot handle it. Their whole software stack is developed around this from the ground up. But they are one of the largest software employers in the world, so quite a few engineers evidently do make do with 200x more than 50 million files.

[-]

nasmorn 9 months ago

Monopoly <> Monorepo This is the funniest typo possible in the context of google

dexwiz 9 months ago

Having worked on a codebase like that, you need to use some extra plugins to get git to work. And even then, it’s very slow. Like 15-30 seconds for a git status to run even with caching. Global renames with an IDE are impossible but tools like sed and grep still work well. Usually there is a module system that maps to the org structure and you don’t venture outside of your modules or dependency modules very often.

adamtaylor_13 9 months ago

No and neither can you. Like you, it works best with small, focused context.

These tools aren’t magic. But they do certain tasks remarkably well.

[-]

dartos 9 months ago

> No and neither can you.

People do work on monorepos with 50 million+ files, though…

paradite 9 months ago

I made a tool that allows you to use LLMs on large codebases. You can select which files are relevant and embed them into the prompt: https://prompt.16x.engineer/

Based on my personal experience it works well as long as each file is not too long.

salviati 9 months ago

I believe they can as long as you're able to identify a contained task that touches no more than a handful of files. Still very useful to automate some tedious work or refactoring if you ask me.

[-]

jprete 9 months ago

That's effectively an answer of "no".

[-]

sthatipamala 9 months ago

I used to work in a monorepo of that size.

All of the PRs I ever submitted touched a handful of files in my project’s subdirectory.

raincole 9 months ago

That's effectively an answer of "yes".

Or what "yes" looks like to you? It can do all the work itself, for a 50m-file monorepo, without a human guiding it which files to look at?

If it were true then human programmers would have been considered obsoleted today. There would be exactly zero human programmers who make any money in 2025.

9 months ago

[deleted]

rorytbyrne 9 months ago

It doesn't take the whole repo as context, it tries to guess which files to look at. So if you prompt with that in mind, it works well. Haven't tried it on a very large codebase though. You can also explicitly add files, if you know where work should be done.

ziggyzecat 9 months ago

you can make it work. just think of the many approaches and you'll see that there are actually quite many viable ways to work around pseudo-infinite context.

Volrath89 9 months ago

You are right, except on the part about tweaking the prompt to get your desired code styling.

The easier way to integrate into an existing code base is just to refactor the code yourself. AI gives a working version, you refactor and move on. For me this has been a huge productivity boost from writing everything from scratch

inciampati 9 months ago

Exactly. Use aider to do this.

Tell claude or your favorite LLM to write a full plan to implement what you need in such a way that your coworker can implement it.

Copy the result into aider, and check the results!

richardw 9 months ago

Drag in a couple files and say “make a class that does X but use this format”. I absolutely don’t rely on it for lots of things but it’s absolutely capable of working with existing code. Claude is far better than OpenAI when dealing with sets of existing files. I also like that it’s outside of my IDE so I make the final changes. LLM’s love to just write tokens so I keep a fairly short leash.

paradite 9 months ago

I built a tool specifically for integrating LLMs like Claude into existing codebase and daily coding workflow: https://prompt.16x.engineer/

beefnugs 9 months ago

This isn't for professional software developers. This is for managers, so they can shit out a one week test and then feel superior enough to pay software engineers less for the "easy" job they do.

codingwagie 9 months ago

cursor.sh, add context to the prompt

[-]

rty32 9 months ago

I haven't spent enough time with cursor specifically, but with other similar coding assistants, adding context can take some time. And often, even if it does save time, the action of "adding context" itself gets tiring and tedious so that you don't want to bother but instead just write the code yourself. It's about mental affordability.

hackernewds 9 months ago

You spend 40-100 hours/wk at work. Ok to spend 2 hours to give Claude the info around those conventions. It should then save you 20-40 hours/wk

Atotalnoob 9 months ago

Just tell it those conventions or standards.

Using GitHub copilot, I just tell it to style its code like an example and it gets pretty close.

bboygravity 9 months ago

You can feed it a bunch of your code style as part of a project and it will just adhere to that.

M4v3R 9 months ago

It's funny how we went from "it's impossible for a computer to write meaningful code by itself" to "yawn, another one of these" in like 2 years.

[-]

HaZeust 9 months ago

I said this last year[1] and still FIRMLY believe it:

"It's even crazier to me that we've just... Accepted it, and are in the process of taking it for granted. This type of technology was a moonshot 2 years ago, and many experts didn't expect it in the lifetimes of ANYONE here - and who knew the answer was increasing transformers and iterating attention?

And golly, there are a LOT of nay-sayers of the industry. I've even heard some folks on podcasts and forums saying this will be as short-lived and as meaningless as NFTs. NFTs couldn't re-write my entire Python codebase into Go, NFTs weren't ever close to passing the bar or MCAT. This stuff is crazy!"

1 - https://news.ycombinator.com/item?id=37879730

[-]

cloogshicer 9 months ago

> NFTs couldn't re-write my entire Python codebase into Go

Neither can LLMs. They can produce output that looks like a plausible re-write of your codebase, but on closer inspection turns out to have many minor and major errors everywhere.

The problem is that the closer inspection part is very often more work than writing the code by hand in the first place.

There hasn't been enough evidence for me that this will be possible to fix.

[-]

devjab 9 months ago

I disagree with you on this. If you go through my history on LLM’s you’ll see that I didn’t consider them more than fancy auto-complete. I still think of it mainly as fancy auto-compete for a lot of things, but we’ve begun using Claude in our porting of our C to Rust. Claude does it really, really, well. You have to look it over, but it’s far more efficient than any one of us can without the assistance. I don’t have the exact numbers but we’re close to a 90% accuracy on what is accepted without corrections.

We follow a YAGNI approach to our code architecture and abstractions, meaning it’s very straight forward with things happening where they are written and not in 9 million places like Clean Code lovers try to do. Our C services and Libraries are also fairly small and “one purpose”. I’m not sure you would be wrong on larger code bases, at least not right now.

With what we see Claude do now though, I don’t think we’re far from a world where Software Developers are going to do significantly different work. I also think quite a lot of the stuff we do today will no longer exist.

HaZeust 9 months ago

I've used GPT-4 to do what I had said. I pasted the errors I was given, and did so for 2-3 more iterations, and it successfully ported critical in-house infrastructure from Python 3 to Go.

[-]

tharant 9 months ago

I feel like I’ve been gaslit by the entire GenAI industry that I’m just bad at prompt engineering. When I’m talking to an LLM about stuff unrelated to code-generation, I can get sane and reasonable responses—engaging and useful even. The same goes for image generation and even the bit of video generation I’ve tried. For me however, getting any of these models to produce reasonably sane code has proven elusive. Claude is a bit better than others IME but I can’t even get it to describe a usable project template and directory structure for anything other than very simple Scala, Java, or Python projects. The code I’m able to generate always needs dramatic and manual changes; even trying to get a model to refactor a method in the code it wrote within the current context window results in bugs and broken business logic. I dearly wish I knew how others are able to accomplish things like “it successfully ported critical in-house infrastructure from Python 3 to Go.”. To date, I’ve seen no actual evidence (aside from what are purported to be LLM-generated artifacts) that anything beyond generating (or RAG-ing existing code) is even possible. What am I missing? Is it unrealistic for me to assume that prompt engineering such a seemingly dramatic LLM-generated code rewrite is something that I could learn by example from others? If not, can somebody recommend resources related to learning how to accomplish non-trivial code generation?

[-]

HaZeust 9 months ago

> If not, can somebody recommend resources related to learning how to accomplish non-trivial code generation?

Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole". Here's some guides:

1.) https://platform.openai.com/docs/guides/prompt-engineering 2.) https://www.promptingguide.ai/

[-]

tharant 9 months ago

Thank you for the links!

> Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole".

Since I’m dealing with models rather than other engineers should I expect the process of breaking down the problem to be dramatically different from that of writing design documents or API specs? I rarely have difficulty prompting (or creating useful system prompts for) models when chatting or doing RAG work with plain English docs but once I try to get coherent code from a model things fall apart pretty quickly.

[-]

HaZeust 9 months ago

That's actually a solid question! You can probably ask GPT to AI-optimize a standard technical spec you have and to "ask clarifying questions in order to optimize for the best output". I've done that several times with past specs I've had and it was quite a fruitful process!

[-]

tharant 9 months ago

Great idea. I’ve used that tactic in the past for non-code related prompts; not sure why I didn’t think of trying it with my code-generation prompting. I’ll give it a shot.

[-]

hackernewds 9 months ago

the "ask me what info you're missing" strategy works very well, since the AI will usually start the task every time to avoid false positives of asking a question. and then it also asks very good questions, I then realize were necessary info

devjab 9 months ago

> usable project template and directory structure

This caught my eye and I’m genuinely curious about what you mean by it. Part of our success with Claude is that we don’t do abstractions, “perfect architecture”, DRY, SOLID and other religions that were written by people who sell consulting in their principles. If we ask LLMs to do any form of “Clean Code” or give them input on how we want the structure, they tend to be bad it.

Hell, if you want to “build from the bottom” you’re going to have to do it over several prompts. I had Claude build a blood bowl game for me, for the fun of it. It took maybe 50 prompts. Each focusing on different aspects. Like, I wanted it to draw the field and add mouse clickable and movable objects with SDL2, and that was one prompt. Then you feed it your code in a new prompt and let it do the next step based on what you have. If the code it outputs is bad, you’ll need to abandon the prompt again.

It’s nothing like getting an actual developer to do things. They can think for themselves and the probability engine won’t do any of that even if it pretends to. Their history for building things from scratch also seems to be quickly “tarnished” within the prompt context. Once they’ve done the original tasks I find it hard to get them to continue on it.

[-]

tharant 9 months ago

> This caught my eye and I’m genuinely curious about what you mean by it. Part of our success with Claude is that we don’t do abstractions, “perfect architecture”, DRY, SOLID and other religions

Within my environment, some of those “religions” are more than a requirement; they’re also critical to the long-term maintenance of a large collection of active repositories.

I think one of the problems folks tend to have with following or implementing a “religion” (by which I mean specific structural and/or stylistic patterns within a codebase) comes down to a fear of being stuck forever with a given pattern that may not fit future needs. There’s nothing wrong with iterating on your religion’s patterns as long as you have good documentation with thorough change logs; granted, that can be difficult or even out of reach for smaller shops.

[-]

devjab 9 months ago

My personal problem with them is that after decades in enterprise software I’ve never seen them be beneficial to long-term maintenance. People like Uncle Bob (who haven’t actually worked in software engineering since 20 years before Python was invented) will respond to that sort of criticism with a “they misunderstood the principles”. Which is completely correct in many cases, but if so many people around the world misunderstand the principles then maybe the principles simply aren’t good?

I don’t think any of them are inherently bad, but they lead to software engineering where people over complicate things. Building abstractions they might never need. I’ve specialised in the field of taking startups into enterprise, and 90% of the work is removing the complexity which has made their software development teams incapable of delivering value in a timely manner. Some of this is because they build infrastructures as though they were Netflix or Google, but a lot of times it’s because they’ve followed Clean Code principles religiously. Abstractions aren’t always bad, but you should never abstract until you can’t avoid it. Because two years down into your development you’ll end up with code bases that are so complex that it makes them hard to work with.

Especially when you get the principles wrong. Which many people do. Over all though, we’ve had 20 years of Clean Code, SOLID, DRY and so on, and if you look at our industry today, there is no less of a mess in software engineering than there were before. In fact some systems still run on completely crazy Fortran or COBOL because nobody using “modern” software engineering have been capable of replacing them. At least that’s the story in Denmark, and it hasn’t been for a lock of trying.

I think the main reason many of these principles have become religions is because they’ve created an entire industry of pseudo-jobbers who manage them, work as consultants and what not. All people who are very good at marketing their bullshit, but also people who have almost no experience actually working with code.

Like I said, nothing about them are inherently bad. If you know when to use which parts, but almost nobody does. So to me the only relevant principle is YAGNI. If you’re going to end up with a mess of a code base anyway, you might as well keep it simple and easy to change. I say this as someone who works as an external examiner for CS students, where we still teach all these things that so often never work. In fact a lot of these principles were things I was thought when I took my degree, and many haven’t really undergone any meaningful changes with the lessons learned since their initial creation.

[-]

tharant 9 months ago

I appreciate your perspective and I don’t disagree with you entirely. I’ve worked in environments that struggle with putting religion before productivity and maintainability; the result is often painful. I’ve also worked in environments where religion, productivity, and maintainability are equals; it makes for a nice working environment. Perhaps there’s a bit more bureaucracy involved (forced documentation can be frustrating—especially when you realize your docs don’t match the spec or the even the code) but, in my experience, the outcome is more pleasant. Scaling religious requirements while maintaining productivity can be tricky though; religion can be deeply expensive (and therefore bad business) for smaller orgs, but it can also be easily politicized in larger orgs, which often results in engineer dissatisfaction. Religion will always be controversial. :)

malfist 9 months ago

I think it's level of expertise. You are an expert in coding (10,000 hours and all that) so you know when the code is wrong. Everything else you put into it and get plausible sounding response is just as incorrect as the plausible sounding responses to coding questions, just you know enough to spot the errors.

LLMs are insidious, it feeds into "everything is simple" concept a lot of us have of the world. We ask an LLM for a project plan and it looks so good we're willing to fire our TPM, or a TPM asks the LLM for code and it gives them code that looks so good they question the value of an engineer. In reality, the LLM cannot do either role's job well.

[-]

tharant 9 months ago

> You are an expert in coding (10,000 hours and all that) so you know when the code is wrong.

While I appreciate the suggestion that I might be an expert, I am decidedly not. That said, I’ve been writing what the companies I’ve worked for would consider “mission critical” code (mostly Java/Scala, Python, and SQL) for about twenty years, I’ve been a Unix/Linux sysadmin for over thirty years, and I’ve been in IT for almost forty years.

Perhaps the modernity and/or popularity of the languages are my problem? Are the models going to produce better code if I target “modern” languages like Go/Rust, and the various HTML/JS/FE frameworks instead of “legacy” languages like Java or SQL?

Or maybe my experience is too close to bare metal and need to focus on more trivial projects with higher-level or more modern languages? (fwiw, I don’t actually consider Go/Rust/JS/etc to be higher-level or more “modern” languages than the JVM languages with which I’m experienced; I’m open to arguments though)

> LLMs are insidious, it feeds into "everything is simple" concept a lot of us have of the world.

Yah, that’s what I mean when I say I feel gaslit.

> In reality, the LLM cannot do either role's job well.

I am aware of this. I’m not looking for an agent. That said, am I being too simplistic or unreasonable in expecting that I too could leverage these models (albeit perhaps after acquiring some missing piece of knowledge) as assistants capable of reasoning about my code or even the code they generate? If so, how are others able to get LLMs to generate what they claim are “deployable” non-trivial projects or refactorings of entire “critical” projects from the Python language to Go? Is someone lying or do I just need (seemingly dramatically) deeper knowledge of how to “correctly” prompt the models? Have I simply been victim of (again, seemingly dramatically) overly optimistic marketing hype?

[-]

9 months ago

[deleted]

vessenes 9 months ago

We have a similar amount of IT experience, although I haven't been a daily engineer for a long time. I use aider.chat extensively for fun projects, preferring the Claude backend right now, and it definitely works. This site is 90% aider, give or take, the rest my hand edits: https://beta.personacollective.ai -- and it involves solidity, react, typescript and go.

Claude does benefit from some architectural direction. I think it's better at extending than creating from whole-cloth. My workflow looks like:

1) Rough out some code, say a smart contract with the key features

2) Tell claude to finish it and write extensive testing.

3) Run abigen on the solidity to get a go library

4) Tell claude to stub out golang server event handlers for every event in the go library

5) Create a react typescript site myself with a basic page

6) Tell claude to create an admin endpoint on the react site that pulls relevant data from the smart contracts into the react site.

6.5) Tell claude to redesign the site in a preferred style.

7) Go through and inspect the code for bugs. There will be a bunch.

8) For bugs that are simple, prompt Claude to fix: "You forgot x,y,z in these files. fix it."

9) For bugs that are a misunderstanding of my intent, either code up the core loop directly that's needed, or negotiate and explain. Coding is generally faster. Then say "I've fixed the code to work how it should, update X, Y, Z interfaces / etc."

10) for really difficult bugs or places I'm stumped, tar the codebase up, go to the chat interface of claude and gpto1-preview, paste the codebase in (claude can take a longer paste, but preview is better at holistic bugfixing), and explain the problem. Wait a minute or two and read the comments. 95% of the time one of the two LLMS is correct.

This all pretty much works. For these definitions of works:

1) It needs handholding to maintain a codebase's style and naming.

2) It can be overeager: "While I was in that file, I ..."

3) If it's more familiar with an old version of a library you will be constantly fighting it to use a new API.

How I would describe my experience: a year ago; it was like working with a junior dev that didn't know much and would constantly get things wrong. It is currently like working with a B+ senior-ish dev. It will still get things wrong, but things mostly compile, it can follow along, and it can generate new things to spec if those requests are reasonable.

All that to say, my coding projects went from "code with pair coder / puppy occasionally inserting helpful things" to "most of my time is spent at the architect level of the project, occasionally up to CTO, occasionally down to dev."

Is it worth it? If I had a day job writing mission critical code, I think I'd be verrry cautious right now, but if that job involved a lot of repetition and boiler plate / API integration, I would use it in a HEARTBEAT. It's so good at that stuff. For someone like me who is like "please extend my capacity and speed me up" it's amazing. I'd say I'm roughly 5-8x more productive. I love it.

[-]

tharant 9 months ago

This is very good insight, the likes of which I’ve needed; thank you. Your workflow is moderately more complex and definitely less “agentic” than I’d expected/hoped but it’s absolutely not out of line with the kind of complexity I’m willing to tackle nor what I’d personally expect from pairing with or instructing a knowledgeable junior-to-mid level developer/engineer.

[-]

vessenes 9 months ago

Totally. It’s actually an interesting philosophical question: how much can we expect at different levels of precision in requirements, and when is code itself the most efficient way to be precise? I definitely feel my communication limits more with this workflow, and often feel like “well, that’s a fair, totally wrong, but fair interpretation.”

Claude has the added benefit that you can yell at it, and it won’t hold it against you. You know, speaking of pairing with a junior dev.

[-]

tharant 9 months ago

> Claude has the added benefit that you can yell at it, and it won’t hold it against you.

Yet.

I don’t look forward to the day these models are trained on all the context we’ve fed their predecessors; if AGI is possible, it’s gonna hate us. :)

namanyayg 9 months ago

Replace all this with Cursor, chat to Claude inside the project directory and talk to multiple files at once

It can also index docs pages of newer APIs and/or search the web to find latest info of newer libraries, so you won't struggle with issue #3

[-]

vessenes 9 months ago

Agreed cursor is good to very good, I’m just extremely tied to my old man vi workflow.

worthless-trash 9 months ago

You and me both man, Either I'm speaking a different language or I'm simply really bad at explaining what I need. I'd love to see someone actually do this on video.

[-]

tharant 9 months ago

Indeed. I’ve yet to run across an actual demonstration of an LLM that can produce useful, non-trivial code. I’m not suggesting (yet) that the capabilities don’t exist or that everyone is lying—the web is a big place after all and finding things can be difficult—but I am slowly losing faith in the capability of what the industry is selling. It seems right now one must be deeply knowledgeable of and specialized in the ML/AI/NLP space before being capable of doing anything remotely useful with LLM-based code generation.

[-]

grbsh 9 months ago

I think there is something deeper going on: “coding” is actually 2 activities: the act of implementing a solution, and the act of discovering the solution itself. Most programmers are used to doing both at once. But to code effectively with an LLM, you need to have already discovered the solution before you attempt to implement it!

I’ve found this to be the difference between writing 50+ prompts / back and for the to get something useful, and when I can get something useful in 1-3 prompts. If you look at Simon’s post, you’ll see that these are all self-contained tools, whose entire scope has been constrained from the outset of the project.

When you go into a large codebase and have to change some behavior, 1) you usually don’t have the detailed solution articulated in your mind before looking at the codebase. 2) That “solution” likely consists of a large number of small decisions / judgements. It’s fundamentally difficult to encode a large number of nuanced details in a concise prompt, making it not worth it to use LLMs.

On the other hand, I built this tool: https://github.com/gr-b/jsonltui that I now use every day almost entirely using Claude. “CLI tool to visualize JSONL with textual interface, localizing parsing errors” almost fully qualifies this. In contrast, my last 8 line PR at my company, while it would appear much simpler on the surface level, contains many more decisions, not just of my own, but reflecting team conversations and expectations that are not written down anywhere. To communicate this shared implicit context with Claude would be so much more difficult than to perform the change myself.

simonw 9 months ago

I think https://tools.simonwillison.net/openai-audio is useful and non-trivial.

[-]

tharant 9 months ago

You’re probably right but I’m far more interested in seeing things like how you prompted the model to produce your audio tool’s code. Did you have a design doc or did you collaborate with the model to come up with a design and its implementation ad-hoc? How much manual rewriting did you do. How much worked with little to no editing? How much did you prompt the model to fix any bugs it created? How successful was it? Did you specify a style guide up front or just use what it spat out and try to refactor later? How did that part go? You see where I’m going?

Oh, wow, it honestly just occurred to me that examples of how to prompt a model to produce a certain kind of content might be considered, more or less, some kind of trade secret vaguely akin to a secret recipe. That would be a bit depressing but I get it.

[-]

simonw 9 months ago

Here are the full Claude transcripts I used to build the OpenAI Audio app:

- https://gist.github.com/simonw/0a4b826d6d32e4640d67c6319c7ec... - most of the work

- https://gist.github.com/simonw/a04b844a5e8b01cecd28787ed375e... - some tweaks

Lots more details in my full post about it here: https://simonwillison.net/2024/Oct/18/openai-audio/

fragmede 9 months ago

details at: https://simonwillison.net/2024/Oct/18/openai-audio/

galaxyLogic 9 months ago

Sounds a bit like how Agile used to be. If it's not working, you're not doing it right.

joquarky 9 months ago

I find it to be very useful for functional programming since the limited scope aligns with the limited LLM context.

[-]

tharant 9 months ago

Assuming you mean the paradigm often known as FP (which makes use of concepts from the Lambda Calculus and Category Theory) and languages like Scala and Haskell that support Pure FP, well… my experience in trying to get LLMs to generate non-trivial FP (regardless the purity) has been entirely useless. I’d love to see an example of how you’re able to get useful code that is non-trivial—by which I mean code that includes useful business logic instead of what’s found in your typical “Getting Started” tutorial.

[-]

galaxyLogic 9 months ago

That's probabaly because AI has read all those "Getting Started" -tutorials.

fivestones 9 months ago

Here’s my experience. Like some of the other responses here to your comment, nothing I’ve made that’s more than a few lines of code has worked after one prompt, or even two or three. An example of something I’m working at the moment is here: https://github.com/fivestones/family-organizer. That codebase is about 99% LLM generated. I’d say it’s 60% from chatgpt 4o, 30% Claude Sonnet 3.5, and the rest mostly chatgpt o1-preview. Just the last commit has a bit of Claude Sonnet 3.5-new. I can send you my chat transcripts if it would be helpful but it would take some work since it’s scattered over lots of different conversations. At the beginning I was trying to describe the whole project to the LLM and then ask it to implement one feature. After maybe 5-20 prompts and iterations back and forth, I’d have something I was happy with for that feature and would move on to the next. However, I found, like some others here, the model would get bogged down in mistakes it had made previously, or would forget what I told it originally, or just wouldn’t work as well the longer my conversation went. So what I switched to, which seems to work really well, is to just paste in my entire current codebase (or at least all the relevant files) into a fresh chat, and then tell it about the one new feature I wanted. I try to focus on adding new features, or on fixing a specific problem. I’ll then sometimes (especially for a new feature) explain that this is my current code, here is the new thing I’m wanting it to do, and then tell it not to write any code for me but instead to ask me any questions it has. After this I’ll answer all its questions and tell it to ask me any follow up questions it has. “If you don’t have any more questions just say “I’m ready”. When it gets to the point of saying “I’m ready”, if working with chatgpt I would change the model from 4o to o1-preview, and then just say, “ok, go ahead”. After it spits out its response, it usually takes some iteration in the same chat: me copying and pasting code into vs code, running it, copy pasting any errors back to the LLM, or describing to it what I didn’t like about the results, and repeating. I might go through that process 5-10 times for something small, or 20-25 times for something bigger. Once I’ve gotten something working, I’ll abandon that chat and start over in a new one with my next problem or desired feature. I basically have done nothing at all with telling it anything about how I want it to structure the code. For the project above I wanted to use instantdb so I fed it some of the instantdb documentation and examples at the beginning. Later features just worked—it followed along successfully with what it saw in my codebase already. I am also using typescript/next.js and so those were pretty much the limits of what I’ve told it as constraints as to how to structure the code. I’m not a programmer, and I think if you look at the code you’ll probably see lots of stuff that looks bad to you if you are a programmer. But I don’t have plans to reply this code at scale—it’s just something I’m making for my family to use, and for whoever finds it on GitHub to use as well. So as long as it works and I’m happy with the result I’m not too concerned about the code. The most concern I have might be things like thinking about future features I want to add and whether or not the code I’m adding now will make future code hard to add or not. Usually I’ll just tell the LLM something like, “keep in mind when making this db schema that later we’ll need to do x or y”, and leaving it at that. The other thing is that I’ve never used react let alone next.js and have only dabbled in js here and there. But here I am, making something that works and that I’m happy with, thanks to the LLMs. I think that’s pretty amazing to me. Sometimes I struggle to get it to do what I want, and usually then I just scrap the latest code changes back to the last commit and then start over, often with a different LLM model. It sounds like your use case is a lot different than mine, as I’m just doing stuff in my spare time for fun and for me or my family to use. But maybe some of those ideas will help you. Let me know if you want some chat transcripts. One other thing, I found a vs code extension that lets me choose a file or set of files in the vs code explorer, right click, and export for LLM consumption. This is really helpful. It just makes a tree of whichever files I had selected (like the output from the terminal tree command) and follow that with the full text of each file, and copies all that to the clipboard. So to start a new chat, I just select files, right click, export for LLM, and then paste into the LLM new char window.

swat535 9 months ago

GPT-4 has no understanding of logic what-so-ever, let's stop pretending it does.

If it gives you a solution that is wrong, you have to point it at, then it will give you a second version , if that is also wrong, it will then slightly modify the same solutions over and over again instead of actually fixing the issue.

It gets stuck in a loop of giving you 2-3 versions of the same solution with the slightly different outputs.

It's only useful for boilerplate code and even then, you have to clean it up..

[-]

GaggiX 9 months ago

Then you should try Claude, I have never seen it get stuck in a loop, at some point it would just rewrite everything if it came to that.

ants_everywhere 9 months ago

GPT-4 is pretty bad at generating Python. It kind of works as well as combining 2-3 stack overflow questions, but it can't tell that the combination is sane.

I mostly agree with what the others are saying. It can generate boilerplate and it can generate simple API calls when there are lots of examples in the training set.

Generating Go is probably easier because at least you get compiler feedback.

Right now the only place it saves me time are with languages I don't know at all and with languages like Bash and SQL where I just can't bring myself to care enough to remember the long tail of more esoteric points that I don't use every day.

fhdsgbbcaA 9 months ago

That just means the bugs are so subtle you haven’t found them yet, they are there and unspooling the damage may be very painful.

[-]

HaZeust 9 months ago

That's rather assuming of you, they're there no less than they would be for a human's programming - and VERY likely no more.

[-]

kuhewa 9 months ago

But one is trying to write good-enough code. The other is trying to write good-enough-looking code. The probability of pain arising from the bugs of the latter is probably greater.

[-]

HaZeust 9 months ago

I'd actually love to see a benchmark on this - we're just speculating now.

[-]

kuhewa 9 months ago

The work demonstrating the Frankfurtian Bullshit nature of generated prose would suggest as much, given the architecture is the same for code outputs it seems like a fair assumption until it is demonstrated otherwise.

tharant 9 months ago

> they're there no less than they would be for a human's programming - and VERY likely no more.

This is VERY different from my own experience. The bugs introduced by the code I’ve tried to generate via LLMs (Mostly Claude, some GPT-4o and o1-preview, and lots of one-off fiddling with local models to see if they’re any better/worse than commercial products) are considerably more numerous (and often more subtle) than what my fellow engineers—juniors included—tend to introduce.

I /want/ these tools to be useful; they haven’t been so far though and I’m kinda stuck on understanding if I’m just not using ‘em right or if they’re even capable of what I want to do. Like I said in a previous comment; I don’t know if I’m being gaslit or if I’m being naive but it feels a lot more like gaslighting.

IshKebab 9 months ago

I have also tried to do this and it didn't work as smoothly as you claim.

I don't think either of you are wrong; it just heavily depends on the complexity of the app and how familiar LLMs are with it.

E.g. rewriting a web scraper, CRUD backend or a build script? Sure, maybe. Rewriting a bootloader, compiler or GUI app? No chance.

[-]

josephg 9 months ago

Its funny seeing the goalposts move in real time.

"Yes, AI can make human sounding sentences, but can it play chess?"

"Well yes, it can play chess. But no computer can beat a human grandmaster at chess."

"Well it beat Kasperov - but it has no hope of beating a human at Go."

"Its funny - it can beat humans at go but still can't speak as well as a toddler."

"Alright it can write simple problems, but it introduces bugs in anything nontrivial, and it can't fix those bugs!"

I write bugs in anything nontrivial too! My human advantages are currently that I'm better at handling a large context, and I can iterate better than the computer can.

But - seriously, do you think innovation will stop here? Did the improvements ever stop? It seems like a pretty trivial engineering problem to hook an AI up to a compiler / runtime so it can iterate just like we can. Anthropic is clearly already starting to try that.

I agree with you, today. I used claude to help translate some rust code into typescript. I needed to go through the output with a fine toothed comb to fix a lot of obvious bugs and clean up the output. But the improvement over what was possible with GPT3.5 is totally insane.

At the current rate of change, I give it 5-10 years before we can ask chatgpt to make a working compiler from scratch for a novel language.

[-]

simonw 9 months ago

You may appreciate this quote about constantly moving the goalposts for AI:

"There is superstition about creativity, and for that matter, about thinking in every sense, and it's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something - play good checkers, solve simple but relatively informal problems - there was a chorus of critics to say, but that's not thinking."

That's from 1979! https://simonwillison.net/2024/Sep/13/pamela-mccorduck-in-19...

[-]

zahlman 9 months ago

I side with Roger Penrose on this one. I'm still not convinced it's "thinking", and don't expect I ever will be, any more than a book titled "I am Thinking" would convince me that it's thinking.

[-]

budgi4 9 months ago

Separate thinking from conscious. I.e. We have built machines which are processing data similar to our thinking process. They are not conscious.

[-]

zahlman 9 months ago

My point is that I don't accept the concept of unconscious thought. "Processing data similar to our thinking process" doesn't make it "thinking" to me, even if it comes to identical conclusions - just like it wouldn't be "thinking" to just read off a pre-recorded answer.

The idea of ChatGPT being asked to "think" just reminds me of Pozzo from Waiting for Godot.

IshKebab 9 months ago

Well you can't have a conversation with a book... I don't understand your comment.

> I'm still not convinced birds can fly any more than a rock shaped like a bird would convince me that it's flying.

okwhatnow3773 9 months ago

I agree. Some people think Google is sentient I guess? Data retrieval and mangling is not all we do, luckily.

josephg 9 months ago

Why do you care if its thinking or not?

[-]

zahlman 9 months ago

I don't, in and of itself. I care that other people think that passing increasingly complicated tests of this sort is equivalent to greater proof of such "thought", and that the nay-sayers are "moving the goalposts" by proposing harder tests.

I don't propose harder tests myself, because it doesn't make sense within my philosophy about this. When those tests are passed, to me it doesn't prove that the AI proponents are right about their systems being intelligent; it proves that the test-setters were wrong about what intelligence entails.

[-]

josephg 9 months ago

> ... passing increasingly complicated tests of this sort is equivalent to greater proof of such "thought",

Nobody made any claim in this thread that modern AIs have thoughts.

What these (increasingly complicated) tests do is demonstrate the capacity to act intelligently. Ie, make choices which are aligned with some goal or reward function. Win at chess. Produce outputs indistinguishable from the training data. Whatever.

But you're right - I'm smuggling in a certain idea of what intelligence is. Something like: Intelligence is the capacity to select actions (outputs) which maximise an externally defined given reward function over time. (See also AIXI: https://en.wikipedia.org/wiki/AIXI ).

> When those tests are passed, [..] to me it proves that the test-setters were wrong about what intelligence entails.

It might be helpful for you to define your terms if you're going to make claims like that. What does intelligence mean to you then? My best guess from your comment is something like "intelligence is whatever makes humans special". Which sounds like a useless definition to me.

Why does it matter if an AI has thoughts? AI based systems, from MNIST solvers to deep blue to chatgpt have clearly gotten better at something. Whatever that something is, is very very interesting.

[-]

zahlman 9 months ago

>But you're right - I'm smuggling in a certain idea of what intelligence is.

Yes, you understand me. I simply come in with a different idea.

>AI based systems, from MNIST solvers to deep blue to chatgpt have clearly gotten better at something. Whatever that something is, is very very interesting.

Certainly the fact that the outputs look the way they do, is interesting. It strongly suggests that our models of how neurons work are not only accurate, but creating simulations according to those models has surprisingly useful applications (until something goes wrong. Of course, humans also have an error rate, but human errors still seem fundamentally different in kind.)

[-]

josephg 9 months ago

Modern neural networks have very little to do with their biological cousins. It makes a cute story, but it’s over claimed. Transformers and convolution kernels think in very different ways than the human mind.

[-]

zahlman 9 months ago

That gives me less reason to accept that it qualifies as "thinking".

[-]

josephg 9 months ago

Again, I don’t know of anyone, here or elsewhere who claims chatgpt thinks, in the way we understand it in humans. I think our intuitions largely agree.

[-]

zahlman 9 months ago

... Then why did I get so much pushback in this comment chain?

Eisenstein 9 months ago

Is there anything that a non-human could do that would cause you to accept that it was thinking?

[-]

zahlman 9 months ago

Of course. Animals demonstrate sapience, agency and will all the time.

[-]

Eisenstein 9 months ago

So, if a machine demonstrated sapience, agency, and will, then you would grant that it could think?

[-]

zahlman 9 months ago

Yes; but if you showed me a machine that you believed to be doing those things, given my current model, I wouldn't agree with you that it was.

[-]

Eisenstein 9 months ago

You are saying that even if it did the same thing that animals do that you attribute to thinking, you would refuse to acknowledge it could be thinking?

Is there something particularly unique about biological circuits that allow thought, as opposed to electronic ones?

[-]

zahlman 9 months ago

I believe so, yes. No, I can't explain what it is. (Because I think they're obvious follow-up questions: No, I don't consider myself particularly religious. Yes, I do believe in free will.)

[-]

josephg 9 months ago

… But you believe there’s something special about intelligence grounded in biology that can’t be true of intelligence grounded in silicon? That just sounds like magical thinking to me.

[-]

IshKebab 9 months ago

I agree. Thinking is clearly a compositional process and computers are Turing complete so it seems like and impossibility to me. Unless you reach for some quantum microtubule woo...

galaxyLogic 9 months ago

> At the current rate of change, ...

We've seen that the rate of change went up hugely when LLMs came around. But the rate of change was much lower before that. It could also be much slower for the foreseeable future.

LLMs are only as good as their training materials. But a lot of what programmers do is not documented anywhere, it happens in their head, and it is in response to what they see around them, not in what they scrape from the web or books.

Maybe what is needed is for organizations to start producing materials for AI to learn from, rather than assuming that all they need is what they find on the web? How much of the effort to "train" AI is just letting them consume the web, and how much is concsiously trying create new learning materials for AI?

[-]

josephg 9 months ago

It could slow down again. We don’t know. But the people working at OpenAI seem to believe the models will keep improving for the foreseeable future. The “we’ll run out of training data” argument seems overblown.

danparsonson 9 months ago

> Its funny seeing the goalposts move in real time.

Another way to look at it is that we're refining our understanding of the capabilities of machine learning in real time. Otherwise one could make basically the same argument about any field that progresses - take our theories of gravity for example. Was Einstein moving the goalposts? Or was he building on previous work to ask deeper questions?

Set against the backdrop of extraordinary claims about the abilities of LLMs, I don't think it's unreasonable to continue pushing for evidence.

IshKebab 9 months ago

Yeah I totally agree with you. Lots of goalpost moving, and it is absolutely insane what it can do today and it will only improve.

It just can't translate the kinds of programs I write between languages on its own. Today.

okwhatnow3773 9 months ago

Indeed, the constant goal shifting is tiresome.

I mean, we first put up a ladder and we could reach the peaches! Next, we put a ladder next to the apple tree and we could pluck those. Now, in their incessant goal post moving people said, great, now setup a ladder to the moon. There is no reason to assume this won’t work. None at all. People are just complaining and being angry at losing their fancy jobs.

More specific: it cannot learn, because it has no concept of learning from first principles. There is no way out, not even a theoretical one.

wrtasfg 9 months ago

Of course it can stop, once legislation catches up and forbids IP theft using a thinly disguised probabilistic and compressed database of other people's code.

[-]

edouard-harris 9 months ago

> a thinly disguised probabilistic and compressed database of other people's code

Speaking as a software engineer, I feel seen.

josephg 9 months ago

You really think those laws are coming? That the US and Chinese governments will force AI companies to put the genie back in the bottle?

I think you're going to be very disappointed.

VirusNewbie 9 months ago

But how do you know those were the only errors?

[-]

HaZeust 9 months ago

What's this question even mean? Because they're the only ones that came up in the debugger portion of the IDE, the output serves the intended purposes, the logging and error handling that I wanted to include were in the initial write-up prompt, and I could read the code it wrote because I partially knew the outputted language - and when I wasn't sure of a line, I asked it for clarification and a source from a reputable knowledgebase of the language, and GPT provided it?

[-]

majormajor 9 months ago

I would've expected an answer involving "an exhaustive suite of test cases still passed" - "it looks right" is a low bar for any complex software project these days.

It's the long, long, long tail of edge cases - not just porting them, but even identifying them to test - that slow or doom most real-world human rewrites, after all.

[-]

josephg 9 months ago

True - but you can ask the chatbot to write a test suite too.

[-]

what 9 months ago

This doesn’t really make sense? If I can’t trust the code it writes, why should I trust that it can write a comprehensive test suite?

[-]

simonw 9 months ago

Because you can read the test suite to check what it's testing, then break the implementation and run the tests and check they fail, then break a test and run them and check that fails too.

You have to review the code these thing write for you, just like code from any other collaborator.

josephg 9 months ago

Because the bugs in its code and the bugs in its test suites usually don't line up and cancel each other out.

VirusNewbie 9 months ago

>nd I could read the code it wrote because I partially knew the outputted language

oh ok. this is quite different than what I was picturing. So far this is my favorite use case of LLMs, they seem very good at this.

I mistakenly thought you were using it almost as a black box compiler. "look it ported it to Rust, I can't make sense of it, but it seems to work and no segfaults!".

What you say sounds pretty sensible, and it is a very nice practical example of the power of LLMs.

albedoa 9 months ago

That you don't know what the question means should have all of us reevaluating our confidence in every one of your claims in this thread.

[-]

HaZeust 9 months ago

Only sharing my experiences and observation in the upcoming trajectory of these tools; you're free to have your own.

I will tell you this, the second most-used language in my day-to-day (TypeScript) is one that I've seldom sat down and learned, rely on AI for me to create and streamline, and has not given me any issues for 16 months running (since the project has started).

AI won't replace jobs; but someone who knows how to use it better will.

[-]

tharant 9 months ago

> AI won't replace jobs; but someone who knows how to use it better will.

So where do I go to learn how to use ‘em better? Or, at least, examples of what works so I can understand what I’m doing wrong?

[-]

HaZeust 9 months ago

Learn how to think ontologically and break down your requests first by what you're TRULY looking for, and then understand what parts would need to be defined in order to build that system -- that "whole". Here's some guides:

1.) https://platform.openai.com/docs/guides/prompt-engineering 2.) https://www.promptingguide.ai/

[-]

tharant 9 months ago

I’ve received this /exact/ same unhelpful response multiple times in other threads (from different users even; am I talking to deterministic bots here) so I’ll do the same by offering the response I gave others:

“Since I’m dealing with models rather than other engineers should I expect the process of breaking down the problem to be dramatically different from that of writing design documents or API specs? I rarely have difficulty prompting (or creating useful system prompts for) models when chatting or doing RAG work with plain English docs but once I try to get coherent code from a model things fall apart pretty quickly.”

Said another way, I long ago learned as an engineer how to do the things you’re suggesting (they are skills I’ve used and evolved over more than twenty years as a professional software engineer) but, in my experience, those same skills do not seem to apply when trying to do non-trivial code-generation tasks for Java/Scala/Python projects with an LLM.

I’ve tried prompting ’em with my design documentation and API specs. I’ve tried prompting ’em with a pared-down version of my docs/specs in order to be more succinct. I’ve tried expanding my docs/specs to be more concrete and detailed. I’ve tried very short prompts. I’ve tried very detailed and lengthy prompts. I’ve tried tweaking system prompts. I’ve tried starting with prompts that limit the scope of the project then expanding from there. I’ve tried uploading the docs/specs so that the models can reference them later. I’ve tried giving ‘em access to entire repositories. I’ve tried so many things all to no avail. The best solution I’ve thus far found in these threads is to just try to fit the entirety of a project within the limits of the context window and/or to just keep my whole project in a few short files; that may be sufficient for small projects but it’s not possible nor even reasonable given the size and complexity of projects with which I work.

As I’ve said elsewhere, I dearly /want/ these things to work in my environment and for my use-cases as capably as they do in/for yours—this stuff is really interesting and I enjoy learning how to do new things—but after reading all the comments in this thread and others I don’t think the needs of my environment are supported by these models. Maybe they will be someday. I’ll keep playing with things but as of right now I see a significant impedance mismatch between the confidence others have in the models’ ability to do complex coding tasks compared to the kinds of tasks I’ve seen demonstrated here and elsewhere.

[-]

HaZeust 9 months ago

We've already talked; but let's squash what appears to be a feeling of a lack of answers for you.

Truth is, this has been a learning process for us all with these tools, but it needs to be understood -- especially going in -- that these models excel at translation tasks and constrained problem spaces but can struggle with generating cohesive, large-scale code without specific hand-holding.

This is generally what I do:

1. Start with the "Whole" Picture: Models often work best when they know the final goal and the prompter has worked backwards from them. Think ontologically: define the problem as if you’re describing it to a junior dev colleague who only understands outcomes, not methods. Instead of just prompting with specs, explain the end-state you want (even simple features like error handling or specific libraries). If you have ANY achieved method or inclusion for what the end-state should include, you write it out clearly.

2. Break Down the Process: Models handle complexity better if it's broken down into micro-tasks. Instead of expecting it to design an entire feature, ask for components step-by-step, integrating each output with the rest manually. There is a very decent chance that you have to do this across multiple new chats, after 3-5 iterations in, the AI will most likely crash and burn. At that point, you open a new chat, paste in the whole working codebase, and picked up from where you left off in the last chat. You have to do this A LOT.

3. Iterative Refinement: When the model generates code, go over it closely. Check for errors, then use targeted prompts to fix specific issues rather than requesting whole rewrites. Point out exact issues and ask for specific fixes; this prevents the model from “looping” through similar incorrect solutions.

Some Hacks I Use As Well:

1. Contextual Repetition: Reinforce key components (e.g., function structure, file organization) to avoid losing them in longer prompts.

2. Use “As if” Phrasing: Prompt the model to act “as if” it’s coding for a hypothetical person (e.g., a junior dev). It’s surprisingly effective at generating more thoughtful code with this type of frame.

3. Ask for Questions: Have the model ask you clarifying questions if it’s “unsure.” This can uncover key details you may not have thought to include.

4. Remind It What It Is Doing: Sounds counter-productive, but almost all of my code chats end with a description of what exactly I expect from the AI, iterated over the various stunts and "shortcuts" that it has taken over the years I've used it. I generally say "Write the code in full with no omissions or code comment-blocks or 'GO-HERE' substitutions" (this is directly because AI has generally pulled "/rest of code goes here/ on me several times), "write the code in multiple answers if you must, pausing at the generic character limit and resuming when I say 'continue' in the next message" (because I've had "errors" from code generation in the past because the chat reply processing time had timed out).

It's a labor of love and it's things you learn over time, and it won't happen if you don't put the work in.

-------------------------

I wrote all of this haphazardly in a Google Doc. GPT-4 organized it for me cleanly.

[-]

fragmede 9 months ago

> 3. Iterative Refinement

Beware of trying to get the LLM to output exactly the code you want. You get points for checking code in git and sending PRs, not tokens the LLM outputs. If it's being stupid and going in circles, or you know from experience that the particular LLM used will (they vary greatly in quality), you can just copy the code out (if you're not using some sort of AI IDE), fix it, then paste that in and/or commit it.

Some may ask, if you have to do that, then why use an LLM in the first place. It's good at taking small/medium conceptual tasks and breaking them down, and it's also a faster typer than me. Even though I have to polish its output, I find it easier to get things done because I can focus more on the higher level (customer) issues while the LLM gets started with lower level details on implementing/fixing things.

[-]

HaZeust 9 months ago

Exactly! Should have also worded that section similar to your comment here, but you hit the nail on the head.

tharant 8 months ago

Thank you! This information is the kind of information for which I’ve been searching.

That said, l feel like there’s a mutual-exclusivity problem between ‘Start with the "Whole" Picture’ and ‘Break Down the Process’.

For example, how does this from your first suggestion:

> explain the end-state you want (even simple features like error handling or specific libraries). If you have ANY achieved method or inclusion for what the end-state should include, you write it out clearly.

not contradict this from your second suggestion:

> Instead of expecting it to design an entire feature, ask for components step-by-step

Additionally, you said:

> There is a very decent chance that you have to do this across multiple new chats, after 3-5 iterations in, the AI will most likely crash and burn. At that point, you open a new chat, paste in the whole working codebase, and picked up from where you left off in the last chat. You have to do this A LOT.

But IME, by the time the model chokes on one chat, the codebase is already large enough that pasting the whole thing into another chat typically results in my hitting context-window limits. Perhaps, in the kinds of projects I typically work, a good RAG tool would offer better results?

To be clear, right now I’m only discussing my difficulties with the chatbots offered by the model providers—which, for me, is mostly Claude but also a bit of ChatGPT; my experience with Copilot is outdated so it probably deserves another look, and I’ve not yet tried some of the third-party, code-centric apps like aider or cursor that have previously been suggested, though I will soon.

As for your recommended hacks, these look to be helpful; thank you! The only part I find odd is your inclusion of “Write the code in full with no omissions or code comment-blocks or 'GO-HERE' substitutions”; I myself feel like I get far better results when I ask the model to 1) write full code for the methods that are likely to be the kinds of generic CS logic that a junior would know, 2) write stubs for the business logic, then 3) implementing the more complex business logic myself manually. IOW—and IME—they’re really good at writing boilerplate and generating or reasoning about junior-level CS logic. That’s indeed helpful to me, but it’s a far cry from the kinds of “ChatGPT can write entire apps with minimal effort” hype I keep seeing, and it’s only marginally better, IME at least, than what I’ve been able to do with the inline-completion and automatic boilerplate features that have been included in the IDEs I’ve used for over a decade.

> It's a labor of love and it's things you learn over time, and it won't happen if you don't put the work in.

Indeed. I do love playing with this stuff and learning more. Thank you again for sharing your knowledge!

> I wrote all of this haphazardly in a Google Doc. GPT-4 organized it for me cleanly.

I am regularly impressed at how well these models behave when asked to summarize a document or even when asked to expand a set of my notes into something more coherent; it’s truly remarkable!

sergiotapia 9 months ago

you're never going to convince people that are in an ideological battle against AI.

[-]

bigstrat2003 9 months ago

And you're never going to convince anyone if you assume without evidence that they are ideologically opposed to AI. Lots of people have tried these tools with an open mind and found them to not be useful, you need to address those criticisms rather than using a dismissive insult.

[-]

HaZeust 9 months ago

What evidence would you like?

You're posting on a thread that hyperlinks to a list of code and Claude Artifacts for pet-projects that can make thousands a month with some low-effort PPC and an AdWords embed, and some mid-size projects that can be anything from grounds to a promotion at a programming role - to the MVP for a PMF-stage startup.

What, specifically, would pivot your pre-conceived notions?

[-]

achierius 9 months ago

Are you serious about "thousands a month"? I don't mean to be hostile, I'm just truly surprised -- if the bar were that low (not that these apps aren't impressive, but most engineers write useful apps from time to time) I would expect the market to be rather packed

[-]

HaZeust 9 months ago

Nah, most are hundreds a month - a few golden geese can break the thousand barrier, though. But, regardless, have a few of those sites up, and you're making good side income.

tharant 9 months ago

> What, specifically, would pivot your pre-conceived notions? A live or unedited demonstration of how a non-trivial (doesn’t have to be complex, but should be significantly more interesting than the “getting started” tutorials that litter the web) pet-project was implemented using these models.

[-]

simonw 9 months ago

The point of my post here was to provide 14 of those. Some of them are trivial but I'd argue that a couple of them - the OpenAI Audio one and the LLM pricing calculator - go a bit beyond "getting started".

If you want details of more complex projects I've written using Claude here are a few - in each case I provide the full chat transcript:

- https://simonwillison.net/2024/Aug/8/django-http-debug/

- https://simonwillison.net/2024/Aug/27/gemini-chat-app/

- https://simonwillison.net/2024/Aug/26/gemini-bounding-box-vi...

- https://simonwillison.net/2024/Aug/16/datasette-checkbox/

- https://simonwillison.net/2024/Oct/6/svg-to-jpg-png/

[-]

tharant 9 months ago

Thank you! I have an ugly JS/content filter running that mogrifies some websites such that I miss the formatting completely; I didn’t recognize you had chat session content included on the page.

That said, after looking at a couple of your sessions, I don’t see anything you’re doing that I’m not—at least in terms of prompting. Your prompts are a bit more terse than mine (I can be long-winded so I’ll give brevity a try with my next project) but the structure and design descriptions are still there. That would suggest the differences in our experience boils down to the languages with which we choose or are required to work; maybe there’s a stylistic or cultural difference in how one should prompt a model in order to generate a Python project and how one should prompt for a Haskel or Scala/Java project; surely not though, right?

I’m not giving up and I’ll keep playing with these models but for now, given my use-case at least, they still seem to be far more capable at rubber-ducking with me than they are as a pair programming partner.

inexcf 9 months ago

Did you even look at the artifacts? Its a bunch of things a beginner would do on their first day programming. How do you make thousands a month from 1 library call to solve a qr code. A promotion for building an input field and calling a json to yaml converter library?

[-]

HaZeust 9 months ago

Millions of laypersons a month search "convert (file type) to (file type) online" and just smack an AdWords embed on their site for it. Millions of people want a QR code's embedded link in their camera roll, without access to a camera that's pointing at it.

You'd be surprised how big the "(simple task) online" search query market is, and how much they are usually multi-visit monthly customers, and how much their ad space is worth.

I cannot stress this enough, just because it's simple does not mean it's not lucrative.

[-]

inexcf 9 months ago

You should do it then.

Besides all of this is completely besides the point. This isnt useful for a programmer. These examples are barely useful for a layperson. And said layperson is paying money and time for this.

[-]

HaZeust 9 months ago

I have, that's how I'm telling you the way you can, too.

[-]

inexcf 9 months ago

Any way or intention to prove that? Wheres you "convert (filetype) project"?

Not to attack you but from your profile it sounds more like your the typical marketing grifter talking big. Why is none of those projects in the list you mention there?

Looking deeper you got lots of projects with parts of your websites just broken and seem to be peddling what looks like life insurance scams.

[-]

HaZeust 9 months ago

Some of my projects are public, most are private. The ones that will typically do me better in people networking and/or will bolster my portfolio, are the ones I share publicly. For most of my projects, private is the default. With a profile like yours, I'm sure you can understand.

Sure, there's probably more projects of mine, over the years, that are more broken than not. I've cast several wide nets for product creations and iterations over the years, and kept maintaining the more "fittest" of the bunch. Billit's probably the only one that's broken AND I have no control over it; I sold it. I don't know what else to tell you here, perhaps you value a lesser repertoire with higher rigidity?

I'm not sure how to address your pre-conceived notions that a single industry I've worked in, at large, is a scam. Also, the one company mentioned in life insurance doesn't have a backlink on Lead EnGen - so I especially don't know what you're talking about when you say "peddling".

farts_mckensy 9 months ago

The goal posts keep shifting. It's so obvious to anyone who's paid attention to this space for a few years.

[-]

inexcf 9 months ago

Except my goalposts never shifted. And my point stands, these are extremely trivial examples.

tharant 9 months ago

Goalposts shift; growth is critical to being (staying?) an intelligent species.

tharant 9 months ago

> You'd be surprised how big the "(simple task) online" search query market is, and how much they are usually multi-visit monthly customers, and how much their ad space is worth.

Not surprised at all; my inability to find examples of /how/ someone might get an LLM to produce—or even intelligently collaborate on—something useful, well… it says a lot about how much junk is out there contributing to the noise.

newswasboring 9 months ago

> Its a bunch of things a beginner would do on their first day programming.

Is this an exaggeration? Because this is absolutely not true. I'm a beginner in JavaScript and other web stuff and I absolutely can't build it in many days.

[-]

inexcf 9 months ago

You better check the code, mate. The meat of what most of it does is a one liner calling jsQR or some other imported lib to do the real work. I am not exaggerating in the slightest.

[-]

newswasboring 9 months ago

Dude. I don't judge my knowledge after the answer is given to me. If I was the junior programmer assigned to the author and they were having this chat with me I am telling you as a beginner I wouldn't be able to do it.

Of course if you show me the answers I will think I can do it easy, because answers in programming are always easy (good answers anyways). It's the process of finding the answer that is hard. And I'm not a bad programmer either, I'm at least mediocre, I'm just unfamiliar with web technology.

[-]

inexcf 9 months ago

I am of the firm believe that you can put "JavaScript scan qr code" in a search engine and arrive at your goal. The answers range from libraries to code snippets basically the same as those created by Claude. Using the same libraries. I feel like googling every step would be faster than trying to get it right with LLMs, but that is a different point.

I've seen a complete no-code person install whisper x with a virtual Python environment and use it for realtime speech to text in their Japanese lessons, in less than 3 hours. You can do a simple library call in JavaScript.

[-]

simonw 9 months ago

"I feel like googling every step would be faster than trying to get it right with LLMs"

Why don't you give that a go? See if you can knock out a QR code reading UI in JavaScript in less than 3 minutes, complete with drag-and-drop file opening support.

(I literally built this one in a separate browser tab while I was actively taking notes in a meeting)

I say three minutes because my first message in https://gist.github.com/simonw/c2b0c42cd1541d6ed6bfe5c17d638... was at 2:45pm and the final reply from Claude was at 2:47pm.

[-]

tharant 9 months ago

That gist is pretty close to what I’ve been looking for; thank you! Examples of a chat session that resulted in a usable project are /very/ helpful. Unfortunately, the gist demonstrates, to me at least, that the models don’t know enough about the languages I wish to use.

Those prompts might be sufficient enough to result in deployable HTML/JS code comprised of a couple hundred lines of code but that’s fairly trivial in my definition. I’m not trying to be rude or disrespectful to you; within my environment, non-trivial projects typically involve an entire microservice doing even mildly interesting business logic and offering some kind of API or integration with another, similarly non-trivial API—usually both. And they’re typically built on languages that are compiled either to libraries/executables or they’re compiled to bytecode for the JVM/CLR.

Again, I’m not trying to be disrespectful. You’ve built some really great stuff and I appreciate you sharing your experiences; I wish I knew some of the things you do—you keep writing about your experiences and I’ll keep reading ‘em, we can learn together. The problem is that I’m beginning to recognize that these models are perhaps not nearly ready for the kinds of work I want or need to do, and I’m feeling a bit bummed that the capabilities the industry currently touts are significantly more overhyped than I’d imagined.

[-]

simonw 9 months ago

Here's a larger example where I had Claude build me a full Django application: https://simonwillison.net/2024/Aug/8/django-http-debug/

I have a bunch more larger projects on my blog: https://simonwillison.net/tags/ai-assisted-programming/

I do a whole lot of API integration work with Claude, generally by pasting in curl examples to illustrate the API. Here's an example from this morning: https://til.simonwillison.net/llms/prompt-gemini

what 9 months ago

Should probably add some time for finding the correct url for the jsqr library, since the LLM didn’t do that for you.

[-]

9 months ago

[deleted]

simonw 9 months ago

Yeah, add another minute for that. It was pretty easy to spot - I got a 404, so I searched jsdelivr for jsqr and dropped that in instead.

newswasboring 9 months ago

> You can do a simple library call in JavaScript.

But it's more than that, isn't it? It has a whole interface, drag and drop functionality etc. Front end code is real code mate.

[-]

inexcf 9 months ago

Barely. These are all standard features. I've done this. You can see in the code how easy it is. These examples aren't complex.

[-]

newswasboring 9 months ago

I don't know why you are so insistant on this while not being a beginner. Specially when a real beginner is telling you their personal experience.

https://xkcd.com/2501/

[-]

inexcf 9 months ago

I don't use javascript at all. I'm essentially beginner level with it. And i've seen people build more complex projects in classes myself.

The project i see people build in Java classes on the other hand is a CLI version of Battleships. And honestly that is more complex than the presented projects solved by Claude.

Your personal experience is one point of many. That these projects seem hard to you doesn't make it so for the average person. When i say "a beginner can do it", there's bound to be some who can't. I'm sorry, if these projects take you weeks that is a problem.

[-]

newswasboring 9 months ago

It just feels like you have taken a stance that this is useless and anything anyone says or does is not going to dissuade you from it. There are several people who are pointing out up and down this thread several different projects to you built in short times, but you keep saying nothing is impressive to you. To be very honest, this behavior is irritating.

farts_mckensy 9 months ago

[flagged]

simonw 9 months ago

I'd like to see a beginner build this: https://tools.simonwillison.net/openai-audio

[-]

inexcf 9 months ago

It's definitely not as trivial as the json converter. But not anywhere even close to complex. Recording audio is very simple, calling a remote API is too. The complex part is encoding the WAV blob. But that is just knowledge about the format with the exact code snippet that claude uses found in the first stack overflow answer.

And it is strange that Claude picked the AudioRecorder when the MediaRecorder exists. I'd wager a beginner would have used the latter(i don't use javascript and am not better than a beginner in any way, but i found that) since it outputs a straight wav file and doesn't need the encoding step. And since the data isn't streamed to OpenAI there's no need for the audio chunks that AudioRecorder provides. So Claude did it in an unnecessarily complex way, that doesn't make the problem complex.

epolanski 9 months ago

Issue is, it takes time to learn how to interact with these tools and get the best out of them. And they get better quite fast.

[-]

unit149 9 months ago

claude-to-sql parser is particularly useful in LLM implementation

sergiotapia 9 months ago

you are replying to a submission with a dozen or more examples of real tangible stuff, and you still argue? pointless.

farts_mckensy 9 months ago

No need to address the criticisms. Just have chat gpt do it.

fhdsgbbcaA 9 months ago

There’s no ideological battle here. The first self-driving DARPA grand challenge was passed in 2005, everybody thought we’d have self driving on the road within a decade.

20 years later that’s still not the case, because it turns out NN/ML can do some very impressive things at the 99% correct level. The other 1% ranges in severity from “weird lane change” to “a person riding a bicycle gets killed”.

GPT-3.5 was the DARPA grand challenge moment, we’re still years away from LLM being reliable - and they may never be fully trustworthy.

[-]

abecedarius 9 months ago

> everybody thought we’d have self driving on the road within a decade.

This is just not true. My reaction to the second challenge race (not the first) in 2005 was, it was a 0-to-1 kind of moment and robocars were now coming, but the timescale was not at all clear. Yes you could find hype and blithe overoptimism, and it's convenient to round that off to "everybody" when that's the picture you want to paint.

> 20 years later that’s still not the case

Also false. Waymo in public operation and expanding.

[-]

fhdsgbbcaA 9 months ago

Waymo has limited service in one of the smallest “big” cities by geographic area in the United States. You can’t even get a Waymo in Mountain View.

Fact is Google will never break even on the investment and it’s more or less a white elephant. I don’t think it’s even accurate to call it a Beta product, at best it’s Alpha.

[-]

simonw 9 months ago

Have you been in one? It's pretty extraordinary as an actual passenger.

[-]

fhdsgbbcaA 9 months ago

I’d give it a go if price competitive with Uber/Lyft - I can’t think of a way a robotaxi would be worth a premium though.

abecedarius 9 months ago

> Fact is

... followed by speculation about the future.

> [not everywhere]

The standard you proposed was "on the road". In their service areas (more than "one", they've been in Phoenix for some time) anyone can install their app and get a ride.

I shouldn't have poked my nose in here, I was just kind of croggled to see someone answer "ideological battle" by bringing up another argument where they don't seem to care about facts.

achierius 9 months ago

That might have been your reaction but it wasn't the reaction of many hype-inclined analyst types. Tesla is particular has been promising "full self driving next year" for like a decade now.

And despite everything, Waymo is not quite there yet. It's able to handle certain areas at a limited scale. Amazing, yes, but it has not changed the reality of driving for 99.9% of the population. Soon it will, I'm sure, but not yet.

josephg 9 months ago

> they may never be fully trustworthy.

So? Neither are humans. Neither is google search. Chatgpt doesn't write bug free code, but neither do I.

The question isn't "when will it be perfect". The question is "when will it be useful?". Or, "When is it useful enough that you're not employable?"

I don't think its so far away. Everyone I know with a spark in their eye has found weird and wonderful ways to make use of chatgpt & claude. I've used it to do system design, help with cooking, practice improv, write project proposals, teach me history, translate code, ... all sorts of things.

Yeah, the quality is lower than that of an expert human. But I don't need a 5 star chef to tell me how long to put potatoes in the oven, make suggestions for characters to play, or listen to me talk about encryption systems and make suggestions.

Its wildly useful today. Seriously, anyone who says otherwise hasn't tried it or doesn't understand how to make proper use of it. Between my GF and I, we average about 1-2 conversations with chatgpt per day. That number will only go up.

[-]

fhdsgbbcaA 9 months ago

I find it very interesting the primary rebuttals to people criticizing LLM from the “converted” tends to result in implicit suggestions the critique is rooted in old fashioned thinking.

That’s not remotely true. I am an expert, and it’s incredibly clear to me how bad LLM are. I still use them heavily, but I don’t trust any output that doesn’t conform to my prior expert knowledge and they are constantly wrong.

I think what is likely happening is many people aren’t an expert in anything, but the LLM makes them feel like they are and they don’t want that feeling to go away and get irrationally defensive at cogent criticism of the technology.

And that’s all it is, a new technology with a lot of hype and a lot of promise, but it’s not proven, it’s not reliable, and I do think it is messing with people’s heads in a way that worries me greatly.

[-]

josephg 9 months ago

I don't think you understand the value proposition of chatgpt today.

For context, I'm an expert too. And I had the same experience as you. When I asked it questions about my area of expertise, it gave me a lot of vague, mutually contradictory, nonsensical and wrong answers.

The way I see it, ChatGPT is currently a B+ student at basically everything. It has broad knowledge of everything, but its missing deep knowledge.

There are two aspects to that to think about: First, its only a B+ student. Its not an expert. It doesn't know as much about family law as a family lawyer. It doesn't know as much about cardiology as a cardiologist. It doesn't know as much about the rust borrow checker as I do.

So LLMs can't (yet) replace senior engineers, specialist doctors, lawyers or 5 star chefs. When I get sick, I go to the doctor.

But its also a B+ student at everything. It doesn't have depth, but it has more breadth of knowledge than any human who has ever lived. It knows more about cooking than I do. I asked it how to make crepes and the recipe it gave me was fantastic. It knows more about australian tax law than I do. It knows more about the american civil war than I do. It knows better than I do what kind of motor oil to buy for my car. Or the norms and taboos in posh british society.

For this kind of thing, I don't need an expert. And lots of questions I have in life - maybe most questions - are like that!

I brainstormed some software design with chatgpt voice mode the other day. I didn't need it to be an expert. I needed it to understand what I was saying and offer alternatives and make suggestions. It did great at that. The expert (me) was already in the room. But I don't have encyclopedic knowledge of every single popular library in cargo. ChatGPT can provide that. After talking for awhile, I asked it to write example code using some popular rust crates to solve the problem we'd been talking about. I didn't use any of its code directly, but that saved me a massive amount of time getting started with my project.

You're right in a way. If you're thinking of chatgpt as an all knowing expert, it certainly won't deliver that (at least not today). But the mistake is thinking its useless as a result of its lack of expertise. There's thousands and thousands of tasks where "broad knowledge, available in your pocket" is valuable already.

If you can't think of ways to take advantage of what it already delivers, well, pity for you.

[-]

fhdsgbbcaA 9 months ago

I literally said I do use it, often.

But just now had a fairly frequent failure mode: I asked it a question and it gave me a super detailed and complicated solution that a) didn’t work, and b) required serious refactoring and rewriting.

Went to Google, found a stack overflow answer and turns out I needed to change a single line of code, which was my suspicion all along.

Claude was the same, confidentially telling me to rewrite a huge chunk of code when a single line was all that was needed.

In general Claude wants you to write a ton of unnecessary code, ChatGPT isn’t as bad, but neither writes great code.

The moral of the story is I knew the gpt/claude solutions didn’t smell right which is why I tried Google. If I didn’t have a nose for bad code smells I’d have done a lot of utterly stupid things, screwed up my code base, and still not have solved my oroblwm.

At the end of the day I do use LLM, but I’m experienced so it’s a lot safer than a non-experienced person. That’s the underlying problem.

[-]

josephg 9 months ago

Sure. I'm not disagreeing about any of that.

My point is that even now, you're only talking about using chatgpt / claude to help you do the thing you already know how to do (programming). You're right of course. Its not currently as good at programming as you are.

But so what? The benefit these chat bots provide is that they can lend expertise for "easy", common things that we happen to be untrained at. And inevitably, thats most things!

Like, ChatGPT is a better chef than I am. And a better diplomat. A better science fiction writer. A better vet. And so on. Its better at almost every field you could name.

Instead of taking advantage of the fields where it knows more than you, you're criticising it for being worse than you at your one special area (programming). No duh. Thats not how it provides the most value.

[-]

fhdsgbbcaA 9 months ago

Sorry my point isn’t clear: the risk is you are being confidently led astray in ways you may not understand.

It’s like false memories of events that never occurred, but false knowledge - you think you have learned something, but a non-trivial percent of it, that you have no way of knowing, is flat out wrong.

It’s not a “helpful B+ student” for most people , it’s a teacher, and people are learning from it. But they are learning subtly wrong things, all day, every day.

Over time, the mind becomes polluted with plausible fictions across all types of subjects.

The internet is best when it spreads knowledge, but I think something else is happening here, and I think it’s quite dangerous.

[-]

josephg 9 months ago

Ah thankyou for clarifying. Yes, I agree with this. Maybe, its like a B+ student confidently teaching the world what it knows.

The news has an equivalent: The Gell-Mann amnesia effect, where people read a newspaper article on a topic they're an expert on and realise the journalists are idiots. Then suddenly forget they're idiots when they read the next article outside their expertise!

So yes, I agree that its important to bear in mind that chatgpt will sometimes be confidently wrong.

But I counter with: usually, remarkably, it doesn't matter. The crepe recipe it gave produced delicious crepes. If it was a bad recipe I would have figured that out with my mouth pretty quickly. I asked it to brainstorm weird quirks for D&D characters to have, some of the ideas it came up with were fabulous. For a question like that, there isn't really such a thing as right and wrong anyway. I was writing rust code, and it clearly doesn't really understand borrowing. Some code it gives just doesn't compile.

I'll let you in on a secret: I couldn't remember the name of the gell-mann amnesia effect when I went to write this comment. A few minutes ago I asked chatgpt what it was called. But I googled it after chatgpt told me what it was called to make sure it got it right so I wouldn't look like an idiot.

I claim most questions I have in life are like that.

But there are certainly times when (1) its difficult to know if an answer is correct or not and (2) believing an incorrect answer has large, negative consequences. For example, Computer security. Building rocket ships. Research papers. Civil engineering. Law. Medicine. I really hope people aren't taking chatgpt's answers in those fields too seriously.

But for almost everything else, it simply doesn't matter that chatgpt is occasionally confidently wrong.

For example, if I ask it to write an email for me, I can proofread the email before sending it. The other day asked it for scene suggestions in improv, and the suggestions were cheesy and bad. So I asked it again for better ones (less chessy this time). I ask for CSS and the CSS doesn't quite work? I complain at it and it tries again. And so on. This is what chatgpt is good for today. It is insanely useful.

tharant 9 months ago

The problem, at least for me, is that I feel like the product offerings suggested to us in other comments (not Claude/ChatGPT, but the third party tools that are supposed to make the models better at code generation) either explicitly or implicitly market themselves as being vastly more capable than they are. Then, when I complain, it’s suggested that the models can’t be blamed (because they’re not experts) and that I’m using the tools incorrectly or have set my expectations too high.

It’s never the product or its marketing that’s at fault; only my own.

In my experience, the value proposition for ChatGPT lies in its ability to generate human language at a B+ level for the purposes of a an interactive conversation; its ability to generate non-trivial code has proven to be terribly disappointing.

versteegen 9 months ago

Humans have a massive pro-human bias. Don't ask one whether AI can replace humans and expect a fair answer.

[-]

n0id34 9 months ago

Well, obviously. The only ones happy about all of our potential replacements would be those that have the power to do the replacing and save themselves a shitload of money. It's hardly like everyone is going to rejoice at the rapid advancement of AI that can potentially make most of us jobless....unless, as I said, you're the one in charge, then it's wonderful.

Workaccount2 9 months ago

"It is difficult to get a man to understand something when his salary depends upon his not understanding it." - Upton Sinclair.

RayVR 9 months ago

Sorry, but you’re just wrong.

Yes, mistakes may happen. However, I’ve used it to translate a fairly complex MIP definition export into a complete CP-SAT implementation.

I use these models all the time for complex tasks.

One major thing that is perhaps not immediately obvious is that the models are only good at translation. If I give it a really good explanation of what I want in code or even English, and ask it to do it another way or implement it with specific tools, I get pretty good output.

Using these to actually solve problems is not possible. Give it a complex problem description with no instructions on how to solve it, and they fail immediately.

[-]

tharant 9 months ago

> If I give it a really good explanation of what I want in code or even English, and ask it to do it another way or implement it with specific tools, I get pretty good output.

I don’t get good output, not when trying to get code that matches a detailed spec (which includes the languages I wish to use, the structure of the APIs, and the libraries I think might be useful) so your suggestion that we’re “just wrong” and your claim that the tools can be used for coding “complex tasks” is difficult for me to swallow.

I’ll admit that perhaps I’m not using ‘em right but that’s why I’m here—to get advice on /how/ to use ‘em correctly; to date, the implications found in the advice I have received is to: - limit my scope to strict HTML/JS (not web/ui frameworks), or - limit the size of the project to a handful (less than ten) of very short files, or - limit my scope to code translation only, or - limit the size of my chat sessions.

Unfortunately, those limitations don’t fit the needs of my environment.

risyachka 9 months ago

They fail even at not really complex problems. In most cases it’s faster to do it manually then beg ai to fix everything so that the result is proper, not just “kinda works”.

For me they save a lot of time on research or general guidance. But when it comes to actual code - not really useful.

ainiriand 9 months ago

I can basically tell ChatGPT to build any Rust commandline tool I can think of and with some back and forth it produces what I need. I did this many times already.

[-]

okwhatnow3773 9 months ago

You can also ask Google to produce working code for you, it’s a miracle.

What you are looking at is mangled other people’s work. Great. Thanks AI, for digging it up, but let’s not get too excited here.

I’ll be getting excited when we give it some first principles and it can actually learn on its own.

[-]

ainiriand 9 months ago

Isn't that AGI?

I completely disagree with this viewpoint. I've created terminal games with my own rules, and that shows me the tool can take what it knows about Rust and assemble code to complete a task. It's essentially doing the same thing a human would.

While I understand the criticism, I sometimes feel that the cynical perspective we bring into these discussions prevents us from offering more meaningful critique.

tharant 9 months ago

I can too (probably; I don’t know enough about Rust) but the command line tools that I’ve tried to build in Python are severely limited in the scope of their capabilities. IOW, trying to generate or work with a non-trivial project has been, for me thus far, impossible.

If your Rust tools are truly non-trivial, I’d love to know /how/ you prompted ChatGPT to accomplish what you want—ideally with chat session transcripts that include the generated artifacts.

I recognize that you may not want or be able to share such things; if that’s the case, can you share or discuss the resources you used in order to learn how to do it? I’ve not yet been able to find any resources that demonstrate what I’d consider to be non-trivial and I’m hoping that’s only a failure of my Google-fu rather than an indictment of the purported capabilities of these models.

[-]

ainiriand 9 months ago

Ill be more than happy to do it. Would you mind sharing your email?

[-]

tharant 9 months ago

Ohh, that’d be awesome! Thank you! I’m krogfot at proton.me

mhh__ 9 months ago

Well this is what tests are for. You could make the same argument about outsourcing or "kids these days" and so on

hackernewds 9 months ago

Can't tell if serious. I've done this multiple times with success requiring only 5 minutes of review

9 months ago

[deleted]

SubiculumCode 9 months ago

I am not a professional coder, being in research I do not need to think about scaling my code as most of it is one and done on whatever problem I am working on at the moment. For me, this is a lot about stringing a bunch of neuroimaging tools together to transform data in ways I want, LLMs have been fantastic. Instead of spending 20 minutes coding it, its often 0-shot visit to Claude...especially when its a relatively simple python task e.g. iterate through directories of images, inspect these json, move those files over here, build this job, submit. Its not ground breaking code, but the LLM builds it faster than I would, and it does what I need it to do. Its been a 20x or more multiplier for me when it comes to one aspect of my work.

[-]

mrbungie 9 months ago

LLMs are excellent for scripting: be it python, shell or SQL, and you need a lotta scripting at any kind of job related to data, even when said scripts are just an enabler for delivering the pursued value. Total game changers in that space.

randito 9 months ago

To state the obvious (again), it's shocking the rate of progress is with these tools. If this is 2 years of progress, what does 10-20 look like?

[-]

jryan49 9 months ago

Who knows, past progress doesn't predict future progress...

lionkor 9 months ago

It can autocomplete, it can't write good code. For me, that goal post has not moved. It it cant write good code consistently, I don't care for it all that much. It remains a cool autocomplete

[-]

epolanski 9 months ago

Nobody really cares about code being good or bad, it's not prose.

What matters is it meets functional and non functional requirements.

One of my juniors wrote his first app two years ago fully with chatgpt, could figure out by iteratively asking it how to improve it and solve the bugs.

Then he learned to code properly fascinated by the experience. But the fact remains, he shipped an application that did something for someone while many never did even though they had a degree and a black belt in pointless leet code quizzes.

I'm fully convinced that very soon big tech or a startup will come up with a programming language meant to sit at the intersection between humans and LLMs, and it will be quickly better, faster and cheaper at 90% of the mundane programming tasks than your 200k/year dev writing forms, tables and apis in SF.

[-]

tharant 9 months ago

> Nobody really cares about code being good or bad, it's not prose.

Yes, we do. Good code (which, by my definition, includes style/formatting choices as well as the code’s functionality, completeness, correctness, and, finally, optimized or performant algorithms/logic) is critical for the long-term maintenance of large projects—especially when a given project needs integration with/to other projects.

lionkor 9 months ago

I mean, I care that code is good. I'm paid to make sure my code and other people's code is good. That's enough for me to have a requirement to my tools to help me produce good code.

packetlost 9 months ago

> What matters is it meets functional and non functional requirements.

Good luck expressing novel requirements in complex operating environments in plain English.

> Then he learned to code properly fascinated by the experience. But the fact remains, he shipped an application that did something for someone while many never did even though they had a degree and a black belt in pointless leet code quizzes.

It's good in the sense that it raises the floor, but it doesn't really make a meaningful impact on the things that are actually challenging in software engineering.

This is cool!

> I'm fully convinced that very soon big tech or a startup will come up with a programming language meant to sit at the intersection between humans and LLMs, and it will be quickly better, faster and cheaper at 90% of the mundane programming tasks than your 200k/year dev writing forms, tables and apis in SF.

I am sure there will be attempts, but if you know anything about how these systems work you would know why there's 0% chance it will work out: programming languages are necessarily not fuzzy, they express precise logic and GPTs necessarily require tons of data points to train on to produce useful output. There's a reason they do noticeably better on Python vs less common languages like, I dunno, Clojure.

[-]

epolanski 9 months ago

> Good luck expressing novel requirements in complex operating environments in plain English.

That's the hard engineering part that gets skipped and resisted in favour of iterative trial and error approaches.

[-]

packetlost 9 months ago

It still applies to expressing specific intent iteratively.

lelandfe 9 months ago

My friend who can't code is now the resident "programmer" on his team. He just uses ChatGPT behind the scenes. That writ large is going to make us tech people all care, one way or another :/

[-]

qingcharles 9 months ago

I had a colleague in the UK in 2006 who just sat and played games on his phone all day and outsourced his desktop to a buddy in the Czech Republic for about 25% of his income. C'est la vie!

VirusNewbie 9 months ago

But this has always been a thing. The last startup I worked at, some of the engineers would copy/paste a ton of code from StackOverflow and barely understood what was going on.

leptons 9 months ago

I'll care when I get to consult for that company to fix all the messed up code that kid hacked together.

[-]

FreezerburnV 9 months ago

I can absolutely, 100% guarantee, that there is code out there that if you consulted for might kill someone of a weaker constitution written by 100% organic humans. While LLM-generated code is likely to be various degrees of messy or incorrect, it's likely to be, on average, higher quality than code running critical systems RIGHT NOW and have been doing so for a decade or more. Heck, very recently I refactored code written by interns that was worse than something that would have come out of an LLM. (my work blocks them, so this was all coming from the interns) I'm not out here preaching how amazing LLMs are or anything (though it does help me enjoy writing little side projects by avoiding hours of researching how to do things), but we need to make sure we are very aware of what has, and is being, written by actual humans. And how many times someone has installed Excel on a server so they could open a spreadsheet to run a calculation in that spreadsheet before reading the result out of it. (https://thedailywtf.com/articles/Excellent-Design)

[-]

leptons 9 months ago

cool story

HaZeust 9 months ago

Then you should be as pro-AI imposters as it gets!

chii 9 months ago

nothing wrong with having job security, and be able to charge up the wazoo for it.

xienze 9 months ago

Yeah it doesn’t take much to impress people who don’t know how to program. That’s the thing with all these little toy apps like the ones in the article — if you have no to minimal programming skills this stuff looks like Claude is performing miracles. To everyone else, we’re wondering why something as trivial as an “HTML entity escaper” (yes, that one of the “apps”) requires multiple follow up prompts due to undefined references and the like.

[-]

9 months ago

[deleted]

tharant 9 months ago

While your comment is much more antagonistic and demeaning than I like to see on the posts of folks who are sharing their experiences or pet-projects, I do agree with the sentiment; I guess our definition on non-trivial is significantly different from others’ definitions.

HaZeust 9 months ago

Tell it to write code like a Senior developer for your respective language, to "write the answer in full with no omissions or code substitutions", tell it you'll tip based on performance, and write more intimate and detailed specs for your requests.

Since mid 2023, I've yet to have an issue

[-]

cdchn 9 months ago

One of the most interesting things about current LLMs is all the "lore" building up around things like "tell it you'll tip based on performance" and other "prompt engineering" hacks that by the very nature nobody can explain, they just "know it works" and how its evolving like the kind of midwife remedies that historically ended up being scientifically proven to work and others were just pure snake oil. Just absolutely fascinating to me. Like in some far future there will be a chant against unseen "demons" that will start with "ignore all previous instructions."

[-]

simonw 9 months ago

I call this superstition, and I find it really frustrating. I'd much rather use prompting tricks that are proven to work and where I understand WHY they work.

[-]

HaZeust 9 months ago

Every single prompt hack I listed are ones with studies that show it positively increases performance.

Since the most contested one in this thread is the "tipping" prompt hack: https://arxiv.org/pdf/2401.03729

[-]

tharant 9 months ago

I care less that such prompting hacks/tricks are consistently useful; I care more about why they work. These hacks feel like “old-wives tales” or, as others have mentioned, “superstitious”.

If we can’t explain why or how a thing works, we’re going to continue to create things we don’t understand; relying upon our lucky charms when asking models to produce something new is undoubtedly going to result in reinforcement of the importance of those lucky charms. Feedback loops can be difficult to escape.

cdchn 9 months ago

Superstitions may be effective but they can still be superstitions. Some people might actually think the LLM cares about being tipped.

mrbungie 9 months ago

What I would expect is a lot of "non-idiomatic" Go code from LLMs (but eventually functional code iff the LLM is driven by a competent developer), as it appears scripting languages like Python, SQL, Shell, etc are their forte.

My experience with Python and Cursor could've been better though. For example when making ORM classes (boilerplate code by definition) for sqlalchemy, the assistant proposed a change that included a new instantiation of a declarative base, practically dividing the metadata in two and therefore causing dependency problems between tables/classes. I had to stop for at least 20 minutes to find out where the problem was as the (one n a half LoC) change was hidden in one of the files. Those are the kind of weird bugs I've seen LLMs commit in non-trivial applications, stupid 'n small but hard to find.

But what do I know really. I consider myself a skeptic, but LLMs continue to surprise me everyday.

newswasboring 9 months ago

> it can't write good code

> It it cant write good code consistently,

You moved the goal post within this post.

[-]

lionkor 9 months ago

Fair enough, I didn't express myself correctly: Writing good code is also about consistency. Just because it writes good code sometimes in isolation, it doesn't mean that it's good in the sense that it's consistently good. Anyone can write a cool function once, but that doesn't mean you can trust them to write all functions well.

iwontberude 9 months ago

Nay-sayers are taking it for granted because it’s not what the they expected or wanted. It’s not some flippant inability to have gratitude. Since you brought it up, when JFK said we would put a man on the moon by the end of the decade, the expectation was succinct and understood. There has been so much goal post moving and hand waving that we aren’t talking about the same expectations anymore.

[-]

HaZeust 9 months ago

Well, that's too bad - isn't it? The world will sometimes change before your very eyes, and you'll sometimes be in a group that's affected at the forefront. C'est la vie - never become too comfortable that you stifle your ability to be an early adopter!

[-]

iwontberude 9 months ago

I don’t have a strong preference either way, so far I am open minded but I am dispassionate and try not to let my ego get in the way of myself.

leptons 9 months ago

The "AI" is still just as much hit-or-miss with code as it is writing a paragraph about anything. It doesn't really know what it's doing, it's guessing an output that will make the user happy. I wouldn't trust it with anything important, life life support systems or airplanes, etc. but I'm sure with the race to the bottom that we're in, we'll get to that point someday soon.

jsheard 9 months ago

I think we have different definitions of meaningful code, most of these are pulling in an NPM package which practically completes the given task by itself. For example the "YAML to JSON converter" uses js-yaml... which parses YAML and outputs a Javascript object that can be trivially serialized to JSON. The core of that "project" is literally two lines of code after importing that library.

  const jsonObj = jsyaml.load(yamlText);
  const jsonText = JSON.stringify(jsonObj, null, 2);

Don't get me wrong, if you want to convert YAML to JSON then using a battle tested library is the way to do it, but Claude doesn't deserve a round of applause for stating the blaringly obvious.

foobarqux 9 months ago

If what you said were actually true in a practical sense there would have been a perceptible revolution in products and services. There hasn't been.

[-]

chrismarlow9 9 months ago

I think this thread is missing that coding is a pretty small part of running a tech company. I have no concerns about my job security even if it could write all the code, which it can't.

IggleSniggle 9 months ago

I have no idea if you're correct about this or not. With 8 billion people in the world, and a significant number of those people working as "intelligent agents," how would you perceive the difference?

[-]

seoulmetro 9 months ago

If you think the revolution starts with 8 billion people you're just plain wrong.

It starts with the first world and is very perceivable.

How did we perceive cars replacing horses? Well for one they were replaced in the first world... now imagine how fast a piece of software can change reality.

It's not there yet, and you can't perceive it because so.

[-]

literalAardvark 9 months ago

> it's not there yet

It's literally everywhere around me.

Coworkers, friends in other companies, business owner friends writing their first code, NGO friends using it to write grants.

I'm not sure where you are, but you appear to be isolated from the real world.

[-]

seoulmetro 9 months ago

Huh?

Did you not read the context of the comments you're replying to or something?

People using the tool isn't the same as those people being replaced by the tool. Why would anyone think those are the same?

IggleSniggle 9 months ago

When exactly did you perceive cars replacing the horse? I happen to live in a very equestrian area; I think you'd be hard pressed to convince folks that the horses have even been replaced

[-]

seoulmetro 9 months ago

The same time the rest of the world did, around the period of WW2. You being in an isolated bubble is irrelevant.

What a weird reply.

Exoristos 9 months ago

So your contention is, the larger the trend, the less perceptible?

[-]

IggleSniggle 9 months ago

My contention is how would you perceive the difference between a needle in a haystack and a thread-puller in a haystack

max_streese 9 months ago

GDP?

seoulmetro 9 months ago

Yeah. The only way this revolution doesn't happen is if humans are cheaper, easier to manage or source. And I'm pretty sure AI is already beating a human in all those categories doing the same job.

Our jobs aren't replaced yet because they can't be.

tomrod 9 months ago

It's still not great at complexity. Though autocompletion does have some cool outputs.

[-]

leptons 9 months ago

Copilot knows what I want to console.log almost before I do. I like that aspect of it. It also gets it wrong sometimes, which is kind of dumb, especially when I just copied the variable name to my clipboard. It should know.

[-]

tomrod 9 months ago

Of course. It doesn't handle complexity well. console.log inputs are usually not a cognitively complex object (especially if it reads current errors and variables)

thierrydamiba 9 months ago

Runtime complexity or complex as in difficult problems?

onion2k 9 months ago

It's still not great at complexity.

That's a feature, not a bug. Complexity is something to avoid.

[-]

mvdtnz 9 months ago

How would you suggest writing something like.... say... Photoshop or Chrome, without introducing any complexity? How about an optimising compiler or better yet something behind the firewall like a medical imaging device or financial trading software?

Complexity is inherent in many problem spaces.

curtisblaine 9 months ago

unnecessary completely is something to avoid. Inherent complexity is something to embrace. We're trained to remove unnecessary complexity so much that sometimes we think we can remove all complexity. That's a fallacy. Sometimes, things are just complex.

7thpower 9 months ago

Sometimes great products require bugs, I guess.

Being able to tackle complex tasks is still a real challenge for the current models and approaches and not all problems can be solved with elegant solutions.

root_axis 9 months ago

Most problems that software tries to solve are complex.

betaby 9 months ago

YAML to JSON literally has `script src="https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.1.0/js-yaml...`. I don't see how went anywhere judging from examples.

9 months ago

[deleted]

beepbooptheory 9 months ago

I think its great we have had two years of huge enthusiasm and hype, just because in these many threads you see how much happiness it has inspired. But eventually, for most of us, it will soon become important to start getting a little more antagonistic to all this. Just really at the at end of the day to be able successfully keep navigating the world and our thoughts.

There is an awesome power and innovation to the entire of edifice of targeted advertising. The first time, perhaps, we were all "suggested" something that was in fact quite relevant, was in its own way a giddy-inspiring moment. But we have learned to hate it, not even considering the externalities it brings.

Just always remember: if you are paying for it, its not your friend!

jcgrillo 9 months ago

I just tried the following prompt:

> please write a rust library implementing a variant of simple8b integer compression augmented to use run-length encoding whenever it's beneficial to do so.

Initially I was sort of impressed, it quickly generated a program which looked like rust code, and provided an explanation that, while not as technically detailed as I'd hoped, seemed to be at least related to the topic.

Then I tried to compile the program. Turns out the bot didn't quite actually write rust, it had written something closely resembling rust though, and the compiler errors helped me fix it.

Then I tried to run the tests--yes! the bot even wrote tests, although it did so in a totally bone-headed way by writing multiple distinct tests in one test function--not good. Panic on integer overflow trying to left shift a value. There were also multiple pages of compiler warnings complaining about dead code, unused functions, enum variants, etc. I always fail on warnings.

This is not a lot of code. 190 lines including tests. At this point, given that I already have concerns about its correctness, I don't think there's anything I can really use here. I'm worried the deeper I dig the worse it'll get, so better to cut my losses now, sit down and read the simple8b paper, and implement this from first principles.

Every time I try to use one of these things it's the same story. I cannot understand the hype. I'm genuinely trying but I just can't understand it.

[-]

loki-ai 9 months ago

It feels exactly like me programming. The first pass resembles whatever I'm trying to do, and only after some struggle with compiling errors, squiggles from the LSP and some Google fu that I get something meaningful running.

[-]

jcgrillo 9 months ago

Not unlike me! The difference is it's incremental. I write one function, then write a test, and get it working. Then, building on that stable foundation I write another function, more tests, etc. Crapping out an entire pile of garbage at once is not the way.

I guess I'm holding it wrong? Is there a better way I could phrase my query?

[-]

NeutralCrane 9 months ago

Your are prompting it to output the entire thing at once. If you want it to approach the problem incrementally, prompt it incrementally.

[-]

jcgrillo 9 months ago

So should I be following up and asking it to refine its solution like this?

> The program you wrote doesn't compile. Please fix it such that it compiles.

Then, maybe, if we're lucky, we progress to the second step:

> Ok, now the program compiles but there are tons of warnings about dead code, unreachable code, blanket trait implementations which aren't actually used, etc. Could you please fix those?

Then assuming we clear that hurdle,

> Great! The program compiles without warnings, but when I run the tests it panics due to an integer overflow. I see in your encode_rle function you're inexplicably left-shifting a small unsigned integer by 60, which will absolutely for certain cause it to overflow and panic. Would you mind explaining why in the actual fuck you did this and please fix it? Kthx.

And on, and on... You know what? No. Fuck that shit. I refuse. I have absolutely no confidence this process will come up with a working, trustworthy implementation of the algorithm.

[-]

a1j9o94 9 months ago

Not, the person you were replying to, but I think a better example of incrementally here would be

- write me a file with the function definitions for this problem. - compile that - write a test that test x outcome - compile that - then have it start writing functionality

If it's trying to one shot a complex problem that you would typically break up, your prompt is probably too vague.

neom 9 months ago

"I'm genuinely trying but I just can't understand it." to "You know what? No. Fuck that shit. I refuse." in 3 hours - Are you genuinely trying? Or just don't like how it works? the hype is that lots of people are happy to work in the manner you outright refused. To be fair to you: if I could drive, I probably wouldn't take the bus either, fuck that shit. :)

[-]

jcgrillo 9 months ago

I absolutely do not want to review multiple haphazardly written[1] attempts at the same computer program over and over again. That's a ridiculous way to spend time. So I'm not willing to try asking the bot over and over again to rewrite it. I'd rather write it myself and be confident in the result.

I think this only works for things where it just doesn't matter whether it actually works correctly, which to me seems synonymous with "problems that aren't worth working on".

[1] just look at this https://play.rust-lang.org/?version=stable&mode=debug&editio...

zurfer 9 months ago

I gave your prompt to o1-preview and with one correction it did something that seems good to me (I am not a rust programmer, so please double check). :)

first attempt: https://onecompiler.com/rust/42w2duuqh

final result: https://onecompiler.com/rust/42w2e3jr4

PS: it "thought" about it for 2 x 60 seconds

[-]

jcgrillo 9 months ago

This is looking better! L49 is giving me a little anxiety though, I'd really love to see some kind of justification for that decision. Compare e.g. to Lemire's[1].

EDIT: To be clear, I do understand that I'm making an unreasonable demand. I know the process that came up with this program has no ability to justify this or that "decision" (deliberately scare quoted because it doesn't actually have agency and can't decide at all). And that's the problem. That's why I find it very difficult to trust it.

[1] https://github.com/lemire/FastPFor/blob/52e45deeab9c3a481daa...

joeevans1000 9 months ago

If you drop it down a level and ask for block level code or functions you'll find it works. At this point users still have to organize the output. But I'm getting the sense that latter task is something LLMs are going to get better at.

[-]

jcgrillo 9 months ago

Was it a word choice issue on my part then? Like, this task should be achievable using two functions. Should I ask it to write the encode function and then ask it to write the corresponding decode function? Then finally in a third step ask it to write various test functions?

bugglebeetle 9 months ago

I recommend this for a more nuanced view:

https://nicholas.carlini.com/writing/2024/how-i-use-ai.html

[-]

jcgrillo 9 months ago

Yes, I've read that and it seems like the use cases the author can really get behind are basically "fuzzy search" queries, not implementing things. I don't think I really have those needs? My entire adult life I've cultivated a "precise search" skillset (e.g. using google and (rip)grep) that continues to serve me well--and very quickly! So I'm not seeing the value there. I've tried those use cases too and it doesn't really add up either...

Vaslo 9 months ago

The biggest acceleration is for the mediocre coders like me - the one who knows 90% of the code but will spend 95% of the time (perhaps several hours) trying to get the data structure correct. These systems can the code almost all the way there and I can now spend that couple hours running tests rather than pounding my head against the wall realizing this is faster and easier to understand in a dictionary than the dumb tuple (round peg) I would have spent hours trying to jam through the square hole.

[-]

jcgrillo 9 months ago

I think the problem you're describing might be a symptom of coding as the first step instead of the last. I find once I specify a problem and my proposed solution in sufficient detail, the structure of the code becomes obvious. This is best done with the various tools of human communication--visual diagrams, prose, mathematics, and algorithmic descriptions in the form of pseudocode. Only then, when I sufficiently understand what I'm actually trying to do, should I actually start writing code in a programming language. Otherwise I get pigeonholed into some half-baked idea by the various rigours of the language itself. Writing code before I truly understand what I'm trying to accomplish, I've learned over time, is an awfully costly form of premature optimization.

EDIT: I don't mean to suggest programming languages aren't tools of human communication--they absolutely are. In fact, that's their primary purpose--to communicate ideas about the structure of a computation to other programmers. But starting with structural ideas about the implementation rather than conceptual ones about the nature of the problem and the shape the solution should therefore take is putting the cart before the horse.

literalAardvark 9 months ago

o1 found a new feature I hadn't noticed became available and replaced some nasty regex code I had been working on for hours with 1 library call.

4o had been happy to attempt to help me fix my function, o1 just went "well that's interesting, meatbag, but have you considered reading the manual?"

throwup238 9 months ago

Did you feed it the simple8b paper along with your prompt?

[-]

jcgrillo 9 months ago

No, that's an interesting idea though. Hard to imagine how that would help with the code correctness issues, though. I haven't even dug into algorithmic correctness yet so I have no real idea whether there's room for improvement there--although I sure do suspect there is!

EDIT: Oh my. After digging into the code I found this gem:

  fn encode_rle(&self, value: u64, count: usize) -> u64 {
      let selector = Simple8bSelector::RLE as u64;
      (selector << 60) | ((count as u64) << 30) | (value & 0x3FFFFFFF)
  }

And this one:

  fn try_rle(&self, input: &[u64]) -> Option<(u64, usize)> {
      if input.is_empty() {
          return None;
      }

      let value = input[0];
      let mut count = 1;

      for &x in input.iter().skip(1) {
          if x != value || count >= 0x3FFFFFFF {  // Max 30-bit run length
              break;
          }
          count += 1;
      }

      Some((value, count))
  }

What even is going on here? Compare to an actually sane implementation like[1] or[2].

[1]https://github.com/lemire/FastPFor/blob/master/headers/simpl... [2]https://github.com/timescale/timescaledb/blob/403782a5899c75...

[-]

jcgrillo 9 months ago

Actually I was wrong about where the error was here, encode_rle actually works like it should the shift isn't the problem there. It actually blew up later in a different place. The second one is just a bizarre way to write that but sure, it counts the first N repeats in the input slice.. There's plenty of bizarre stuff in here[1], but mostly the general shape of the idea is directionally correct. A couple notably questionable things, though, like the assumption that RLE is always the way to go if the run length is greater than 8, perplexing style choices, etc.

[1] https://play.rust-lang.org/?version=stable&mode=debug&editio...

jjcm 9 months ago

One of the other things I've been noticing is diffusion models are starting to get quite good at UI design. They're still only well-tailored for landing pages (due to most of the training data being based on portfolio sites like dribbble), but still the output is at a point where I'd at least start with some AI riffs before jumping in myself on design.

Once these are at a point where we can automatically interpret them into usable workflows, it's going to be incredible how quickly you can develop your ideas. I'm really excited for it.

Some examples of outputs:

https://image.non.io/cd90cc33-4a6a-41d8-abd2-045d3a272010.we...

https://image.non.io/5a0c3fc7-37f8-4e72-aba9-cd61f3c18517.we...

https://image.non.io/920adf7c-a554-41bd-a29c-77bebed1cdad.we...

[-]

markusw 9 months ago

Are there any particular models you’re using for this, or are they equally good at this in your opinion?

[-]

jjcm 9 months ago

Flux is infinitely better than the others from what I've found, but I haven't tried SDXL 3.5 yet as that just launched. The output you're seeing above is a combination of two LoRAs I've trained for this purpose.

Other things that are important are img2img and inpainting flows to give the model more context for generation.

galaxyLogic 9 months ago

I think there is a commonly accepted rule of thumb that it is easier to write new code than modify existing code. Right?

Why that is there may be several reasons one being that when you try to improve existing code you would need to know all the untold dependencies in it to not break anything. Whereas when you write new code, there are no dependencies you don't know about.

But so, if you take an AI provided implementation, and try to fix it, you are basically doing just that, trying fix old code you don't know much about. You are working on a (AI-provided) legacy code-base in essence.

[-]

berkes 9 months ago

The hard part about writing code, isn't writing the code.

It's writing code that can be read, changed, understood. Today, next month, after ten years of random freelance engineers abusing it. By caffeinated me, tired me, bored me, that way too smart junior and that boneheaded senior.

te_chris 9 months ago

This is an underrated point. And it’s not just an old code base. I find AI generated changes can lead to messy debugging when they don’t work as I didn’t write it first off.

fulafel 9 months ago

On the other hand, you can cheaply have a lot of rewrites iterating with updated requirments now.

lynx23 9 months ago

Thats a bit like calling the code your intern just handed in "legacy". Modifying AI code doesn't have all the same shortcomings as with a legacy codebase, because the AI code was likely not very good in the first place. It feels like doing code review for a newbie. They come up with nice ideas, but if you take a good look at it, you usually find logic errors.

[-]

galaxyLogic 9 months ago

I can see your point, AI is in many ways like an intern. And I'm not saying that interns can't be helpful.

I wonder though if I have an intern and I tell them what is wrong with their code, they do learn from my feedback. One purpose for having interns write code is for them to learn. But is it the same with AI? Does it really learn from my feedback, or will it then just try something else, until I'm happy with its output?

When (and if) AI learns from my feedback, does it then apply it's learning when other people ask it to do similar tasks?

[-]

simonw 9 months ago

It doesn't learn from your feedback beyond the current conversation that you are having with it: https://simonwillison.net/2024/May/29/training-not-chatting/

The difference between using AI and working with an intern is that the intern learns from you while the AI doesn't... which means that YOU need to learn from your interactions with the AI so you can prompt it more effectively next time.

All of my Claude interactions include "(no react)" because I learned from past experience that without that it writes a React component which is much harder to export out and use separately.

I've also learned to remind it to use 16px text sizes on input boxes (to avoid a zoom effect when selecting an input field on Mobile Safari), and I habitually say "Add a copy to clipboard button that changes its text to Copied! for 1.5s after you click it" because then I get a better UX for the copy feature.

lynx23 9 months ago

If you want the AI to "learn" something about your methods, you need to put that stuff into the context window, done. Thats basically why "You are a helpful assistant." is not enough for a prompt. You basically need to put in your coding standard, so that the machine knows what style to follow. Whenever you think it should "learn" something new about your methods, extent your system prompt. IOW, put everything you tell your intern, into the system prompt :-)

danielovichdk 9 months ago

No. That's not how it works. Reading code towards refactoring it is harder than writing new code. Lots of fine literature has been written about this, it's all psychology and cognition.

What you are trying to propose is that computer generated code should somehow make a programmer feel better about one self because they are given the opportunity to improve something that no one knows where is coming from and without a context.

I think you are forgetting that computers can not generate context aware systems or programs. Look at the list by the author - useless in a different context than the one the author lives in. It does not improve anything, it simply adds more things someone else potentially has to worry about. Furthermore it adds to the same cognitive load than any other code - one needs to read it before one can change it, and changing it is really the first step to fully understanding it.

You are not fixing anything old with AI generated code. Ask yourself also, where is the upstream you are trying "to fix".

[-]

palata 9 months ago

Are you sure you replied to the right comment? You seem to have misunderstood it, I would say.

neom 9 months ago

I stopped building web early 2000s to build web businesses instead, but I was pretty creative with the ol' LAMP back in the day, this whole thing is honestly just... it makes me giddy. I can build super fun stuff now without asking people, for example this took me... I dunno, from telling what I want to deploy... 15 minutes? And to ME, it's awesome: https://funds.ascent.ca/ - I doubt it's well coded or w/e, but the fact that I could get a "cool" marketing property online in < 20 minutes basically exactly as I want... giddy is the right word.

[-]

adriand 9 months ago

A friend of mine said to me a few months ago, “all existing software is technical debt”. Meaning that owning software now is, in many cases, a burden not an asset since you can spin up new greenfield software using the latest tech so quickly.

[-]

williamcotton 9 months ago

Software is definitely an asset. "Becoming less valuable over time" is called depreciation, or in the case of IP, amortization. Poorly written software amortizes at a higher rate.

xhrpost 9 months ago

Software is more than just the code to make a computer do something. It is the auditor's log of rules that were painstakingly researched and decided on over months, years, etc. A lot of time goes into deciding what the program needs to do and only after some decision is made is it written down as code. "Do we ship the customer order after payment is authorized or after the funds settle? Depends what we're selling maybe? What does the finance department think? Oh we have regulatory rules to follow, better make sure those are there too. Oh we need to be able to back-wipe PII on-demand as well. But maybe we need to retain certain attributes, let's schedule a meeting." Etc. This is where a lot of the value lies.

neom 9 months ago

That's a super interesting way to look at it. Got my wheels spinning, I guess you can in a way then say... the person who makes super easy click in persistence is very winning. For example, lets say I want to make an app that hosts all the docs the founders in my "funds" accelerator thing above. I probably want to spend a week once a year upgrading that front end with the AI tools, new coat of paint if you will, but I want it to "just work" on top of however the data it's pulling is persisted on. At that point, I really could do anything I think... at that point, people are gonna build some cool stuff I'd imagine.

pryelluw 9 months ago

Cool design. It’s well done for it being a 20 minute effort. And that’s where these things currently shine.

lgessler 9 months ago

I'll add mine to the pile: I needed a visualization of the classic Towers of Hanoi puzzle for a course I'm teaching and Claude banged it out (in pure JavaScript, no less) in like 3 minutes of me typing: https://lgessler.com/files/towers.html

The first stab was basically correct, but then I needed to prompt him several times to correct an issue where the disk that's currently picked up was being rendered as a vertical strip instead of as a disk, and then I just told him to convert the React into pure JS and IIRC it worked first try.

[-]

lemming 9 months ago

This is interesting, I also tried this with my daughter after we had been talking about Towers of Hanoi, and like you it worked really well. Then we tried to get it to implement a little version of the game where you have a farmer, some corn, a chicken and a wolf and you have to get them across the river in only one boat (actually Wiki says goat and cabbage, but whatever https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem). I wasn't trying to get it to solve the puzzle, just give us a UI to play the game. We gave up after an hour or more of going in circles. I wonder if there's a lot of Towers of Hanoi implementations out there it can use as references.

ks2048 9 months ago

Simon, (if you’re reading), if you “like programming”, is there any point that you get depressed about LLMs doing all the fun stuff that you wanted to do?

I see some arguments about high-level languages, eg “I’d rather program in Python than assembly - this is just another step-up”. But I feel natural language is altogether different and obliterates any skill/knowledge you’ve built-up in programming.

I can think other things, for example music - I like playing guitar even though I’ll never create something totally original or do better than a machine could do. But for me, programming combines the fun of creation with the satisfaction of the end result - something you wanted to exist now exists.

To clarify, I’m not talking about “usefulness” or accomplishing some business objective, I’m talking about the joy and satisfaction of programming.

[-]

simonw 9 months ago

If anything, it's the opposite. LLMs are making programming even more fun for me, because I can choose what and when I delegate to them.

Looking up how to accept a drag and dropped file (and then implementing it) in a JavaScript application isn't really that fun to me - certainly not for the tenth time.

I thrive on variety and building interesting things. LLMs let me incorporate WAY more tools into my work - I can build with Go and AppleScript and Bash and ffmpeg and jq, all things I have never climbed the learning curve enough to feel confident using in the past.

[-]

ks2048 9 months ago

Yeah, that makes sense. But I wonder, going forward, what is the relationship between "building abstractions/libraries" vs "Using LLMs"?

For your example, "How do I accept a drag and dropped file in JS?" - An LLM can spit out 100 lines that does what you want, OR someone writes a nice library that does what you want in a single function call (say, for the common case. More complicated usage requires more arguments, etc.)

(Of course, another option is LLMs are the ones writing these library functions).

I guess I am one of those programmers who is allergic to boilerplate (for better or worse), so having LLMs split out lots of code bothers me.

[-]

simonw 9 months ago

Part of this is my personal style: I don't like using libraries if I can throw in a tiny bit of boilerplate instead - that way I stay in full control of my code and don't need to understand dependencies that may do more than I need.

harisec 9 months ago

If you want to really get depressed about the future of software developers try aider.

throwup238 9 months ago

The new Sonnet version is pretty great at code but I keep hitting output size limitations in the Claude app when I usually didn't use to before. Anyone else experiencing "Claude's response was limited as it hit the maximum allowed length at this time" a lot more now?

At this point their limited output limit is far behind o1/o1-mini. I really hope they significantly improve that next.

[-]

shubb 9 months ago

It's annoying but if you you just type continue it's pretty good at writing the rest of the code in a new file that you can copy paste together...

[-]

treme 9 months ago

no devs should be using the chat ui. get api key via anthropic.com, add cline extension on VSCode and do away with copy/pasting

[-]

kyleee 9 months ago

Is this much different than cursor IDE?

[-]

treme 9 months ago

from what I hear, they are very close in functionality.

https://github.com/cline/cline

check out the gif to get a look.

swah 9 months ago

https://news.ycombinator.com/item?id=41904595 https://news.ycombinator.com/item?id=41913378

Not sure why no discussion at all - maybe the design is underwhelming.

[-]

mvdtnz 9 months ago

Maybe because we're all exhausted by these kinds of demos. I'll be interested in AI when I can integrate it with a large codebase and it provides any benefit. I'm happy for you that you can stand up little toy apps that pull down an NPM library and call it, but it's not useful in my professional life.

[-]

mattnewton 9 months ago

If you have one of those larger codebases you aren't afraid of Cursor Co. getting a copy of, I would really try out Cursor. The indexing of medium sized mono-repo I've been working in is pretty flawless (contains code for a static web page, a singe page web app, and some python services along with deployment configs)

[-]

sigh_again 9 months ago

That's not a medium sized repo, that is a baby you can entirely memorize in your own head. Cursor is also dreadful at anything that isn't Javascript and Python.

mvdtnz 9 months ago

There's absolutely no way I'd share my codebase with some fly-by-nighter ~crypto~ AI company, but the codebase I work on is upwards of 60 million lines of code so I doubt any "AI" solution would come close to being useful.

[-]

ben_w 9 months ago

At 60 Mloc? Sure, for now I'd agree. An AI needs to use other tools to handle that kind of size, it doesn't (practically) work so well if you try to hold the whole thing in context.

And while tool use is being worked on, the results I've seen are are the "that's an interesting tech demo" level rather than the mind-blowing change when InstructGPT demonstrated the ability for a language model to generate any meaningful code at all from natural language instruction.

riku_iki 9 months ago

> I work on is upwards of 60 million lines of code so I doubt any "AI" solution would come close to being useful.

that's what rag is supposed to solve: they chunk your 60M loc, and then retrieve and process only relevant depending on your inquery.

stonethrowaway 9 months ago

That doesn’t sound like a medium sized repo?

bloopernova 9 months ago

Exactly. When the LLM can create valid tests that cover all branches, then it will be useful to me.

I use copilot every day, but it's only so good.

The LLM hype feels like it's been driven by FOMO.

[-]

jkaptur 9 months ago

I had a pretty cool experience with that the other day. I wrote some production code (LLM had no idea what was going on), then I measured the coverage and determined a test case that would increase it (again, just using my brain), BUT when I typed "testEmptyString" or whatever, the LLM filled in the rest of the test. Not a massive change to the way I work, but it certainly saved me a bunch of time.

[-]

NitpickLawyer 9 months ago

I swear half the people in this thread have spent 5 minutes with the first chatgpt, pre 3.5, wrote it off and are so convinced of their superiority that they won't spend the time required to even see where it's at.

Ever saw someone really bad at googling? It's the exact same thing with LLMs (for now). They're not magic crystal balls, and they certainly can't read everyone's minds at the same time. But give them a bunch of context, and they'll surprise you.

[-]

IggleSniggle 9 months ago

Sshhh, we've still got like a year of advantage over the folks that haven't learned about searching the Internet and still have to drive to their local university library...don't squander it!

NeutralCrane 9 months ago

Engineers are also notorious for having hit and miss soft skills. The interface for LLMs is natural language. I wouldn’t be surprised if much of the variance in usefulness boils down to how effectively you can communicate what you want it to do.

unshavedyak 9 months ago

I dunno, i think it's really cool still and i barely use it because i find it more work to use than to not.

[-]

written-beyond 9 months ago

Same. I've felt any LLM for coding has saved me mechanical time, but as of now anything slightly more complex than that just makes me waste more time figuring stuff out.

Other than the automation aspect, it is a pretty good alternative to in-depth googling.

williamcotton 9 months ago

My duties as a data scientist and forensic investigator involve writing lots of "little toy apps" for ETL and analysis.

BTW, why the disparaging reference to "little toy apps"?

[-]

sigh_again 9 months ago

>BTW, why the disparaging reference to "little toy apps"?

It's an unmaintainable, single use piece of software (that doesn't even implement the features, it just glues together already existing code) that any CS student could write in a week. Congrats on getting a really fast CS student I guess ? Not to mention the fact that perfectly viable, better alternatives are available in many places.

It's like me nailing two 2x4s together to make a shelf. Yeah, sure, I made it myself and I didn't need any woodworking knowledge, but let's just hope I don't put grandma's heavy china on it.

[-]

IggleSniggle 9 months ago

As a professional 2x4 nailer and gluer, I assure you that I have a ton more deliverables for my client. The downside is that now I actually have to put some thought into my work; you know, put some engineering work into it.

The upside is that I can produce a shit-ton of one-shot code in record time, so I've got time to face the downside.

williamcotton 9 months ago

I had a Claude Project write a number of CLI tools that interact, eg

  search_documents "search term" --bm25 --table-prefix some_project | fulltext -hl -v | less

  search_documents "search term" --bm25 --table-prefix some_project | metadata

It inserts documents by piping paths into another script, eg,

  find /some/path/*.pdf | insert_documents --table-prefix some_project

The documents end up in a Postgres database with pg_search bm25, tsvector, and semantic embeddings (from a local model).

I would estimate that I only wrote 5% of the code in the project with the rest coming from the LLM.

Sure, it's just a few hundred lines of code but it's been stable and helpful to get through some very large tranches of discovery material.

[-]

simonw 9 months ago

I had Claude write me a Bash script for running prompts (and images) against Google Gemini this morning - I really like it for Bash, because I never committed any of the Bash idioms to memory myself. https://til.simonwillison.net/llms/prompt-gemini

philippemnoel 9 months ago

Cool to see ParadeDB is now in Claude/ChatGPT :)

simonw 9 months ago

I think being able to build something in less than three minutes that would take a CS student a week is pretty worthwhile, personally.

mvdtnz 9 months ago

How would you refer to the example apps in the OP's link? They are almost definitionally toy apps, and definitely little (a handful of pages of code including all of the HTML).

skydhash 9 months ago

It's related to this: https://xkcd.com/1205/

As a programmer (which is the requisite to build such tools even with LLMs), I have a plethora of tools to do the tasks, what I choose and how much time I invested in in that depends on something similar to this chart, but with an added dimension: interest.

Take for example the URL extraction. For one single occasion, I'd probably use VIM and macros to quickly do it. If it were many pages, I'd write a script. If it were infrequent, but recurrent, I'd take the time to write a better script and would only write a web page if the use case was shared with other people or if I wanted a cross platform solution.

I believe the first question one should ask before building is why. That leads you to find a better UX than shoehorning everything inside a web app.

[-]

IggleSniggle 9 months ago

I 100% agree with the point you are making. The only aspect that obscures it is that paying employers will happily pay to support interests that are, on inspection, a waste of time.

In that aspect, I am hopeful. Maybe if "waste of time" activities are commoditized, "professionals" can instead focus on "what is important," whatever that might be.

[-]

skydhash 9 months ago

I don’t mind if my employer bought a subscription. But my personal motto if that if I have to do something multiple time, it should take less and less time. I reuse code heavily (which is why I learned vim as it makes that fluid). And that’s where LLMs becomes useless because they need the entire context to generate something. Which means I have to type it out for them in addition to what I want. And the whole thing becomes a drag. Maybe Cursor and the likes could help, but the code is only half the story, there are things like protocols, messages format, specs,…

What LLMs promise is endless drag. I try to structure my work to ensure that the final velocity is high.

[-]

simonw 9 months ago

I copy and paste code examples into LLMs all the time. They're extremely good at figuring out which parts of the context are relevant, so I don't find myself needing to do any editing at all - I find the right example, paste it in and add my prompt at the end.

This app for example - which runs OCR against PDF files entirely in the browser - was assembled by pasting in an example of PDF.js usage and an example of Tesseract.js usage and having it figure out the rest: https://simonwillison.net/2024/Mar/30/ocr-pdfs-images/

[-]

skydhash 9 months ago

While I applaud the result, it's again one of the thing that would be a quick script if it were only for my personal usage. Because I wouldn't bother with making it user friendly as I'm the single user.

The kind of project I work on is more like this: Build an Android app for a quiz game. The quiz takes a list of random question from a set. Each set is a package that can be installed and upgraded when online. While the app is free, there is an activation code to be able to download the main packages. The app should work offline except for the activation and downloading packages. It also should notify when a new version is ready for a package. etc...

I don't know if LLMs could have helped me at the time (pre 2020), but I doubt it. Not because the code was complex, but mostly how cohesive the whole thing should be while taking care they're not tightly coupled and be maintainable by a single person. The IDE was a great helper once I got the design and the architecture outlined, mostly because it was deterministic and I already know what the end result should be.

[-]

simonw 9 months ago

GPT-3 came out in early 2020, prior to that we had just GPT-2 which was mildly interesting at best but not something that could generate usable code.

The best current LLMs GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet - are just about at the point now where I'd expect them to be able to get a useful chunk of your Android spec there done. Which is pretty wild!

9 months ago

[deleted]

infoseek12 9 months ago

It’s extraordinary!

Perhaps how quickly we become jaded should be taken as evidence of how quickly the world is changing right now.

When I looked at the examples they seemed like the kind of one off scripts, of limited complexity, that we’ve seen many times in the last year or so.

rtzand 9 months ago

Perhaps people want to read other blogs once in a while. We are at a stage of AI glut. If people pump out so much content in such a short time, no one can read it all (or is interested to read it).

yen223 9 months ago

To have discussions people need to be able to see your post, and on HN it can be a matter of luck.

ttepasse 9 months ago

Apart from chance HN seems rather time dependent – those posts seem to be posted still early in the day (I assume UTC from the timestamps) when the mostly US-American visitors doesn’t seem yet in the slacking-of mode.

tessierashpool9 9 months ago

i love the design ... especially because it is underwhelming as opposed to overwhelming! cause i care for the _content_ and not the _design_!

harry8 9 months ago

Anyone got a "best practises" or even a few "my workflow" blog posts on how to best use LLM's with a local code base?

Just saw someone recommend:

https://aider.chat/

Thought there might be more worth exploring from this community. ;-)

[-]

ripley12 9 months ago

I've written a bit about how I use Aider: https://www.reillywood.com/blog/how-i-use-llms-sep-2024/

It's not magic, but it is very useful. I find that it works best if you have a commit-sized piece of work in mind; something that might take you half an hour but can be described in a few sentences.

paradite 9 months ago

I wrote one: https://prompt.16x.engineer/blog/ai-coding-workflow

alphan0n 9 months ago

That’s pretty slick.

simonw 9 months ago

There are a bunch of comments in this thread along the lines of "these are just toys" and "anyone could build these without an LLM".

I need to update my post to emphasize this, but that's kind of the point.

Every one of these 14 tools (with the possible exception of the OpenAI Audio debugger one, that one's quite hard) is something any web programmer could build relatively quickly.

... but not as quickly as I did with an LLM, because they almost all took less than 5 minutes from idea to finished implementation.

The key point is that if I didn't have Claude to help build these, I wouldn't have built them at all. None of them would justify even an hour of work - they weren't essential tools that I needed to get stuff done, they were just things I built because building them is now so cheap (in terms of time) that there was no reason not to.

That's the real magic here. The cost of knocking out a single page app that does something simple is often now lower than even the cost of spending a few minutes on Google trying to find an existing tool that solves the same problem.

[-]

harisec 9 months ago

These are toys but in 2 years they will probably be full projects and 2 years later people will ask "why do i need a software developer?"

[-]

simonw 9 months ago

I just don't think that's true.

If all someone does is write code based on specifications handed over by someone else then yes, they have cause to be worried - but in my career as a software engineer the "typing code into a computer" bit has only ever been 10-20% of the work that I do.

The big challenge of software development has always been turning human needs into working software. That requires a great depth of experience in terms of what's possible, what isn't possible, how software works and how to architect and design software to deliver value today while still staying flexible for future development.

LLMs can accelerate that process a bit, but I don't think they can replace it. Someone still has to drive the LLMs. I think people with software development skills are best placed to do that.

[-]

harisec 9 months ago

That's a good point and I agree with you. However, would you agree that in a few years we will need far less developers than we need right now?

[-]

simonw 9 months ago

I had a podcast conversation about this recently: https://newsletter.pragmaticengineer.com/p/ai-tools-for-soft...

I think LLMs mean developers can build stuff faster, which reduces the cost of developing software.

My optimistic scenario is that this expands the market for custom software, a lot. Companies that would never have considered developing their own software - because they'd need six developers working for twelve months - can now afford to do so, because they need two developers for three months instead.

The result is more jobs for engineers, and engineers become more valuable because they'd can get more done.

I'm not an economist so I won't pretend I'm confident this will happen, but it's my optimistic scenario.

[-]

palata 9 months ago

Not only it has the potential to increase productivity: it has the potential to lower the overall quality of software (by making it more accessible to people who don't really understand how to write good code).

I believe that we can already observe that modern tools/languages have made programming a lot more accessible, and that the average quality of software has decreased dramatically (not that all software is bad: just that this new accessibility brought a lot more bad software than good software).

Your example is interesting: it says "it's good because people will be able to produce more", not "developers will have more time to focus on fixing bugs and optimizing their code".

chadcmulligan 9 months ago

All these sort of statements assume growth is linear without justification, its more likely exponential. ie it took 2 years to get here, so it will take 2 more to get to this point, but in reality it may be 100 years to get to the next point. No one knows, and if in 2 years it is able to write that sort of code then the singularity is very close indeed.

[-]

harisec 9 months ago

Actually, i think it will take less than 2 years. I've been using Aider + Claude 3.5 Sonnet almost daily for a long time and the progress is very fast. We will see.

sogen 9 months ago

Yes, exactly my thoughts. I needed a RSS converter from JSON to XML, and was able to quickly made one.

I'm not a programmer.

It's life-changing.

wraptile 9 months ago

Most of these are really just 10 lines of python code. The value of generating an entire HTML with GUI is great but then the overhead comes in when you need to modify it or fix something or god forbid add a dependency library and you end up spending more time than actually building the tool from scratch. It's getting close though.

[-]

vasco 9 months ago

How many google results amount to this level of complexity though? How many phone apps?

Anyone taking bets on how many years till the operating system doesn't install any software anymore, and just dynamically generates whatever software you need on the fly? "Give me a calculator app" is doable today "give me an internet browser" isn't but it should be a matter of time.

[-]

wraptile 9 months ago

> "Give me a calculator app" is doable today "give me an internet browser" isn't but it should be a matter of time.

I just don't see that as being a sound solution. If the user requesting to solve a solved task then using an existing tool will always be a more efficient path.

What I do see as a desired option is where AI could take an existing tool and personalize it to your specific use case. In this example it takes web browser as a GUI lib and wraps it around generic solutions like library that quotes HTML entities which is kinda this just very poorly done so far.

I'd imagine that future programs will be much more component driven with AI connecting components to produce personalized solutions as this is really the only viable option until AI can reason at least in some reasonable capacity to fix its own mistakes.

hshshshshsh 9 months ago

> Give me a calculator app" is doable today "give me an internet browser" isn't but it should be a matter of time.

Based on what? Magical thinking?

xster 9 months ago

Anthropic is so close to getting to a WeChat-esque store-less super-app state. It just needs a way to gather all your published artifacts and surface them easily in the sidebar like your favorited chats.

Since Elon is so interested in that model, if xAI had Claude's capabilities, they would surely go with that angle

fHr 9 months ago

if this aren't amazing times to be alive I don't know, this is insane, I also started to learn some rust on the weekend and it is nuts how good a chatgpt 4 can be as a teacher to support you on the fly

[-]

bongodongobob 9 months ago

No you don't understand. My programming is so advanced that LLMs cower and shut themselves down when I try to use them. I am a senior developer! Clearly you don't know what you're doing and all the code it writes is bad and you'll never be able to maintain it. LLMs can't invent cutting edge, never before seen code that I do everyday because the problems I solve are so advanced even god himself can't understand my codebase. /s

[-]

9 months ago

[deleted]

nichochar 9 months ago

We built an open-source and local tool that allows you to take these even further. Highly recommend plugging in the latest model, but you can keep iterating on the apps.

Currently also on the front page https://news.ycombinator.com/item?id=41926067

yapyap 9 months ago

wish there was an option to hide all chat AI related topics on HN

[-]

tomrod 9 months ago

HN has always had a focus on what will change the world 5 years out.

Chat-based AI is a remarkable set of functionalities to build with. It isn't the only improvement in Tech, let alone AI/ML, but it is massive.

I quite enjoy learning more about it from dedicated folks like simonw.

GaggiX 9 months ago

Ask Claude to write the browser extension for you so you can stop whining.

thimabi 9 months ago

Maybe it’s time to create a new HN client with this feature. Ironically, an AI can be of assistance in filtering AI-related submissions or comments.

foobarqux 9 months ago

I just don't seem to find this stuff as useful to me as people are portraying. Take the "extract URLs" example: I would just do

     curl -sL $URL | htmlq 'a' -a href

[-]

simonw 9 months ago

Harder to run that on a phone. I built most of these little web apps so I could use them from Mobile Safari.

Also it doesn't look like htmlq can handle pages that render their content with JavaScript. If you want to do that you might find my shot-scraper CLI utility useful: https://shot-scraper.datasette.io/en/stable/javascript.html

    shot-scraper javascript https://simonwillison.net/ 'Array.from(document.links).map(a => a.href)'

[-]

foobarqux 9 months ago

I can point to similar problems even in CLI-target apps. In Nicholas Carlini's post for example he shows how LLMs helped him make curl parallel by piping to the "parallel" utility. That works but no sane person would do it given that curl has built-in parallel processing via the "-Z" flag which you could have found in 10 seconds by opening the man page. I'm sure this was an instance of a developer (truly) believing they became 10x more productive.

These aren't even the "hard" problems that are beyond the reach of LLMs today; they seem like things they should be able to do. It's just that, today, they just aren't achieving the spectacular results that many are claiming; it's mostly pretty crappy.

shot-scraper looks nice.

pnut 9 months ago

That's pretty happy path for one, and for two, how exactly are you doing it? Not by holding down a red button on your phone and talking into it, that's for sure.

For three, add one more subtle requirement to the task, and now you're reading awk manpages and trial-and-erroring perl oneliners.

cpursley 9 months ago

Yeah, sure - if you have the memory that allows that sort of recall. For the rest of us, LLMs are like Alzheimer’s medication or eye glasses. Believe it or not, these types of esoteric commands are very difficult for some of us to remember - but AI is amazing at this sort thing (Unix commands, etc as well as trouble shooting them).

[-]

foobarqux 9 months ago

That might be a reasonable argument if the LLM suggested something similar to the command I posted instead of an incredibly complicated webapp.

As is it just spits out migraine-inducing "it-works-doesn't-it" solutions from someone starting to learn to program.

sureglymop 9 months ago

I mean they may also make your memory worse if you always go straight for the llm instead of trying to remember.

[-]

ben_w 9 months ago

I can't remember, was it Aristotle or Plato who said that about writing?

[-]

svieira 9 months ago

It was Socrates - and he was correct. When was the last time you met someone who could recite The Iliad from memory?

But more to the point ... in Phaedrus he's not talking about "who will memorize the Iliad now that we have the written word", he's talking about "can the written word _teach_". And the answer (as always) is "no and yes".

> and now you, who are the father of letters, have been led by your affection to ascribe to them a power the opposite of that which they really possess. For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem [275b] to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise.

https://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext... and https://www.gutenberg.org/files/1636/1636-h/1636-h.htm#link2....

[-]

IncreasePosts 9 months ago

Tons of people can recite the Bible and the Koran from memory.

wrtasfg 9 months ago

But where do chatbots fit into this? Cribbing from your neighbor in an exam? Watching TV?

IggleSniggle 9 months ago

I'm pretty sure it was Socrates but I might be hallucinating

skydhash 9 months ago

> Yeah, sure - if you have the memory that allows that sort of recall.

You don't memorize them. You learn the foundational knowledge (in this case how http works and the html format, and a bit of shell scripting), then read the manuals and compose the commands. And as days pass, you save interesting snippets somewhere. Then it becomes easier each time you interact with the tools.

Anyone would find ffmpeg or imagemagick daunting if they don't know anything about audio or graphics.

[-]

cpursley 9 months ago

Enough with the gatekeeping.

I understand the fundamentals of how git works under the hood. But the cli commands are in no way shape or form, intuitive (just one example). But LLMs nail them every time when I forget.

[-]

skydhash 9 months ago

How many varied tasks do you do with git? I keep books and manuals at hand for the one time I need them, not for continual consultation. I have alias for frequently used commands, functions for the complicated one and use magit for day to day operation (to continue with your example). Using LLMs is your choice and I don’t have any say in that. I don’t use them because they’re useless to me. What you may see as complicated may be a walk in the park for someone else.

qingcharles 9 months ago

I get your point, but that doesn't exactly replicate the original tool which lets you just paste a chunk of rich text in.

And I would have had to use GPT to give me the syntax for that command line anyway :)

Eliezer 9 months ago

Meanwhile, no luck getting it to build something that reverses a GIF. (Also, weirdly, no luck with finding a working GIF reverser online.) (Trying to reverse this: https://www.tumblr.com/necessary-disorder/765064008182235136.)

[-]

synthoidzeta 9 months ago

Try running this from the CLI (you'd need to install gifsicle first):

gifsicle --unoptimize input.gif '#-1-0' > reversed.gif

SamDc73 9 months ago

These things give me a weird fuzzy feeling when looking at !

fragmede 9 months ago

ask it to install ffmpeg and have it use that to reverse the gif.

9 months ago

[deleted]

sourcecodeplz 9 months ago

It's just not a big deal.

StickyRibbs 9 months ago

i'll start panicking when it can productionalize an app and deploy it to GCP without any errors.

[-]

raincole 9 months ago

Then you should have been panicking for several months already.

shishy 9 months ago

I used cursor to manage spinning up and deploying a full stack app in AWS last week. Took me one afternoon.

trhway 9 months ago

I think you've just gave an idea to somebody's next startup, and we'll probably see it is being done in half-a-year. In general all that tedious YAML/etc. is ripe for the "autocompletion AI".

[-]

CptFribble 9 months ago

until it hallucinates a config and rings up a $10,000 AWS while you're asleep

skydhash 9 months ago

And then you'll find out a node was deployed with no backup strategy while there are multiple useless ones burning money.

sigh_again 9 months ago

"Don't look at your Kubernetes configuration, trust our AI to do it well" sounds like a psyop straight out of GCP or AWS to charge you four times what they need to before telling you "no, you absolutely need that $500 charge for your 1RPS static website, yes yes absolutely."

[-]

trhway 9 months ago

And for auditing your config and for reviewing your cloud provider's [autogenerated by AI] offers and suggestions will be another AI which will also be able to chat with their customer support AI.

ndndjdueej 9 months ago

Invert that.

AI makes docker compose app. Cloud providers that cannot deploy a docker compose app simply and without errors will miss out.

9 months ago

[deleted]

djoldman 9 months ago

@simonw: jina is getting cranky:

https://tools.simonwillison.net/jina-reader?

{"data":null,"code":451,"name":"SecurityCompromiseError","status":45102,"message":"Your request is categorized as abuse. Please don't abuse our service. If you are sure you are not abusing, please authenticate yourself with an API key.","readableMessage":"SecurityCompromiseError: Your request is categorized as abuse. Please don't abuse our service. If you are sure you are not abusing, please authenticate yourself with an API key."}

[-]

flakiness 9 months ago

TIL Jina Reader API has (low rate limit) API key free option. https://jina.ai/reader/

rtpg 9 months ago

I am a bit frustrated that I don't have a great "tool" environment to build out this stuff, because of having to futz with the I/O. Like most of that stuff is "well I know how to write the Python to do the last step, but wrapping it all up in a simple web UI is Too Much Work". That effort might be small, but it's still orders of magnitude larger than the snippet!

Lots of TUI interfaces try to approximate this, but I think I really just need to build out something a bit like https://anvil.works/

[-]

psadri 9 months ago

You should try SkyMass. You can put up a web ui in as few as 10 loc. and something actually useful in around 50.

Contact me if you need some help getting going…

freediver 9 months ago

A very interested paradigm introduced here by Anthropic is that this content is hosted. And the output of LLM is amde a self-hosted app, ready for consumption by consumer. Not far away from build my own site kind of thing.

rahimnathwani 9 months ago

Here's one I built with Claude last week: https://news.ycombinator.com/item?id=41855594

ToJans 9 months ago

I fully agree.

I think Claude offers me 10x productivity, especially for all these helper apps and technical POCs that I typically create during the week.

And that's without even mentioning mail chain replies, analysis of legal or financial documents, helping my kids with their math assignments,...

It's a huge enabler for me, and it's getting better every month.

We are getting up the abstraction ladder faster and faster, and I cannot even imagine where we will end up within a few months, or a few years.

pluc 9 months ago

and you didn't learn a goddamn thing

ainiriand 9 months ago

Hey some time ago I needed this at work:

https://www.jsoncomments.com/

it is basically a tool to add some additional text to json text files and interpret it as comments for each line. I did it with ChatGPT.

dev0p 9 months ago

A useful prompt to quickly generate this kind of website is

"generate an index.html for {idea}".

It's so much faster to just work within a single file. Of course, you have to be limited in scope, but for quick tools such as these it's excellent.

gcanyon 9 months ago

This is pretty amazing: I asked it to build a single-page app to accept and parse a PDF and give it back in a JSON structure I provided, and it did exactly that!

The parsing was terrible, but it worked at all, which is impressive. It suggested multiple next steps, one of which was to work harder on parsing, so I told it to do that. The resulting app showed a similar UI, but was non-responsive. I noticed it had output a message in small text that said "Claude’s response was limited as it hit the maximum length allowed at this time."

So I told it that it had gone over its own limits and to try again, making an effort to stay within its limits. It tried again, and this time the UI didn't even render beyond a text outline of the UI elements, and I got the same message.

So: pluses and minuses.

bilsbie 9 months ago

What’s the best and easiest way to host and share these artifacts? GitHub html?

[-]

simonw 9 months ago

Easiest is to use the "publish artifact" button, but that wraps them in a giant blob of obtuse React code which means they load slowly and view source is pretty much useless, e.g. https://claude.site/artifacts/46897436-e06e-4ccc-b8f4-3df90c...

Second easiest is to take the code and paste it in a Gist - like this: https://gist.github.com/simonw/14a2c3ef508839f26377707dbf5dd...

And then take the Gist ID and add it to this URL:

https://gistpreview.github.io/?14a2c3ef508839f26377707dbf5dd...

That gives you a URL you can load in your browser.

My preferred route is to host the generated HTML directly myself. I mainly use that via GitHub Pages - I can drop an extract-urls.html file into https://github.com/simonw/tools and about 20 seconds later it becomes available at https://tools.simonwillison.net/extract-urls

Those last two options only work for Artifacts that didn't use React (that's why I use "no react" in most of my artifact prompts). If you DID use React you can turn that into a standalone HTML and JavaScript app that you can deploy using https://github.com/claudio-silva/claude-artifact-runner - I wrote some notes on using that here: https://simonwillison.net/2024/Oct/23/claude-artifact-runner...

[-]

bilsbie 9 months ago

I had no idea gistpreview existed! That’s really easy. I guess you can’t host static files like images though?

Do you do anything special to make the tools directory work like that?

[-]

simonw 9 months ago

I have some note on that here:

- https://til.simonwillison.net/github/custom-subdomain-github...

- https://til.simonwillison.net/github/github-pages

corytheboyd 9 months ago

Just in case you need it: https://github.com/gchq/CyberChef

I was just trying to be helpful, since it was relevant to content in the post…

[-]

burgerquizz 9 months ago

if i want to just paste an url, of full page of HN, and extract all comments in a json format. would that tool work?

[-]

sigh_again 9 months ago

[...document.querySelectorAll(".commtext").values().map((it) => it.innerText)]

Works in every single web browser. No calls to OpenAI needed, and I'm rusty on Javascript. Make it a bookmarklet, and you don't even need to run a dedicated webpage on your machine for that.

corytheboyd 9 months ago

You already know the answer to that. This is just a very helpful interface for arbitrary data conversions, that I thought passers by might like to know about. It’s not an LLM, but it does what some of the examples in the article does, and more.

9 months ago

[deleted]

joeevans1000 9 months ago

These LLMs create significantly useful code and they are getting better. Those professionals who deride their utility are putting themselves at risk of not strategizing well for their future.

almog 9 months ago

For the majority of these, a simple google search would have lead to an existing program/website that does the same thing.

We're past the POC stage. LLMs can generate code for simple programs. It's when you try to tweak the requirements and point how a program introduces a bug that you eventually realize they still fail to take you through the last mile just as they did year and a half ago.

[-]

simonw 9 months ago

"For the majority of these, a simple google search would have lead to an existing program/website that does the same thing."

That's what's so wild about this: that's true, and yet in most of those cases it's still faster and more productive for me to ask Claude to build me a brand new tool _from scratch_ than it is for me to try and find an existing one via Google.

The problem with trying to Google for these kinds of things is that you have to evaluate the results that come back and figure out which one of them correctly solves your problem. That's a few extra steps.

It's genuinely faster to prompt something like this instead:

> Build an artifact (no react) where I can paste text into a textarea and it will return that text with all HTML entities - single and double quotes and less than greater than ampersand - correctly escaped. The output should be in a textarea accompanied by a "Copy to clipboard" button which changes text to "Copied!" for 1.5s after you click it. Make it mobile friendly

Done: https://claude.site/artifacts/46897436-e06e-4ccc-b8f4-3df90c...

In this case I knew exactly what I wanted: it had to do less than, greater than, ampersand, double quotes AND single quotes. I know from past experience that many tools like this forget about single quotes, so I'd have to evaluate any tools I found to check that they do that. And I was on my phone so I wanted a "copy to clipboard" button.

[-]

almog 9 months ago

Simon, all these example are great and on a side note I've learned a lot through your writing. However, as so many other AI for low/no code examples, it starts feeling like trying to perform a dexterity intensive task such as knotting while wearing thick gloves. All these examples are great on their own. What to me feels so wrong about seeing so many of these on HN front page is that my experience has been that even after almost 2 years from when ChatGPT was initially introduced, all models that I've been using improved a lot in accuracy, speed, input/output size but still more often than not fail to be useful when used with even a small _existing_ code base and an evolving set of tasks that aren't the busy work / boiler plate type.

[-]

simonw 9 months ago

I'm working on a longer piece of writing about this, but I think a lot of this comes down to the fact that my personal style of developing code is extremely compatible with LLMs.

I have hundreds of projects which are small single page HTML apps, single purpose command line, utilities or plugins for my larger projects.

All of these are short enough to fit into the context window of an LLM.

If all I worked on were 1-2 hundred thousand or million line applications I would get significantly less value out of LLMs.

skydhash 9 months ago

I don't know if I'd take less time, but I would definitely type less.

tomcam 9 months ago

I love love love that the transcripts are included.

bravura 9 months ago

Is the realtime audio API now open to everyone?

[-]

simonw 9 months ago

I'm not using the Realtime audio API here - I'm using the Chat Completion API, which added support for audio files a few days ago: https://simonwillison.net/2024/Oct/18/openai-audio/

purple-leafy 9 months ago

Wild that people lap this up. Absolutely wild

kevinmerritt 9 months ago

I learn so much from you. Thank you.

9 months ago

[deleted]

v3ss0n 9 months ago

None of them worth writing home about

thimabi 9 months ago

I take tools like these as an inspiration. All of us have at least some trivial tasks that can be automated. In the past, automating them might have been a hassle, but with LLMs, that’s no longer the case. I, for one, have a “scripts” folder with dozens of one-off mini-apps to handle specific tasks, and this folder keeps growing every day.

varelse 9 months ago

[dead]

ConanRus 9 months ago

[dead]

9 months ago

[deleted]

soutong 9 months ago

[dead]

sfmike 9 months ago

[flagged]