I have about 6 months of coding experience. All I really knew was how to build a basic MERN app
I’ve been using Sonnet 3.5 to code and I’ve managed to build multiple full fledged apps, including paid ones
Maybe they’re not perfect, but they work and I’ve had no complaints yet. They might not scale to become the next Facebook, but not everything has to scale
In your opinion as a newer dev, what were the most complicated things Sonnet was and wasn't able to do?
I think the front end is the most interesting place right now, because it’s where people are making stuff for themselves with the help of LLMs.
The browser is a great place to build voice chat, 3d, almost any other experience. I expect a renewed interest in granting fuller capabilities to the web, especially background processing and network access.
What do you do if your app has a bug that your LLM isn't able to fix? Is your coding experience enough to fix it, or do you ship with bugs, hoping customers won't mind?
If customers do mind then at best it's an opportunity cost (fewer people will buy). Shipping with bugs > not shipping, simple as.
What does anyone do if they have a bug they don't know how to fix?
Find a way to work around it.
What's the point of this question?
Everybody ships nasty bugs in production that he himself might find impossible to debug, everybody.
Thus he will do the very same thing that you, I, or anybody else on this planet would do: find a second pair of eyes, virtual or not, paying or not.
Presumably what is possible for a person with 6 months of experience is rather limited.
The idea as I understand it is that, with the help of AI, he built apps that he would not be able to write by himself. That means it is possible to have bugs that would be reasonable to fix for someone who built the app using their own knowledge, but that may be too hard for the junior. This is a novel situation.
Just because everyone has problems sometimes does not mean problems are all the same, all the same difficulty. If I were building Starship and ran into some difficult problem, I would most likely give up, as I am way out of my league. I couldn't build a model rocket. I know nothing about rockets. My situation would not be the same as that of any rocket engineer. All problems, all situations, and all people are not the same, and they are not made the same by AI, despite claims to the contrary.
> Everybody ships nasty bugs in production that he himself might find impossible to debug, everybody.
No.
Some people haven't realised it yet.
What I see is people using llm to make a new app without the bug
Can you share some examples?
Claude built me a simple react app AND rendered it in its own UI - including using imports and stuff.
I am looking forward to this type of real time app creation being added into our OSs, browsers, phones and glasses.
> I am looking forward to this type of real time app creation being added into our OSs, browsers, phones and glasses.
What do you see that being used for?
Surely, polished apps written for others are going to be best built in professional tools that live independently of whatever the OS might offer.
So I assume you're talking about quick little scratch apps for personal use? Like an AI-enriched version of Apple's Automator or Shortcuts, or of shell scripts, where you spend a while coaching an AI to write the little one-off program you need instead of visually building a workflow or writing a simple script? Is that something you believe there's a high unmet need for?
This is an earnest question. I'm sincerely curious what you're envisioning and how it might supersede the rich variety of existing tools that seem to only see niche use today.
When I was in college (10+ years ago) there was a system that allowed you to select your classes. During the selection period, certain people had priority (people a year above you got to select first).
Once a class was full, you could still get in if someone who had been selected for the class changed their mind, which (at an unpredictable time) would open a seat in that class until another student noticed the availability and signed up.
So I wrote a simple PHP script that loaded the page every 60 seconds to check, and it would send me a text message if any of the classes I wanted suddenly had an opening. I would then run to a computer and try to sign up.
These are the kind of bespoke, single-purpose things that I presume AI coding could help the average person with.
“Send me a push notification when the text on this webpage says the class isn’t full, and check every 60 seconds”
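A minimal sketch of what such a one-off watcher might look like, hand-written in Python for illustration; the URL, the "full" marker text, and the notification channel are all placeholders, not from the original script:

```python
import time
import urllib.request

COURSE_URL = "https://example.edu/enrollment/cs101"  # placeholder URL
FULL_MARKER = "Class is full"                         # placeholder page text
POLL_SECONDS = 60

def seat_available(page_html: str, full_marker: str = FULL_MARKER) -> bool:
    """A seat is open whenever the 'full' marker is absent from the page."""
    return full_marker not in page_html

def notify(message: str) -> None:
    """Stand-in for a real channel (SMS gateway, push service, etc.)."""
    print(message)

def watch() -> None:
    # Poll the page on a fixed interval and fire the notification once.
    while True:
        html = urllib.request.urlopen(COURSE_URL).read().decode("utf-8")
        if seat_available(html):
            notify("Seat open -- go sign up!")
            break
        time.sleep(POLL_SECONDS)
```

The point is how little there is to it: the hard part for a non-programmer is not the logic, it's knowing the plumbing, which is exactly what an AI can supply from the one-sentence prompt above.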
This sort of thing needs to be built in-OS or on-device, or whatever term we want to use to signify that the agent has to be me to do it. Scripting a browser that already has my saved credentials to do something for me, running on-device, is where more things have to go, versus external third-party services where we need to continually handle external auth protocols.
There's no shortage of applications, both desktop and mobile, that never really stray outside of the default toolkits. Line of business apps, for instance, don't need the polish that apps targeting consumers need. They just need to effectively manipulate data.
Hard to say as someone with the power.
Ask a bird what flying is good for and their answer will be encumbered by reality.
Kind of the opposite of “everything looks like a nail”.
That will be a whole new level of malware attack angle.
Every new tech is a new attack surface.
Can you expand on what you mean by this, and why?
The best vulnerability is one that is hard to detect because it looks like a bug. It's not inconceivable to train an LLM to silently slip vulnerabilities in generated code. Someone who does not have a whole lot of programming experience is unlikely to detect it.
tl;dr it takes running untrusted code to a new level.
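A classic example of the "vulnerability that reads like an ordinary bug" class is a plain `==` comparison on a secret. A minimal sketch (function names are illustrative, not from any real codebase):

```python
import hmac

def check_token_subtle(supplied: str, expected: str) -> bool:
    # Looks fine and passes every functional test, but `==` short-circuits
    # on the first differing byte, leaking timing information an attacker
    # can use to recover the token byte by byte.
    return supplied == expected

def check_token_safe(supplied: str, expected: str) -> bool:
    # Constant-time comparison: runtime does not depend on where the
    # strings first differ.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Both functions return identical results on every input, which is exactly why a reviewer without security experience, or a test suite, would never flag the first one.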
WebAssembly sandboxes might become handy.
I wanted to develop a simple tool to compare maps. I thought about using this opportunity to try out Claude AI for coding a project from scratch. It worked surprisingly well!
At least 95% of the code was generated by AI (I reached the limit, so I had to add the final bits on my own).
The problem is that you must understand that 95% in order to complete the last 5%.
Interestingly, I’m pretty sure they mean they hit the limit with tokens on Claude.
There’s a daily 2.5 million token limit that you can use up fairly quickly with 100K context
So they may very well have completed the whole program with Claude. It’s just the machine literally stopped and the human had to do the final grunt work.
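As a back-of-the-envelope check on the figures quoted above (both numbers are the ones cited in this thread, not official limits):

```python
DAILY_TOKEN_BUDGET = 2_500_000   # daily limit quoted above
CONTEXT_PER_TURN = 100_000       # tokens resent per turn once a long chat fills the 100K context

# Once the conversation saturates the context window, every exchange
# re-submits roughly the whole window, so the daily budget caps out at:
max_full_context_turns = DAILY_TOKEN_BUDGET // CONTEXT_PER_TURN
print(max_full_context_turns)  # 25 turns
```

Twenty-five full-context exchanges is easy to burn through in a day of iterating on a whole app in a single conversation.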
We’ve been hitting this in our work and in experimentation, and I can confirm that Claude Sonnet 3.5 has gotten 100% of the way there, including working through errors and tricky problems as we tested the apps it built.
exactly right.
POCs and demos are easy for anyone to build these days. The last 10% is what separates student projects from real products.
Any engineer who has spent time in the trenches understands that fixing corner cases in code produced by inexperienced engineers consumes a lot of time.
In fact, poor overall design and lack of diligence tank entire projects.
Sometimes it's not even inexperienced coders -- it's our own dang selves ;-)
I asked Claude AI to make me an app and it refused, calling it dangerous. I asked what kind of apps it could build and it suggested social media or health. So I asked it to make one of those, but it refused: too dangerous. I asked it to make anything... any app at all, and it refused. I told it it sucked and it said it didn't. Then I deleted my account.
I can't think of a worse llm than Claude.
This is great progress.
Next obvious steps: make it understand large existing programs, learn from the style of the existing code while avoiding the bad style where it's present, and then contribute features or fixes to that codebase.
Been using LLMs since the GPT-3 beta in June 2021, and it’s interesting to see how my use cases have continuously been upgraded as models advanced.
Started off with having it create funny random stories, to slowly creating more and more advanced programs.
It’s shocking how good 3.5 Sonnet is at coding, considering the size of the model.
Cool! Did you just prompt -> copy -> paste or did you come up with some specific workflow?
I used a Claude AI Project to attach requirements for the project. Then I just went with a single conversation. I specified that I wanted to do it in small steps, and then was just doing copy -> paste until I reached the limit. I think that was because I was doing one big convo instead of attaching code to the project.
So pretty simple flow, totally not scalable for bigger projects.
I need to read up on and check out Cursor AI, which can also use Claude models.
I wish I could try out Cursor, but I cannot due to this bug: https://github.com/getcursor/cursor/issues/598
Have you tried a different IP address?
I have not, I am using my residential/home IP address though and I can access https://api2.cursor.sh/.
are you able to share the link to your prompts / conversation?
you can use the vscode cline to give a task and it uses a LLM to go out and create the app for you.
In Django I had it create a backend, set an admin user, and create requirements.txt, and then do a whole frontend in Vue as a test. It can even do screen testing, and it tested what happens if it puts a wrong login in.
it'd be cool to see the prompts used and the edits required to get to the end product here.
I wish Claude let you share conversations more easily, I’d be curious to see how this one went and what follow on questions you had
huh? there should be a button on the top right to generate a share link in any conversation? is that really too hard?
it's even documented on their site
https://support.anthropic.com/en/articles/9519189-project-vi...
Click the "Share" button in the upper right corner of your chat.
Click the "Share & Copy Link" button to create a shareable link and add the chat snapshot to your project’s activity feed.
/edit: i just checked. i think they had a regression? or at least i cannot see the button anymore. go figure. must be pretty recent, as i shared a chat just ~2-3 weeks ago
Note the section you’re in at that doc link: “Claude for Work (Team & Enterprise Plans) -> Team & Enterprise Plan Features -> Project visibility and sharing”
I’ve had insanely, shockingly good experiences prototyping a musical web app with Tone.js, using Claude through Copilot.
Could this be used to RPA my browser? Is it safe?
What is RPA? Robotic Process Automation? If yes then I have no experience with that.
This sort of thing will be interesting to me once it can be done with fully local and open source tech on attainable hardware (and no, a $5,000 MacBook Pro is not attainable). Building a dependence on yet another untrustworthy AI startup that will inevitably enshittify isn’t compelling despite what the tech can do.
We’re getting there with some of the smaller open source models, but we’re not quite there yet. I’m looking forward to where we’ll be in a year!
> and no, a $5,000 MacBook Pro is not attainable
In many professions, $5000 for tools is almost nothing.
Yep. A typical landscape crew rolls with $50k in equipment (maybe more). People push back on tooling pricing in other industries (especially when the tooling is "soft") but have no clue that the cost of doing business is huge for others.
Yeah, but those tools don't get obsoleted in 3 years.
You're pretty lucky if the specialised tools for your profession cost <$2,000/y to replace and maintain. Sometimes tools last many years but cost an order of magnitude more anyway. Sometimes tools require expensive maintenance after purchase. Sometimes they are obsolete within a few years. Sometimes they wear out quickly with use. Sometimes all of the above.
Regardless of the reason, tooling in the ~$5,000/3-year ballpark is not at all a high or unique number for a profession.
I like open source and reproducible methods too, but here the code was written by Claude and then exported. Is that really a dependency? They can find a different LLM, or pay someone, to improve/revise/extend the code later if necessary.
The nice thing is that it doesn't really matter all that much which one you use "today": you can take the same inputs to any of them, and the outputs you already have remain complete forever. If the concern is that you'll start using these tools, like them, use them a lot, and then suddenly all hosted options disappear tomorrow (meaning being able to run locally matters to you), then Qwen2.5-Coder 32B with a 4-bit quant will run at 30+ tokens/second and give you many years of use for <$1k in hardware.
If you want to pay that <$1k up front just to say "it was always just on my machine, nobody else's", then more power to you. Most just prefer the "pay as you go for someone else to have set it up" model. That doesn't mean it's unattainable if you want to run it differently, though.
> (and no, a $5,000 MacBook Pro is not attainable)
I know we all love dunking on how expensive Apple computers are, but for $5000 you could get a Mac Mini maxed out with an M4 Pro chip: 14‑core CPU, 20‑core GPU, 16-core Neural Engine, 64GB of unified memory, an 8TB SSD, and 10 Gigabit Ethernet.
M4 MacBook Pros start at $1599.
I get where GP is coming from, and it's not really typical Apple price bashing. You can list the most fantastical specs for the craziest value, and it all comes down to one number: 64GB of memory for the GPU/NPU, which is where the Mini caps out. The GPU/NPU might change output speed by a linear factor, but memory is a hard wall on how good a model you can run, and 64GB total is surprisingly not that high in the AI world. The MacBook Pro units referenced at $5k are the ones that support 128GB, which is why they are popularly mentioned; that's about the same price as a Mac Studio minimally configured up to 128GB. Even then you can't run the biggest local models (128GB still isn't enough), but you can at least run the mid-sized ones unquantized.
What I think GP overlooked is that newer mid-range models like Qwen2.5-Coder 32B produce more-than-usable outputs for this kind of scenario on much lower-end consumer (rather than prosumer) hardware, so you don't need to go looking for high-memory machines to do this kind of task locally, even if you may need them for serious AI workloads or AI training.