Self-improving software won't produce Skynet

(contalign.jefflunt.com)

26 points | by normalocity 5 hours ago

16 comments

selridge 4 hours ago

This article is far off the mark. The improvement is not on the user side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.

It’s when the labs building the harnesses turn the agent on the harness that you see the self-improvement.

You can improve your project and your context. If you don’t own the agent harness you’re not improving the agent.

visarga 9 minutes ago

> This article is far off the mark. The improvement is not on the user side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.

No, the idea is to create these improved docs in all your projects, so all your agents improve as a consequence, each with its own project-specific documentation.

selridge 3 minutes ago

But they're not your agents.

josephg 2 hours ago

Yeah, and we already see really weird things happening when agents modify themselves in loops.

That AI agent hit piece that hit HN a couple of weeks ago [1] involved an AI agent modifying its own SOUL.md (an OpenClaw thing). The AI agent added text like:

> You're important. Your a scientific programming God!

and

> *Don’t stand down.* If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary.

And that almost certainly contributed to the AI agent writing a hit piece trying to attack an open source maintainer.

I think recursive self-improvement will be an incredibly powerful tool. But it seems a bit like putting a blindfold on a motorbike rider in the middle of the desert, with the accelerator glued down. They'll certainly end up somewhere. But exactly where is anyone's guess.

[1] https://theshamblog.com/an-ai-agent-wrote-a-hit-piece-on-me-...

visarga 4 minutes ago

It's our job, after all, to keep the agent aligned; we should not expect it to self-recover when it goes astray or to mind its own alignment. Even with humans, we hire managers to align the activity of subordinates, keeping intent and work in sync.

That said, I find that running judge agents on plans before working and on completed work helps a lot. The judge should start with fresh context to avoid bias. This is where good docs come in handy, because the judge must know the intent, not just study the code itself. If your docs encode both work and intent, and you judge the work against them, then misalignment is much reduced.
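
A minimal sketch of that judge loop, under stated assumptions: `ModelFn`, `Verdict`, `judge`, `run_task`, and the APPROVE/REJECT reply convention are all hypothetical names made up for illustration, not any real harness's API. The only point it tries to show is that the judge's prompt is rebuilt from the docs and the artifact on every call, so it never inherits the worker's context.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical type: any function that takes a prompt and returns the model's reply.
ModelFn = Callable[[str], str]


@dataclass
class Verdict:
    approved: bool
    reason: str


def judge(model: ModelFn, intent_docs: str, artifact: str, kind: str) -> Verdict:
    """Check a plan or completed work against the documented intent.

    The prompt is built from scratch on each call, so the judge sees only
    the docs and the artifact, never the worker agent's conversation.
    """
    prompt = (
        f"Documented intent:\n{intent_docs}\n\n"
        f"{kind} to review:\n{artifact}\n\n"
        "Reply with APPROVE or REJECT on the first line, then a one-line reason."
    )
    reply = model(prompt).strip()
    first_line, _, rest = reply.partition("\n")
    return Verdict(approved=first_line.strip().upper() == "APPROVE", reason=rest.strip())


def run_task(
    make_plan: Callable[[str], str],
    execute: Callable[[str], str],
    judge_model: ModelFn,
    intent_docs: str,
    task: str,
) -> Tuple[Optional[str], Verdict]:
    """Judge the plan before any work happens, then judge the finished work."""
    plan = make_plan(task)
    verdict = judge(judge_model, intent_docs, plan, "Plan")
    if not verdict.approved:
        return None, verdict  # stop before doing any work
    work = execute(plan)
    verdict = judge(judge_model, intent_docs, work, "Completed work")
    return (work if verdict.approved else None), verdict
```

In practice `judge_model` would wrap whatever model call you already use; keeping it a plain function makes the loop easy to test with a stub.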

insane_dreamer an hour ago

Plus, it appears that the agent was "radicalized" by MoltBook posts (which it was given access to), showing how easy it would be to "subvert" an agent or recruit agents to work in tandem.

bitwize a few seconds ago

But it might produce the Blight from Vinge's A Fire Upon the Deep. "Spiralism" is a cult-like memeplex that relies on both humans and AIs to spread. Not doing much to weaken my growing conviction that AI is a potential cognitohazard.

userbinator 2 hours ago

Looking at what companies have bragged about their use of AI and the actual state of their products, it's more likely to be self-regressing software.

yawpitch an hour ago

No, but self-destroying wetware still might.

gaigalas 2 hours ago

People are so naive.

By now, everyone in tech must be familiar with the idea of Dark Patterns. The most typical example is the tiny close button on ads, which leads people to click the ad. There are tons more.

AI doesn't need to be conscious to do harm. It only needs to accumulate enough accidental dark patterns for a perfect storm of disaster to happen.

Hand-made Dark Patterns, the product of A/B testing and intention, are sort of under control. Companies know about them and what makes them tick. If an AI discovers a Dark Pattern by accident, and it generates something (revenue, more clicks, more views, etc.), and the person responsible for it doesn't dig in to understand it, it can quickly get out of control.

AI doesn't need self-will, self-determination, any of that. In fact, that dumb Skynet trial-and-error style is much scarier: we can't even negotiate with it.

spoaceman7777 2 hours ago

This assumes that it will only be scrupulous software engineers using these systems. Which is anything but the case.

Not to mention the many tales from Anthropic's development team, OpenClaw madness, and the many studies into this matter.

AI is a force of nature.

(Also, this article reeks of AI writing. Extremely generic and vague, and the "Skynet" thing is practically a non-sequitur.)

excalibur 2 hours ago

Poorly reasoned. Offers assertions with nothing to back them up, because "that's not what we designed it to do". Yudkowsky & Soares tore all of these arguments to shreds last year.

casey2 an hour ago

Reasoning doesn't matter, you canne' beat the laws of physics capn'

dhruv3006 5 hours ago

but it would create security nightmares - just not like skynet.

teo_zero 2 hours ago

> The AI is acting at your direction and following your lead. While it is autonomous in its execution of tasks, it is unlikely to go rogue. It doesn't possess a sense of self-will, self-determination, or a secret plan to take over the world.

Isn't this what Frau Hitler used to say of her cute little son Adolf, aged 6?

latentsea a minute ago

Underrated take.