It’s very cool to have these self-improving loops! In my case, I’m working on an iOS app and I’ve struggled with the lack of tooling around automating tests, and with being able to run things in a sandbox the way that approach would require. Is this also the state of the art for my situation if you can’t sandbox?
Out of the box, Opus/GPT-5.4 use checks like `tsc --noEmit` or `python -m py_compile ...`, so compile checks on a Swift/Objective-C app probably get you pretty far. Setting up a Dockerfile, maybe with swift:5.10-focal, would also give the agent the right tools to verify its own work.
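A minimal sketch of that Dockerfile idea, assuming a SwiftPM package layout (`Package.swift` at the repo root). Caveat: the Linux swift image can only compile platform-independent Swift; UIKit/Xcode-only targets still need macOS.

```dockerfile
# Sketch: give the agent a way to compile-check its own Swift edits.
# Assumes a SwiftPM layout; swift:5.10-focal is the image tag mentioned above.
FROM swift:5.10-focal
WORKDIR /app
COPY . .
# Compile check analogous to `tsc --noEmit`: fails fast on type errors.
RUN swift build
```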
Sandboxing isn't about having the perfect devex environment; most dev sandboxes (looking at you, Codex cloud) don't actually have full verification available. Sometimes GitHub Actions CI/CD is all you can get!
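For the GitHub Actions route, a hedged sketch of a compile-only CI check for an iOS app. The scheme name `MyApp` is a placeholder; the macOS runner, `xcodebuild`, and the `CODE_SIGNING_ALLOWED=NO` setting are standard, and the agent can read the job's pass/fail status as its verification signal.

```yaml
# Hypothetical workflow: compile check on a macOS runner, no signing, no devices.
name: ios-build-check
on: [push, pull_request]
jobs:
  build:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Compile check
        run: |
          xcodebuild build \
            -scheme MyApp \
            -destination 'generic/platform=iOS Simulator' \
            CODE_SIGNING_ALLOWED=NO
```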