2 comments

  • burnerToBetOut a month ago

    The answer, my friendow, is blowing in the context window [1]…

    _______

    LLMs already "know" these books deeply, but without a structured prompt scaffold they apply that knowledge inconsistently and at low confidence. Giving the model an explicit lens — "review this as if you're checking against Clean Code heuristics C1–C36" — concentrates attention and dramatically reduces hallucinated or off-topic feedback.

    Where I'd push back or warn you:

    Context collapse is your #1 enemy. Clean Code was written for Java in 2008. DDIA is about distributed systems at scale. If you apply the Clean Code reviewer to a 50-line Python script, you'll get pedantic nonsense about function length when the actual problem might be that the data model is wrong. Your skill selection logic needs to be domain-aware, not just "throw all skills at every file."
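    To make that concrete, here's a minimal sketch of domain-aware skill selection: route rubrics by file traits instead of applying every rubric to every file. All names here (`Skill`, `select_skills`, the predicates) are illustrative, not from any actual implementation.

```python
# Hypothetical sketch: each review "skill" declares a predicate over
# file metadata, and only matching skills run against a given file.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    applies: Callable[[dict], bool]  # predicate over file metadata

SKILLS = [
    # Clean Code heuristics make sense for Java or genuinely long files.
    Skill("clean-code-heuristics",
          lambda f: f["language"] == "java" or f["loc"] > 200),
    # DDIA-style review only fires when storage or network is involved.
    Skill("ddia-data-model",
          lambda f: f["touches_storage"] or f["touches_network"]),
]

def select_skills(file_meta: dict) -> list[str]:
    """Return only the rubrics whose domain matches this file."""
    return [s.name for s in SKILLS if s.applies(file_meta)]

# A 50-line local Python script gets no Clean Code pass at all,
# avoiding the pedantic function-length feedback described above.
script = {"language": "python", "loc": 50,
          "touches_storage": False, "touches_network": False}
# A large Java service touching storage matches both rubrics.
svc = {"language": "java", "loc": 800,
       "touches_storage": True, "touches_network": True}
```

    The point isn't the predicates themselves (those would be tuned per codebase) but that selection happens before any rubric is prompted.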

    _______

    [1] https://g2ww.short.gy/ZLStasQ1

  • YaraDori a month ago

    Cool idea — “book as rubric” is a nice way to avoid vague self-critiques.

    One thing I’ve found helps keep agent review loops from getting shallow is to separate levels of critique:

    1) a fast “lint” pass (format, obvious bugs, missing tests)
    2) a domain pass that’s forced to cite specific passages/rules from the rubric
    3) a “counterexample” pass: the reviewer must propose at least one concrete failing scenario plus how to reproduce it
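    A minimal runnable sketch of that three-pass shape, with the model faked by canned findings; the interesting part is the gating (pass 2 drops anything without a rubric citation, pass 3 rejects counterexamples without repro steps). Every name here is illustrative.

```python
# Three-pass review loop sketch: lint, cited domain pass, counterexample.

def lint_pass(diff: str) -> list[dict]:
    # Fast mechanical checks; here we only flag a missing test.
    issues = []
    if "def test_" not in diff:
        issues.append({"level": "lint", "finding": "no tests in diff"})
    return issues

def domain_pass(findings: list[dict]) -> list[dict]:
    # Keep only findings that cite a specific rubric rule.
    return [f for f in findings if f.get("rule_id")]

def counterexample_pass(scenarios: list[dict]) -> list[dict]:
    # Require at least one scenario with concrete repro steps.
    valid = [s for s in scenarios if s.get("repro")]
    if not valid:
        raise ValueError("no reproducible counterexample proposed")
    return valid

diff = "def add(a, b): return a - b"
raw = [{"finding": "wrong operator", "rule_id": "C3"},
       {"finding": "vibes are off"}]  # no citation -> dropped in pass 2
scenarios = [{"case": "add(2, 3) is wrong", "repro": "assert add(2, 3) == 5"}]

report = lint_pass(diff) + domain_pass(raw) + counterexample_pass(scenarios)
```

    Forcing each pass to fail loudly (rather than emit vague prose) is what keeps the loop from going shallow.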

    Question: are you capturing the reviewer’s evidence (links, excerpts, failing cases) in a structured log so a human can audit why the agent changed something?
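    For what it's worth, one cheap way to answer that question with "yes" is an append-only JSONL evidence log, one record per change, so a human can grep for why the agent changed something. The schema below is entirely hypothetical.

```python
# Hypothetical audit log: each agent-made change appends one JSON record
# capturing the rubric rule, the cited excerpt, and any failing case.
import json
import os
import tempfile
import time

def log_change(log_path: str, change_id: str, evidence: dict) -> dict:
    record = {
        "change_id": change_id,
        "timestamp": time.time(),
        "rule_id": evidence["rule_id"],                 # which rubric rule fired
        "excerpt": evidence["excerpt"],                 # cited passage
        "failing_case": evidence.get("failing_case"),   # repro, if any
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")  # JSONL: one record per line
    return record

path = os.path.join(tempfile.mkdtemp(), "evidence.jsonl")
rec = log_change(path, "chg-1",
                 {"rule_id": "C2",
                  "excerpt": "functions should do one thing"})
```

    JSONL keeps the log append-only and trivially diffable, which matters if the audit trail itself ends up under review.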

    Related: I’m working on SkillForge (https://skillforge.expert) — we’ve been thinking about similar auditability, but for UI workflows: record once, then replay with checkpoints + retries, so humans approve at meaningful boundaries instead of every click.