Agentic AI Code Review: From Confidently Wrong to Evidence-Based

(platformtoolsmith.com)

2 points | by sharp-dev 5 hours ago

1 comment

  • sharp-dev 5 hours ago

    TL;DR: AI code reviewer went from "confidently wrong" to actually useful. Fix: stopped pre-selecting context, gave the model tools to fetch evidence itself. Now it either cites file:line or stays quiet.

    The Problem

    Our AI reviewer flagged a "blocker." Cited the diff, built a plausible argument, suggested a fix. A senior engineer spent 20 minutes disproving it. The guard clause it missed? Two files away. The model never had that file, so it guessed and sounded certain. Pre-selecting context doesn't work: code review follows evidence chains, and chains aren't predictable.

    The Fix

    Agentic loop:

    Model: "This calls validate()" → search_code("validate")
    Model: "Two call sites use withRetry(). The third doesn't." → get_file_content("config/defaults.go")
    Model: "Missing timeout. Bug found." → submit_code_review(structured_output)

    The model fetches what it needs. The loop ends when it submits structured findings (path, line, severity, evidence, not prose).
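    A minimal sketch of that loop. Tool names come from the post; the `model_step` interface, the transcript shape, and the iteration cap are assumptions for illustration:

    ```python
    # Agentic review loop: each turn, the model either requests a tool
    # call or submits structured findings, which terminates the loop.
    # `model_step` and the tool implementations are hypothetical stand-ins.

    MAX_ITERATIONS = 20  # guardrail: hard cap on tool round-trips

    def review(diff, model_step, tools):
        """Run model turn → tool turn until findings are submitted."""
        transcript = [{"role": "user", "content": diff}]
        for _ in range(MAX_ITERATIONS):
            action = model_step(transcript)            # model turn
            if action["tool"] == "submit_code_review":
                return action["args"]["findings"]      # terminal action
            result = tools[action["tool"]](**action["args"])  # tool turn
            transcript.append({"role": "tool", "content": result})
        return []  # cap reached without a submission: stay quiet
    ```

    The key design choice is that `submit_code_review` is just another tool call, so "done" is an explicit, structured action rather than the model trailing off into prose.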

    What Changed

    - Before: "This might break retries."
    - After: "In foo/bar.go:123, call bypasses withRetry(). Other call sites use it (see search results). Wrap or document."
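    The "after" style implies a finding schema. A guess at what "path, line, severity, evidence" could look like as a validated record (field names and the severity values are assumptions):

    ```python
    # A structured finding: the submission tool rejects anything
    # without evidence, which is what forces "cite or stay quiet".
    from dataclasses import dataclass

    @dataclass
    class Finding:
        path: str      # e.g. "foo/bar.go"
        line: int      # exact line, not a vague region
        severity: str  # assumed scale: "blocker" | "warning" | "nit"
        evidence: str  # what the model actually fetched and saw

        def __post_init__(self):
            if not self.evidence.strip():
                raise ValueError("no evidence, no finding")
    ```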

    The Pieces

    1. Tools: boring, fast, deterministic. get_file_content, search_code. Treat them like production APIs.
    2. Terminal action: structured JSON submission, not Markdown. No evidence? Can't submit.
    3. Loop: model turn → tool turn → repeat. Aggressive context shrinking (old results truncated, the diff stays).
    4. Guardrails: iteration caps, timeouts, self-critique checklist.
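    One way the context shrinking in piece 3 might work: keep the diff and the most recent turns intact, truncate older tool results. The budget and the "last two turns" window are assumptions:

    ```python
    # Truncate stale tool output so the context window doesn't fill
    # with old search results; the diff (user turn) always survives.
    TOOL_RESULT_BUDGET = 500  # chars kept per old result; an assumption

    def shrink_context(transcript):
        shrunk = []
        for i, turn in enumerate(transcript):
            is_recent = i >= len(transcript) - 2
            if turn["role"] == "user" or is_recent:
                shrunk.append(turn)  # diff and latest turns stay whole
            else:
                body = turn["content"][:TOOL_RESULT_BUDGET]
                shrunk.append({**turn, "content": body + "…[truncated]"})
        return shrunk
    ```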

    Evaluation

    Pick 5-10 PRs where you know the real risks. Check:

    - Found the issue?
    - Cited exact file:line?
    - Hallucinated anything?
    - Fetched evidence when uncertain?

    Iterate on tools, not prompts.
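    That checklist reduces to set arithmetic if findings and known risks share (path, line) keys. A tiny scoring sketch under that assumption:

    ```python
    # Score one reviewer run against a PR with known risks:
    # true positives, misses, and hallucinated findings.
    def score(findings, known_risks):
        found = {(f["path"], f["line"]) for f in findings}
        expected = set(known_risks)
        return {
            "found": sorted(found & expected),         # real issue, cited
            "missed": sorted(expected - found),        # known risk skipped
            "hallucinated": sorted(found - expected),  # flagged, not real
        }
    ```

    Run it over the 5-10 chosen PRs and watch how the three buckets shift as you change the tools.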

    The Pattern

    Don't build bigger prompts. Build a loop where the model can fetch evidence, test hypotheses, and submit only when it can cite sources. That's the difference between "sounds right" and "is right."