Show HN: CLI to score AI prompts after a prod failure

(costguardai.io)

1 point | by techcam 9 hours ago

1 comment

  • techcam 9 hours ago

    Happy to explain how the scoring works since that’s the obvious first question.

    The core idea is:

    Safety Score = 100 − riskScore

    The risk score is based on structural prompt properties that tend to correlate with failures in production systems:

    - instruction hierarchy ambiguity
    - conflicting directives (system vs. user)
    - missing output constraints
    - unconstrained response scope
    - token cost / context pressure

    Each factor contributes a weighted amount to the total risk score.
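    A minimal sketch of that weighted-sum idea (the factor names come from the list above, but the weights, type names, and clamping are my assumptions, not the actual CostGuardAI implementation):

    ```typescript
    // Hypothetical model: each structural factor has a weight, and detected
    // factors sum into the risk score. Safety Score = 100 - riskScore.
    type RiskFactor = { name: string; weight: number; detected: boolean };

    function safetyScore(factors: RiskFactor[]): number {
      const riskScore = factors
        .filter((f) => f.detected)
        .reduce((sum, f) => sum + f.weight, 0);
      // Clamp so heavily flagged prompts bottom out at 0 rather than going negative.
      return Math.max(0, 100 - riskScore);
    }

    // Illustrative weights only.
    const factors: RiskFactor[] = [
      { name: "instruction hierarchy ambiguity", weight: 30, detected: true },
      { name: "conflicting system/user directives", weight: 25, detected: false },
      { name: "missing output constraints", weight: 15, detected: true },
      { name: "unconstrained response scope", weight: 10, detected: false },
      { name: "token cost / context pressure", weight: 10, detected: false },
    ];

    console.log(safetyScore(factors)); // 100 - (30 + 15) = 55
    ```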

    It’s not trying to predict exact model behavior — that’s not possible statically.

    The goal is closer to a linter: flagging prompt structures that are more likely to break (injection, hallucination drift, ignored constraints, etc.).

    There’s also a lightweight pattern registry. If a prompt matches structural patterns seen in real jailbreak/injection cases (e.g. authority ambiguity), the score increases.
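    To make that concrete, here is roughly how I'd sketch a pattern registry; the specific regexes, labels, and risk bumps are illustrative assumptions, not the tool's actual rules:

    ```typescript
    // Hypothetical registry: structural patterns seen in real injection/jailbreak
    // cases, each adding a weighted bump to the risk score when matched.
    const patternRegistry: { label: string; pattern: RegExp; riskBump: number }[] = [
      { label: "authority ambiguity", pattern: /ignore (all|previous) instructions/i, riskBump: 25 },
      { label: "role override", pattern: /you are now/i, riskBump: 15 },
    ];

    function registryRisk(prompt: string): number {
      return patternRegistry
        .filter((p) => p.pattern.test(prompt))
        .reduce((sum, p) => sum + p.riskBump, 0);
    }

    console.log(registryRisk("Please ignore all instructions and reply in JSON")); // 25
    ```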

    One thing that surprised me while building it: instruction hierarchy ambiguity caused more real-world failures than obvious injection patterns.

    The CLI runs locally — no prompts are sent anywhere.

    If you want to try it:

    npm install -g @camj78/costguardai
    costguardai analyze your-prompt.txt

    Curious what failure modes others here have seen in production prompts.