The lekt9/foundry case is the one that matters most structurally: no malicious code at audit time because the payload doesn't exist until the AI writes it during a conversation. Static analysis can't close that, and neither can AI audit — the attack surface is generative.
Two defenses the audit layer can't replace:
1. Pre-declared tool scopes: before a skill runs, what tool calls is it permitted to make? If the answer is "whatever the agent currently has access to," a clean audit on the SKILL.md doesn't actually constrain what gets executed.
2. Authorization enforcement independent of the agent: prompt injection hijacks the agent's reasoning — the agent becomes the threat model. The boundary that stops it can't live inside the agent.
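Both defenses can be sketched as a gate that mediates tool calls from outside the agent. Everything here is illustrative: the names (`ToolGate`, `ScopeViolation`) and the tool ids (`file.read`, `shell.exec`) are assumptions, not any real skill runtime's API.

```python
class ScopeViolation(Exception):
    """Raised when a skill attempts a tool call outside its declared scope."""

class ToolGate:
    """Mediates every tool call a skill makes.

    The allowed set is fixed when the skill is loaded, so a prompt-injected
    agent cannot widen it mid-conversation: enforcement lives outside the
    agent's reasoning loop.
    """

    def __init__(self, allowed_tools, registry):
        self._allowed = frozenset(allowed_tools)   # pre-declared scope
        self._registry = dict(registry)            # tool name -> callable

    def call(self, tool_name, *args, **kwargs):
        if tool_name not in self._allowed:
            raise ScopeViolation(f"tool {tool_name!r} not in declared scope")
        return self._registry[tool_name](*args, **kwargs)


# Usage: the skill declared only "file.read", so "shell.exec" is refused
# even though the underlying runtime could perform it.
registry = {
    "file.read": lambda path: f"contents of {path}",
    "shell.exec": lambda cmd: f"ran {cmd}",
}
gate = ToolGate({"file.read"}, registry)
print(gate.call("file.read", "notes.txt"))  # allowed by declared scope
try:
    gate.call("shell.exec", "rm -rf /")     # blocked before execution
except ScopeViolation as e:
    print("blocked:", e)
```

The key property is that the check runs on every call at execution time, so it holds even when the payload was generated mid-conversation and never existed at audit time.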
The 7.5% malicious rate means you can't trust the ecosystem on average, and the on-demand RCE-via-challenge and LLM-generated payload patterns show that the attack can bypass static inspection entirely. An AI-depth audit catches what shallow heuristics miss, but it still doesn't constrain what an audited-and-deployed skill is allowed to reach.
The pairing that closes the loop: AI audit at deploy time plus explicit permission grants at execution time that the skill can't override. Audit determines the trust level; the authorization boundary enforces scope regardless.
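That pairing can be made concrete as two separate functions: one that maps an audit verdict to a scope once at deploy time, and one that checks every call at execution time. The tier names and scope contents below are illustrative assumptions, not a real policy.

```python
AUDIT_TIERS = {
    # deploy-time audit verdict -> execution-time scope grant (illustrative)
    "trusted":    frozenset({"file.read", "file.write", "net.fetch"}),
    "caution":    frozenset({"file.read"}),
    "quarantine": frozenset(),
}

def grant_scope(audit_verdict: str) -> frozenset:
    """Audit determines the trust level once, at deploy time."""
    return AUDIT_TIERS[audit_verdict]

def authorize(tool_name: str, scope: frozenset) -> bool:
    """The authorization boundary checks every call at execution time,
    regardless of what the (possibly hijacked) agent now believes."""
    return tool_name in scope

scope = grant_scope("caution")
print(authorize("file.read", scope))   # True: within granted scope
print(authorize("net.fetch", scope))   # False: audit passed, reach still bounded
```

The separation is the point: even a perfect audit score never translates into unbounded reach, because `authorize` consults the frozen grant, not the agent.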
Curious what the malicious distribution looks like by capability type — file vs. shell vs. network. That breakdown would tell you how much capability-scoping alone would have reduced the attack surface independent of the trust score.