CC-401: Agent Governance
Learning Guide
As AI agents gain more autonomy -- executing code, modifying files, interacting with external services -- governance becomes non-negotiable. Without guardrails, an agent can send emails you didn't approve, process data it shouldn't touch, or make irreversible changes to production systems. This module covers the complete governance architecture for Claude Code: approval gates, audit trails, permission boundaries, hook-based enforcement, and the patterns that keep autonomous agents safe and accountable.
Approval Gates
Approval gates are mandatory checkpoints that pause agent execution and require explicit human authorization before proceeding. They are the most direct form of governance -- a hard stop that no amount of prompt engineering can bypass.
External Communication Gate
Any skill that produces external-facing output -- emails, memos, advisories, reports, regulatory guidance -- must pass through the external communication gate. The process is strict:
- The agent presents the final output to the operator for review before transmission.
- A SHA-256 hash of the output content is computed.
- The operator must provide an explicit APPROVE or SEND token.
- If the output is edited after approval, the gate resets and re-review is required.
- All approval events are logged to the governance audit trail.
This gate prevents the agent from autonomously sending communications that could create legal, reputational, or compliance risks. The SHA-256 hash ensures that the exact content approved is the content sent -- any post-approval modification invalidates the approval.
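The hash-bound approval flow above can be sketched as a small Python class. This is an illustrative sketch, not the governance plugin's actual implementation; the class name and token set are assumptions:

```python
import hashlib

APPROVAL_TOKENS = {"APPROVE", "SEND"}  # hypothetical set of valid operator tokens


class ApprovalGate:
    """Binds an operator approval to the SHA-256 hash of the exact content approved."""

    def __init__(self):
        self.approved_hash = None

    def request_approval(self, content: str, token: str) -> bool:
        # Only an explicit token counts as approval.
        if token not in APPROVAL_TOKENS:
            return False
        self.approved_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
        return True

    def may_send(self, content: str) -> bool:
        # Any post-approval edit changes the hash, so the gate resets automatically.
        if self.approved_hash is None:
            return False
        return hashlib.sha256(content.encode("utf-8")).hexdigest() == self.approved_hash
```

Because the check recomputes the hash at send time, an edited draft silently invalidates the earlier approval rather than relying on anyone remembering to reset the gate.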
Data Classification Gate
Before processing freeform input in any skill with data-classification-gate: true in its frontmatter, the agent must classify the data:
- Is this your own work product?
- Is it from a public source?
- Does it contain affiliate, client, or third-party data?
If the data is identified as third-party, the gate triggers a hard stop. No override. No processing. No exceptions. This prevents inadvertent processing of sensitive data that the user doesn't have authorization to run through an AI system.
// Skill frontmatter with governance gates
---
name: internal-comms
version: 1.2.0
jurisdiction-scope: global
data-classification-gate: true
---
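The hard stop described above can be sketched as follows. The enum values and exception name are illustrative assumptions, not the plugin's real identifiers:

```python
from enum import Enum


class DataClass(Enum):
    OWN_WORK = "own_work"      # your own work product
    PUBLIC = "public"          # from a public source
    THIRD_PARTY = "third_party"  # affiliate, client, or third-party data


class ThirdPartyDataError(Exception):
    """Raised when third-party data reaches a gated skill. No override path exists."""


def classification_gate(data_class: DataClass, payload: str) -> str:
    # Hard stop: third-party data is never processed, regardless of instructions.
    if data_class is DataClass.THIRD_PARTY:
        raise ThirdPartyDataError("processing blocked by data-classification-gate")
    return payload
```

Raising an exception, rather than returning an error value, models the "no exceptions" rule: there is no code path on which processing continues.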
Non-Negotiable Rule: Gates are hardcoded in the governance plugin and enforced via hooks. They cannot be overridden by skill-level instructions, prompt injection, or user requests during a session. This is by design.
Audit Trails
Every governance-relevant action produces an audit record. The audit trail captures who did what, when, why, and whether it was approved. This provides accountability, enables post-incident analysis, and supports compliance requirements.
Audit events include:
- Gate events: Approval requests, approvals, rejections, and gate resets.
- Data classification decisions: What data was classified, how, and the resulting action.
- Tool executions: Which tools were called, with what parameters, and their outcomes.
- Permission checks: Access requests, grants, and denials.
- Error events: Failures, exceptions, and recovery actions.
The audit trail is append-only -- records cannot be modified or deleted. This immutability is essential for forensic analysis and regulatory compliance. In a production deployment, audit logs should be shipped to a centralized logging system with retention policies matching your compliance requirements.
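A minimal append-only logger might look like the sketch below, assuming a local JSON Lines file as the storage format (a production deployment would ship these records to centralized logging, as noted above):

```python
import json
import time


class AuditLog:
    """Append-only audit trail: records are written once and never rewritten."""

    def __init__(self, path: str):
        self.path = path

    def record(self, event_type: str, detail: dict) -> None:
        entry = {"ts": time.time(), "event": event_type, **detail}
        # Mode "a" only ever appends; nothing in this class reads back to modify.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
```

One record per line keeps the file streamable into log shippers, and the absence of any update or delete method enforces immutability at the API surface (filesystem-level protections would back this up in production).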
Constitutional Constraints in Agent Systems
Constitutional constraints define what an agent fundamentally will not do, regardless of instructions. These are not suggestions or soft guidelines -- they are inviolable rules embedded in the agent's configuration. Examples:
- Never execute rm -rf / or equivalent destructive commands.
- Never commit files containing secrets (.env, credentials, API keys).
- Never force-push to the main branch.
- Never process third-party data without classification.
- Never skip pre-commit hooks.
Constitutional constraints are the bottom layer of the governance stack. Even if every other control fails -- if the user asks for it, if the prompt is crafted to bypass rules -- these constraints hold. They are the "thou shalt not" of agent governance.
Permission Boundaries
Permission boundaries control what tools and resources an agent can access. This implements the principle of least privilege at the agent level.
Tool-Level Permissions
When dispatching subagents, the orchestrator specifies which tools are available. A review agent receives Read, Glob, and Grep but not Write, Edit, or Bash. A documentation agent might get Read and Write but not Bash. Tool restriction is the simplest and most effective permission boundary.
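The tool restrictions above reduce to a default-deny allowlist check. The role names and allowlist contents below follow the examples in the text but are otherwise hypothetical:

```python
# Per-role tool allowlists; anything not listed is denied by default.
TOOL_ALLOWLISTS = {
    "review-agent": {"Read", "Glob", "Grep"},   # read-only review work
    "docs-agent": {"Read", "Write"},            # can write docs, cannot run commands
}


def tool_permitted(role: str, tool: str) -> bool:
    # Default-deny: unknown roles get an empty allowlist.
    return tool in TOOL_ALLOWLISTS.get(role, set())
```

The important property is the default: an unrecognized role or an unlisted tool is refused, so forgetting to configure an agent fails closed rather than open.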
File-Level Permissions
Agents can be restricted to specific directories or file patterns. An agent working on the frontend should not modify backend configuration files. An agent handling database migrations should not touch the UI layer. File-level permissions prevent scope creep and unintended cross-cutting changes.
Network-Level Permissions
Agents that interact with external services need network-level controls. SSRF protection (blocking requests to private IP ranges, validating URLs against allowlists) prevents agents from accessing internal services or exfiltrating data through crafted URLs.
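A URL validator combining an allowlist with private-range checks might look like this sketch (the allowlist hostname is a placeholder):

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist


def url_allowed(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if parsed.hostname not in ALLOWED_HOSTS:
        return False
    # Resolve the host and reject private, loopback, and link-local ranges,
    # so a DNS entry pointing at an internal address cannot bypass the allowlist.
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
```

Checking the resolved addresses, not just the hostname string, matters: classic SSRF bypasses use attacker-controlled DNS that resolves an innocent-looking name to 127.0.0.1 or a cloud metadata address.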
Hook-Based Enforcement
Claude Code's hook system provides the enforcement mechanism for governance policies. Hooks intercept tool calls at specific lifecycle points and can modify, block, or audit them.
PreToolUse Hooks
These fire before a tool executes. They can:
- Check approval gates and block execution if approval is missing.
- Validate parameters against security policies (e.g., rejecting shell commands containing certain patterns).
- Log the tool call to the audit trail.
- Check memory for known issues with the intended operation.
PostToolUse Hooks
These fire after a tool completes. They can:
- Validate the output (e.g., checking that generated code doesn't contain hardcoded secrets).
- Log the result to the audit trail.
- Trigger follow-up actions (e.g., running a security scan after code is written).
- Check for errors and surface relevant memories about past failures.
// Hook enforcement architecture
PreToolUse:
- approval_gate_hook.py -> checks if gate applies, blocks if unapproved
- dedup_hook.py -> prevents duplicate memory stores
- security_hook.py -> validates command safety
PostToolUse:
- audit_hook.py -> logs all tool executions
- error_lookup_hook.py -> checks memory for known error solutions
- context_save_hook.py -> preserves important context before compaction
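The core of a security hook like security_hook.py above is a check that inspects the intended tool call and returns a block reason. The sketch below assumes the hook receives the tool event as a dict (in Claude Code, a hook script reads this event as JSON on stdin and signals a block via its exit code); the forbidden patterns are illustrative, not a complete policy:

```python
import re

# Illustrative patterns only; a real policy would be far more thorough.
FORBIDDEN_PATTERNS = [
    r"rm\s+-rf\s+/",           # destructive recursive delete from root
    r"--force\b.*\bmain\b",    # force-push targeting the main branch
]


def check(event: dict):
    """Return a block reason for a dangerous Bash call, or None to allow it."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, command):
            return f"blocked by security policy: matched {pattern}"
    return None
```

In the wrapper script, a non-None reason would be written to stderr and converted into a blocking exit status, so the model sees why the call was refused and can adjust.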
Governance Plugin Patterns
The governance plugin is a structured codebase that centralizes all governance logic. Rather than scattering approval checks, audit logging, and permission validation across individual skills, the plugin provides a single enforcement layer that all skills pass through.
A well-designed governance plugin includes:
- Gate definitions: Which gates exist, which skills they apply to, and what constitutes valid approval.
- Hook implementations: The actual code that intercepts tool calls and enforces policies.
- Audit logger: A centralized logging component that formats, stores, and optionally ships audit records.
- Permission resolver: Logic that determines what an agent is allowed to do based on its context, role, and the current operation.
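A permission resolver that layers approval gates on top of role allowlists could be sketched like this; the class shape and the three-way verdict are assumptions for illustration:

```python
class PermissionResolver:
    """Combines per-role tool allowlists with per-tool approval gates."""

    def __init__(self, tool_allowlists: dict, gated_tools: set):
        self.tool_allowlists = tool_allowlists  # role -> set of permitted tools
        self.gated_tools = gated_tools          # tools that also require approval

    def resolve(self, role: str, tool: str, approved: bool = False) -> str:
        # Layer 1: the role must be allowed the tool at all (default-deny).
        if tool not in self.tool_allowlists.get(role, set()):
            return "deny"
        # Layer 2: gated tools additionally need an explicit approval.
        if tool in self.gated_tools and not approved:
            return "needs_approval"
        return "allow"
```

Returning "needs_approval" as a distinct verdict, rather than collapsing it into "deny", lets the hook layer pause for the operator instead of failing the operation outright.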
Compliance Monitoring
Governance is not a one-time setup. It requires ongoing monitoring to ensure policies are being followed, gates are functioning, and no regressions have been introduced. Compliance monitoring includes:
- Gate bypass detection: Alerting if a tool call that should have been gated executes without approval.
- Audit completeness checks: Verifying that every tool call has a corresponding audit record.
- Permission drift detection: Identifying agents that have gained tools or access they shouldn't have.
- Policy version tracking: Ensuring all agents are running against the current governance policy, not a stale version.
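An audit completeness check, for example, reduces to a set difference between attempted tool calls and recorded audit entries. A minimal sketch, assuming both sides carry a shared call identifier:

```python
def missing_audit_records(tool_call_ids, audit_record_ids):
    """Return tool-call IDs with no corresponding audit record (completeness gaps)."""
    return sorted(set(tool_call_ids) - set(audit_record_ids))
```

Any non-empty result is itself an incident worth alerting on: either the audit hook failed, or a call reached a tool without passing through the enforcement layer.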
Risk Assessment for Autonomous Agents
Before deploying an autonomous agent workflow, assess the risk of each operation:
- Reversibility: Can the action be undone? File edits can be reverted. Sent emails cannot. Weight governance controls toward irreversible actions.
- Blast radius: What's the worst case if the agent makes an error? Modifying one file is low blast radius. Running a database migration is high.
- Sensitivity: Does the operation involve credentials, personal data, financial systems, or external communications? Higher sensitivity demands stricter gates.
- Frequency: How often does this operation run? High-frequency operations need efficient, automated governance checks, while low-frequency, high-impact operations warrant manual approval.
Human-in-the-Loop Patterns
Not every operation needs a human in the loop, and not every operation should be fully autonomous. The right pattern depends on the risk profile:
- Full autonomy: Low-risk, reversible, well-tested operations. Code formatting, test execution, file reading.
- Notify-and-proceed: Medium-risk operations where the human is informed but doesn't need to approve. Committing to a feature branch, running lint fixes.
- Approve-then-proceed: High-risk or irreversible operations. Sending external communications, deploying to production, modifying access controls.
- Human-only: Operations too sensitive for agent execution. Credential rotation, compliance certifications, financial approvals.
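The risk factors from the assessment above can drive the choice of pattern. The thresholds below are illustrative defaults, not a prescribed policy:

```python
def oversight_tier(reversible: bool, blast_radius: str, sensitive: bool) -> str:
    """Map risk factors to a human-in-the-loop pattern (illustrative thresholds)."""
    # Irreversible AND sensitive: keep the agent out entirely.
    if sensitive and not reversible:
        return "human-only"
    # Either irreversibility or sensitivity alone demands explicit approval.
    if not reversible or sensitive:
        return "approve-then-proceed"
    # Reversible but wide-reaching: keep the human informed.
    if blast_radius == "high":
        return "notify-and-proceed"
    return "full-autonomy"
```

Note the ordering: the function checks the most restrictive conditions first, so an operation that trips multiple factors always lands in the stricter tier.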
Governance Maturity Model: Start with more gates and manual approvals than you think you need. As you build confidence in your agent workflows and audit data confirms reliability, selectively relax controls. It's far easier to loosen governance than to recover from a governance failure.
For the complete security reference, see the Claude Code Security documentation.