| description | applyTo |
|---|---|
| Guidelines for building safe, governed AI agent systems. Apply when writing code that uses agent frameworks, tool-calling LLMs, or multi-agent orchestration to ensure proper safety boundaries, policy enforcement, and auditability. | ** |
# Agent Safety & Governance

## Core Principles
- Fail closed: If a governance check errors or is ambiguous, deny the action rather than allowing it
- Policy as configuration: Define governance rules in YAML/JSON files, not hardcoded in application logic
- Least privilege: Agents should have the minimum tool access needed for their task
- Append-only audit: Never modify or delete audit trail entries — immutability enables compliance
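A minimal sketch of how these principles combine, assuming a hypothetical YAML policy file with an `allowed_tools` list and PyYAML for loading (illustrative names, not a specific framework's API):

```python
# Illustrative only: load policy from configuration and deny by default.
import yaml  # PyYAML

def load_policy(path: str) -> dict:
    with open(path) as f:
        return yaml.safe_load(f)

def is_tool_allowed(policy: dict, tool_name: str) -> bool:
    try:
        allowed = policy["allowed_tools"]
    except (KeyError, TypeError):
        return False              # fail closed on a missing or malformed policy
    return tool_name in allowed   # least privilege: explicit allowlist only
```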
## Tool Access Controls
- Always define an explicit allowlist of tools an agent can use — never give unrestricted tool access
- Separate tool registration from tool authorization — the framework knows what tools exist, the policy controls which are allowed
- Use blocklists for known-dangerous operations (shell execution, file deletion, database DDL)
- Require human-in-the-loop approval for high-impact tools (send email, deploy, delete records)
- Enforce rate limits on tool calls per request to prevent infinite loops and resource exhaustion
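A sketch of a pre-execution tool gate that enforces the controls above; the class and field names are illustrative assumptions, not a real library:

```python
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    allowed_tools: set[str]                                    # explicit allowlist
    blocked_tools: set[str] = field(default_factory=set)      # known-dangerous ops
    approval_required: set[str] = field(default_factory=set)  # human-in-the-loop
    max_calls_per_request: int = 25

class ToolGate:
    def __init__(self, policy: ToolPolicy):
        self.policy = policy
        self.calls_this_request = 0

    def check(self, tool_name: str, approved_by_human: bool = False) -> bool:
        self.calls_this_request += 1
        if self.calls_this_request > self.policy.max_calls_per_request:
            return False    # rate limit: stop runaway loops
        if tool_name in self.policy.blocked_tools:
            return False    # blocklist always wins
        if tool_name not in self.policy.allowed_tools:
            return False    # not on the allowlist: deny
        if tool_name in self.policy.approval_required and not approved_by_human:
            return False    # high-impact tool needs human approval
        return True
```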
## Content Safety
- Scan all user inputs for threat signals before passing to the agent (data exfiltration, prompt injection, privilege escalation)
- Filter agent arguments for sensitive patterns: API keys, credentials, PII, SQL injection
- Use regex pattern lists that can be updated without code changes
- Check both the user's original prompt AND the agent's generated tool arguments
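One way to implement updatable pattern scanning, sketched with a hypothetical `patterns.json` file; the example patterns are illustrative, not a vetted detection set:

```python
import json
import re

def load_patterns(path: str) -> list[re.Pattern]:
    # Patterns live in a data file, so they can change without a code deploy.
    with open(path) as f:
        return [re.compile(p) for p in json.load(f)]

def find_violations(patterns: list[re.Pattern], *texts: str) -> list[str]:
    """Scan the user's prompt AND the agent's generated tool arguments."""
    hits = []
    for text in texts:
        for pattern in patterns:
            if pattern.search(text):
                hits.append(pattern.pattern)
    return hits

# Example contents of patterns.json:
# ["(?i)(api_key|password)\\s*[:=]", "(?i)ignore (all|previous) instructions"]
```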
## Multi-Agent Safety
- Each agent in a multi-agent system should have its own governance policy
- When agents delegate to other agents, apply the most restrictive policy from either
- Track trust scores for agent delegates — degrade trust on failures, require ongoing good behavior
- Never allow an inner agent to have broader permissions than the outer agent that called it
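A sketch of delegate trust tracking; the starting score, step sizes, and threshold are illustrative assumptions:

```python
class TrustTracker:
    def __init__(self, initial: float = 0.8, floor: float = 0.3):
        self.scores: dict[str, float] = {}
        self.initial = initial
        self.floor = floor    # below this, delegation is refused

    def score(self, agent_id: str) -> float:
        return self.scores.get(agent_id, self.initial)

    def record_failure(self, agent_id: str) -> None:
        self.scores[agent_id] = max(0.0, self.score(agent_id) - 0.2)

    def record_success(self, agent_id: str) -> None:
        # Trust recovers slowly: ongoing good behavior is required.
        self.scores[agent_id] = min(1.0, self.score(agent_id) + 0.05)

    def may_delegate(self, agent_id: str) -> bool:
        return self.score(agent_id) >= self.floor
```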
## Audit & Observability
- Log every tool call with: timestamp, agent ID, tool name, allow/deny decision, policy name
- Log every governance violation with the matched rule and evidence
- Export audit trails in JSON Lines format for integration with log aggregation systems
- Include session boundaries (start/end) in audit logs for correlation
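A minimal append-only JSON Lines writer along these lines (field names are assumptions; adapt them to your log schema):

```python
import json
from datetime import datetime, timezone

def append_audit_entry(path: str, agent_id: str, tool_name: str,
                       allowed: bool, policy_name: str) -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "tool": tool_name,
        "decision": "allow" if allowed else "deny",
        "policy": policy_name,
    }
    # Open in append mode only; entries are never rewritten or deleted.
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```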
## Code Patterns
When writing agent tool functions:
```python
# Good: Governed tool with explicit policy
@govern(policy)
async def search(query: str) -> str:
    ...
```

```python
# Bad: Unprotected tool with no governance
async def search(query: str) -> str:
    ...
```
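The `govern` decorator above stands for whatever pre-execution check your stack provides. A hypothetical implementation might look like this (the `policy.allows` method is an assumption, not a real API):

```python
import functools

def govern(policy):
    def decorator(tool_fn):
        @functools.wraps(tool_fn)
        async def wrapper(*args, **kwargs):
            # Check before execution; fail closed if the policy check raises.
            try:
                allowed = policy.allows(tool_fn.__name__, kwargs)
            except Exception:
                allowed = False
            if not allowed:
                raise PermissionError(f"{tool_fn.__name__} denied by policy")
            return await tool_fn(*args, **kwargs)
        return wrapper
    return decorator
```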
When defining policies:
```yaml
# Good: Explicit allowlist, content filters, rate limit
name: my-agent
allowed_tools: [search, summarize]
blocked_patterns: ["(?i)(api_key|password)\\s*[:=]"]
max_calls_per_request: 25
```

```yaml
# Bad: No restrictions
name: my-agent
allowed_tools: ["*"]
```
When composing multi-agent policies:
```python
# Good: Most-restrictive-wins composition
final_policy = compose_policies(org_policy, team_policy, agent_policy)
```

```python
# Bad: Only using agent-level policy, ignoring org constraints
final_policy = agent_policy
```
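A possible most-restrictive-wins implementation of `compose_policies`, assuming dict policies shaped like the YAML above (a sketch, not a framework function):

```python
def compose_policies(*policies: dict) -> dict:
    """Intersect allowlists and take the lowest rate limit across all layers."""
    allowed = set(policies[0].get("allowed_tools", []))
    max_calls = policies[0].get("max_calls_per_request", 25)
    for p in policies[1:]:
        allowed &= set(p.get("allowed_tools", []))
        max_calls = min(max_calls, p.get("max_calls_per_request", max_calls))
    return {"allowed_tools": sorted(allowed), "max_calls_per_request": max_calls}
```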
## Framework-Specific Notes
- PydanticAI: Use `@agent.tool` with a governance decorator wrapper. PydanticAI's upcoming Traits feature is designed for this pattern.
- CrewAI: Apply governance at the Crew level to cover all agents. Use `before_kickoff` callbacks for policy validation.
- OpenAI Agents SDK: Wrap `@function_tool` with governance. Use handoff guards for multi-agent trust.
- LangChain/LangGraph: Use `RunnableBinding` or tool wrappers for governance. Apply at the graph edge level for flow control.
- AutoGen: Implement governance in the `ConversableAgent.register_for_execution` hook.
## Common Mistakes
- Relying only on output guardrails (post-generation) instead of pre-execution governance
- Hardcoding policy rules instead of loading from configuration
- Allowing agents to self-modify their own governance policies
- Checking only tool names and forgetting to apply governance checks to tool arguments as well
- Not decaying trust scores over time — stale trust is dangerous (see the sketch after this list)
- Logging prompts in audit trails — log decisions and metadata, not user content
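For the trust-decay point above, a sketch of time-based decay; the half-life value is an illustrative assumption:

```python
import math
import time

def decayed_trust(score: float, last_updated: float,
                  half_life_seconds: float = 7 * 24 * 3600) -> float:
    """Halve trust for every half-life of inactivity since the last update."""
    age = time.time() - last_updated
    return score * math.pow(0.5, age / half_life_seconds)
```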