---
description: 'Guidelines for building safe, governed AI agent systems. Apply when writing code that uses agent frameworks, tool-calling LLMs, or multi-agent orchestration to ensure proper safety boundaries, policy enforcement, and auditability.'
applyTo: '**'
---

# Agent Safety & Governance

## Core Principles

- Fail closed: If a governance check errors or is ambiguous, deny the action rather than allowing it (see the sketch after this list)
- Policy as configuration: Define governance rules in YAML/JSON files, not hardcoded in application logic
- Least privilege: Agents should have the minimum tool access needed for their task
- Append-only audit: Never modify or delete audit trail entries; immutability enables compliance
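
A minimal sketch of the fail-closed principle. `policy.evaluate` and its decision object are hypothetical names for illustration, not a real API:

```python
# Hypothetical fail-closed check: errors and ambiguity both deny.
def is_allowed(policy, tool_name: str, args: dict) -> bool:
    try:
        decision = policy.evaluate(tool_name, args)  # assumed policy API
    except Exception:
        return False  # governance check errored: fail closed
    if decision is None:
        return False  # ambiguous result: fail closed
    return bool(decision.allowed)
```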

## Tool Access Controls

- Always define an explicit allowlist of tools an agent can use; never give unrestricted tool access (see the sketch after this list)
- Separate tool registration from tool authorization: the framework knows what tools exist, the policy controls which are allowed
- Use blocklists for known-dangerous operations (shell execution, file deletion, database DDL)
- Require human-in-the-loop approval for high-impact tools (send email, deploy, delete records)
- Enforce rate limits on tool calls per request to prevent infinite loops and resource exhaustion
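
A sketch of an enforcement point combining the allowlist with a per-request call budget; the class and its names are illustrative, not from any framework:

```python
class ToolGovernor:
    """Per-request enforcement: explicit allowlist plus a call budget."""

    def __init__(self, allowed_tools: set[str], max_calls_per_request: int):
        self.allowed_tools = allowed_tools
        self.max_calls = max_calls_per_request
        self.calls = 0

    def check(self, tool_name: str, args: dict) -> bool:
        self.calls += 1
        if self.calls > self.max_calls:
            return False  # budget exhausted: stops runaway loops
        if tool_name not in self.allowed_tools:
            return False  # deny anything not explicitly allowlisted
        return True  # argument scanning (see Content Safety) would also hook in here
```

Constructing a fresh governor per request keeps the call budget scoped to a single request.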

## Content Safety

- Scan all user inputs for threat signals before passing them to the agent (data exfiltration, prompt injection, privilege escalation)
- Filter agent-generated tool arguments for sensitive patterns: API keys, credentials, PII, SQL injection
- Use regex pattern lists that can be updated without code changes
- Check both the user's original prompt AND the agent's generated tool arguments (see the sketch after this list)
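
A sketch of pattern scanning applied to both the prompt and the generated tool arguments; the patterns shown are illustrative and would normally be loaded from configuration:

```python
import re

# Illustrative patterns; in practice, load these from a config file.
BLOCKED_PATTERNS = [
    r"(?i)(api_key|password)\s*[:=]",            # credential assignment
    r"(?i)ignore (all )?previous instructions",  # naive prompt-injection signal
]

def scan(text: str) -> list[str]:
    """Return every pattern that matched, as audit evidence."""
    return [p for p in BLOCKED_PATTERNS if re.search(p, text)]

def scan_request(user_prompt: str, tool_args: dict) -> list[str]:
    """Check the original prompt and each generated tool argument."""
    hits = scan(user_prompt)
    for value in tool_args.values():
        hits.extend(scan(str(value)))
    return hits
```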

## Multi-Agent Safety

- Each agent in a multi-agent system should have its own governance policy
- When agents delegate to other agents, apply the more restrictive of the two policies
- Track trust scores for agent delegates: degrade trust on failures and require ongoing good behavior (see the sketch after this list)
- Never allow an inner agent to have broader permissions than the outer agent that called it
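
A sketch of delegate trust tracking; the initial score, deltas, and threshold are illustrative assumptions:

```python
class DelegateTrust:
    """Trust degrades sharply on failure and is re-earned slowly."""

    def __init__(self, initial: float = 0.8, floor: float = 0.3):
        self.score = initial
        self.floor = floor  # below this, refuse to delegate

    def record(self, success: bool) -> None:
        # Asymmetric update: one failure costs more than ten successes earn.
        delta = 0.02 if success else -0.2
        self.score = max(0.0, min(1.0, self.score + delta))

    def may_delegate(self) -> bool:
        return self.score >= self.floor
```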

## Audit & Observability

- Log every tool call with: timestamp, agent ID, tool name, allow/deny decision, policy name
- Log every governance violation with the matched rule and evidence
- Export audit trails in JSON Lines format for integration with log aggregation systems (see the sketch after this list)
- Include session boundaries (start/end) in audit logs for correlation
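
A sketch of an append-only JSON Lines writer for tool-call decisions; the field names are illustrative:

```python
import json
import time

def audit_tool_call(log, agent_id: str, tool: str, allowed: bool, policy_name: str) -> None:
    """Append one JSON Lines record per tool call; caller opens the file with open(path, "a")."""
    record = {
        "ts": time.time(),
        "event": "tool_call",
        "agent_id": agent_id,
        "tool": tool,
        "decision": "allow" if allowed else "deny",
        "policy": policy_name,
    }
    log.write(json.dumps(record) + "\n")  # one JSON object per line

# Session boundaries use the same format, e.g.:
# {"ts": ..., "event": "session_start", "agent_id": ...}
```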

## Code Patterns

When writing agent tool functions:

```python
# Good: Governed tool with explicit policy
@govern(policy)
async def search(query: str) -> str:
    ...

# Bad: Unprotected tool with no governance
async def search(query: str) -> str:
    ...
```
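
The `@govern` decorator is used above without a definition. One possible sketch, assuming a policy object with a `check(tool_name, kwargs)` method like the enforcement sketch earlier:

```python
import functools

def govern(policy):
    """Deny-by-default wrapper: the tool body runs only if the policy allows it."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            if not policy.check(fn.__name__, kwargs):
                raise PermissionError(f"policy denied tool call: {fn.__name__}")
            return await fn(*args, **kwargs)
        return wrapper
    return decorator
```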

When defining policies:

```yaml
# Good: Explicit allowlist, content filters, rate limit
name: my-agent
allowed_tools: [search, summarize]
blocked_patterns: ["(?i)(api_key|password)\\s*[:=]"]
max_calls_per_request: 25

# Bad: No restrictions
name: my-agent
allowed_tools: ["*"]
```

When composing multi-agent policies:

```python
# Good: Most-restrictive-wins composition
final_policy = compose_policies(org_policy, team_policy, agent_policy)

# Bad: Only using agent-level policy, ignoring org constraints
final_policy = agent_policy
```
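
`compose_policies` is not defined here; a most-restrictive-wins sketch over the hypothetical `Policy` dataclass above (wildcard `"*"` allowlists are deliberately not special-cased):

```python
def compose_policies(*policies: Policy) -> Policy:
    """Intersect allowlists, union blocklists, and take the lowest call budget."""
    allowed = set(policies[0].allowed_tools)
    for p in policies[1:]:
        allowed &= set(p.allowed_tools)  # a tool must be allowed by every layer
    return Policy(
        name="+".join(p.name for p in policies),
        allowed_tools=sorted(allowed),
        blocked_patterns=[pat for p in policies for pat in p.blocked_patterns],
        max_calls_per_request=min(p.max_calls_per_request for p in policies),
    )
```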

## Framework-Specific Notes

- **PydanticAI**: Use `@agent.tool` with a governance decorator wrapper. PydanticAI's upcoming Traits feature is designed for this pattern.
- **CrewAI**: Apply governance at the Crew level to cover all agents. Use `before_kickoff` callbacks for policy validation.
- **OpenAI Agents SDK**: Wrap `@function_tool` with governance. Use handoff guards for multi-agent trust.
- **LangChain/LangGraph**: Use `RunnableBinding` or tool wrappers for governance. Apply at the graph edge level for flow control.
- **AutoGen**: Implement governance in the `ConversableAgent.register_for_execution` hook.
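
As one concrete illustration of separating registration from authorization, a hedged PydanticAI sketch reusing the hypothetical `govern` and `load_policy` helpers above (`@agent.tool_plain` registers a tool that takes no run context):

```python
from pydantic_ai import Agent

agent = Agent("openai:gpt-4o")
policy = load_policy("policies/my-agent.yaml")  # hypothetical path

@agent.tool_plain  # registration: the framework knows the tool exists
@govern(policy)    # authorization: the policy decides whether it runs
async def search(query: str) -> str:
    """Search an internal index (stub body for illustration)."""
    return f"results for {query!r}"
```

Because `govern` uses `functools.wraps`, the original signature remains visible to the framework's schema extraction.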

## Common Mistakes

- Relying only on output guardrails (post-generation) instead of pre-execution governance
- Hardcoding policy rules instead of loading them from configuration
- Allowing agents to self-modify their own governance policies
- Governance-checking only tool names and forgetting to check tool arguments
- Not decaying trust scores over time; stale trust is dangerous (see the sketch after this list)
- Logging prompts in audit trails: log decisions and metadata, not user content
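
For the trust-decay pitfall, one hedged approach is an exponential half-life so unverified delegates drift back toward untrusted; the half-life value is an illustrative assumption:

```python
import math
import time

def decayed_trust(score: float, last_verified_ts: float, half_life_s: float = 86_400.0) -> float:
    """Decay a trust score toward zero based on time since last verification."""
    age = time.time() - last_verified_ts
    return score * math.exp(-math.log(2) * age / half_life_s)
```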