AgentRun lifecycle
#sympozium #agenty #lifecycle
Full reconciliation flow
AgentRunReconciler is the largest and most important controller in the system (~900 lines of code). It manages the full lifecycle of a run, from Pending to Completed (the terminal Succeeded/Failed phases).
Phase: Pending → Running
reconcilePending():
│
├── 1. validatePolicy()
│ └── Checks SympoziumPolicy:
│ - Sandbox requirements
│ - Tool gating
│ - Feature gates
│ - AgentSandbox policy
│
├── 2. Agent Sandbox check
│ └── If agentSandbox.enabled → reconcilePendingAgentSandbox()
│ (creates a Sandbox CR instead of a Job)
│
├── 3. ensureAgentServiceAccount()
│ └── ServiceAccount "sympozium-agent" in the target namespace
│
├── 4. createInputConfigMap()
│ └── ConfigMap with the task, system prompt, and memory context
│
├── 5. Lookup SympoziumInstance
│ ├── Memory enabled? → prepend memory instructions
│ ├── Observability config → inject OTel env vars
│ ├── Skills inheritance → copy from instance if empty
│ └── MCP servers → resolve URLs from MCPServer CRs
│
├── 6. ensureMCPConfigMap()
│ └── ConfigMap with the MCP server configuration
│
├── 7. resolveSkillSidecars()
│ └── SkillPack CRDs → resolved sidecar specs
│
├── 8. Server mode check
│ └── mode=server → reconcilePendingServer() (Deployment+Service)
│
├── 9. Filter server-only sidecars (task mode)
│
├── 10. Memory server readiness check
│ └── If the memory skill is used → check that the Deployment exists
│
├── 11. Build Job
│ ├── PodBuilder.BuildAgentContainer()
│ ├── PodBuilder.BuildIPCBridgeContainer()
│ ├── Skill sidecar containers
│ ├── MCP bridge sidecar (if MCP servers are configured)
│ ├── Sandbox sidecar (if enabled)
│ ├── Memory volumes/init containers
│ ├── Secret mirroring (system → run namespace)
│ └── OTel tracing setup
│
├── 12. Create ephemeral RBAC
│ ├── Role + RoleBinding (namespace-scoped, ownerRef)
│ └── ClusterRole + ClusterRoleBinding (label-based)
│
├── 13. NetworkPolicy
│ └── deny-all + allow DNS + allow NATS
│
└── 14. Create Job → Status: Running
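The branching in the flow above can be sketched in a few lines of Go. This is a simplified illustration, not the controller's actual code: `RunSpec`, `Sidecar`, the `ServerOnly` flag, and both function names are hypothetical stand-ins, and the precedence of the Agent Sandbox check over the server mode check is inferred only from the order of steps 2 and 8.

```go
package main

import "fmt"

// RunSpec is a hypothetical, trimmed-down stand-in for the AgentRun spec.
type RunSpec struct {
	AgentSandboxEnabled bool   // mirrors agentSandbox.enabled
	Mode                string // "task" or "server"
}

// Sidecar is a hypothetical resolved skill sidecar.
type Sidecar struct {
	Name       string
	ServerOnly bool // assumed flag for sidecars that only apply in server mode
}

// Workload names the kind of object the Pending phase ends up creating.
type Workload string

const (
	WorkloadSandbox    Workload = "Sandbox"
	WorkloadDeployment Workload = "Deployment+Service"
	WorkloadJob        Workload = "Job"
)

// pendingWorkload follows the order of the checks: the Agent Sandbox check
// (step 2) comes before the server mode check (step 8); a Job is the default.
func pendingWorkload(spec RunSpec) Workload {
	switch {
	case spec.AgentSandboxEnabled:
		return WorkloadSandbox
	case spec.Mode == "server":
		return WorkloadDeployment
	default:
		return WorkloadJob
	}
}

// filterForTaskMode drops server-only sidecars, as in step 9.
func filterForTaskMode(sidecars []Sidecar) []Sidecar {
	kept := make([]Sidecar, 0, len(sidecars))
	for _, s := range sidecars {
		if !s.ServerOnly {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	fmt.Println(pendingWorkload(RunSpec{Mode: "task"}))
	fmt.Println(pendingWorkload(RunSpec{Mode: "server"}))
	fmt.Println(pendingWorkload(RunSpec{AgentSandboxEnabled: true, Mode: "server"}))

	sidecars := []Sidecar{{Name: "git"}, {Name: "web-ui", ServerOnly: true}}
	fmt.Println(len(filterForTaskMode(sidecars)))
}
```

Keeping the decision in a pure function like this makes the early-exit order easy to unit test without a cluster.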
Phase: Running
reconcileRunning():
│
├── Poll Job status (every 10s via requeue)
│
├── Pod Succeeded → extractResults():
│ ├── Read pod logs
│ ├── Extract result text
│ ├── Extract memory markers (__SYMPOZIUM_MEMORY__)
│ ├── Patch memory ConfigMap
│ ├── Extract token usage
│ └── Set status.result, completedAt, tokenUsage
│ → Status: Succeeded
│
├── Pod Failed →
│ ├── Read pod logs for error
│ ├── Set status.error, exitCode
│ └── Status: Failed
│
└── Timeout → failRun() → Status: Failed
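The decision made on each poll can be sketched as a small pure function. This is a minimal illustration under stated assumptions: `nextRunStatus` and its parameters are hypothetical names, pod phases are modelled as plain strings rather than Kubernetes API types, and checking the pod's terminal state before the timeout is an assumed ordering. Only the 10s requeue interval and the resulting statuses come from the flow above.

```go
package main

import (
	"fmt"
	"time"
)

// PodPhase mirrors the Kubernetes pod phases relevant here, as plain strings.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// pollInterval is the requeue delay while the Job is still running.
const pollInterval = 10 * time.Second

// nextRunStatus returns the AgentRun status and the requeue delay.
// A zero delay means the run reached a terminal status and polling stops.
func nextRunStatus(pod PodPhase, elapsed, timeout time.Duration) (string, time.Duration) {
	switch {
	case pod == PodSucceeded:
		return "Succeeded", 0
	case pod == PodFailed:
		return "Failed", 0
	case elapsed >= timeout:
		// Assumed: the timeout only applies while the pod has not finished.
		return "Failed", 0
	default:
		return "Running", pollInterval
	}
}

func main() {
	status, requeue := nextRunStatus(PodRunning, 30*time.Second, 5*time.Minute)
	fmt.Println(status, requeue)

	status, requeue = nextRunStatus(PodRunning, 6*time.Minute, 5*time.Minute)
	fmt.Println(status, requeue)
}
```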
Phase: Succeeded/Failed
reconcileCompleted():
│
├── Clean up ephemeral RBAC
│ ├── Delete ClusterRole (label: agentrun=<name>)
│ └── Delete ClusterRoleBinding
│
├── Prune run history
│ └── Keep max 50 runs per instance (DefaultRunHistoryLimit)
│
└── Remove finalizer → AgentRun deletable
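History pruning can be illustrated with a short sketch. The limit of 50 and the name DefaultRunHistoryLimit come from the flow above; everything else is an assumption made for the example: the `Run` type, the `runsToPrune` and `sampleRuns` helpers, and in particular the choice to drop the oldest runs first, which this note does not state.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// DefaultRunHistoryLimit is the maximum number of runs kept per instance.
const DefaultRunHistoryLimit = 50

// Run is a hypothetical, minimal view of a completed AgentRun.
type Run struct {
	Name        string
	CompletedAt time.Time
}

// runsToPrune returns the runs that exceed the limit, assuming the newest
// runs are the ones worth keeping. The input slice is not modified.
func runsToPrune(runs []Run, limit int) []Run {
	if limit < 0 {
		limit = 0
	}
	if len(runs) <= limit {
		return nil
	}
	sorted := append([]Run(nil), runs...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CompletedAt.After(sorted[j].CompletedAt) // newest first
	})
	return sorted[limit:]
}

// sampleRuns builds n demo runs one minute apart; run-0 is the oldest.
func sampleRuns(n int) []Run {
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	runs := make([]Run, n)
	for i := range runs {
		runs[i] = Run{
			Name:        fmt.Sprintf("run-%d", i),
			CompletedAt: base.Add(time.Duration(i) * time.Minute),
		}
	}
	return runs
}

func main() {
	for _, r := range runsToPrune(sampleRuns(53), DefaultRunHistoryLimit) {
		fmt.Println("prune", r.Name)
	}
}
```

With 53 sample runs the sketch selects the three oldest (run-2, run-1, run-0) for deletion.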
Phase: Serving (server mode)
reconcileServing():
│
├── Check Deployment health
├── Check Service health
├── Reconcile HTTPRoute (Envoy Gateway)
└── Requeue every 30s
Deletion handling
reconcileDelete():
│
├── Delete server-mode resources (Deployment, Service, HTTPRoute)
├── Delete ephemeral RBAC
├── Delete input ConfigMap
├── Delete MCP ConfigMap
├── Remove finalizer
└── AgentRun deleted
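The ordering above follows the usual Kubernetes finalizer pattern: clean up first, drop the finalizer last. The sketch below shows that pattern with plain Go values instead of API objects. The finalizer name, the `cleanupStep` type, and the rule that a failed step leaves the finalizer in place so the deletion is retried are all assumptions for illustration, not details taken from the controller.

```go
package main

import (
	"errors"
	"fmt"
)

// agentRunFinalizer is a hypothetical name; the real one is not given here.
const agentRunFinalizer = "example.com/agentrun-cleanup"

// errCleanupFailed is a demo error for simulating a failing step.
var errCleanupFailed = errors.New("cleanup failed")

// cleanupStep is one deletion step, for example "delete input ConfigMap".
type cleanupStep struct {
	name string
	run  func() error
}

// reconcileDelete runs the steps in order. Only when all of them succeed is
// the finalizer removed; on error the finalizers are returned unchanged so
// that a later reconcile can retry.
func reconcileDelete(finalizers []string, steps []cleanupStep) ([]string, error) {
	for _, step := range steps {
		if err := step.run(); err != nil {
			return finalizers, fmt.Errorf("%s: %w", step.name, err)
		}
	}
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != agentRunFinalizer {
			kept = append(kept, f)
		}
	}
	return kept, nil
}

func main() {
	ok := func() error { return nil }
	steps := []cleanupStep{
		{"server-mode resources", ok},
		{"ephemeral RBAC", ok},
		{"input ConfigMap", ok},
		{"MCP ConfigMap", ok},
	}
	remaining, err := reconcileDelete([]string{agentRunFinalizer}, steps)
	fmt.Println(len(remaining), err)
}
```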
OTel Tracing
Every reconciliation phase is traced:
- agentrun.reconcile - main span
- agentrun.create_job - Job creation
- Traceparent propagated to the agent pod via an env var
- TraceID stored in status.traceID
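For reference, a W3C Trace Context traceparent value has the form `00-<trace-id>-<parent-id>-<flags>`, with a 32-hex-digit trace ID and a 16-hex-digit span ID. The sketch below builds such a value and recovers the trace ID from it. The env var name `TRACEPARENT` and both helper functions are assumptions for illustration, since this note names neither the variable nor the code that sets it.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// traceparentEnv is an assumed env var name; the note does not specify it.
const traceparentEnv = "TRACEPARENT"

// formatTraceparent renders a version-00 W3C traceparent value.
func formatTraceparent(traceID [16]byte, spanID [8]byte, sampled bool) string {
	flags := byte(0)
	if sampled {
		flags = 1
	}
	return fmt.Sprintf("00-%s-%s-%02x",
		hex.EncodeToString(traceID[:]), hex.EncodeToString(spanID[:]), flags)
}

// traceIDFromTraceparent extracts the trace ID, the kind of value that
// could be stored in status.traceID.
func traceIDFromTraceparent(tp string) (string, error) {
	parts := strings.Split(tp, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", fmt.Errorf("malformed traceparent %q", tp)
	}
	return parts[1], nil
}

func main() {
	traceID := [16]byte{0x4b, 0xf9, 0x2f, 0x35, 0x77, 0xb3, 0x4d, 0xa6,
		0xa3, 0xce, 0x92, 0x9d, 0x0e, 0x0e, 0x47, 0x36}
	spanID := [8]byte{0x00, 0xf0, 0x67, 0xaa, 0x0b, 0xa9, 0x02, 0xb7}

	tp := formatTraceparent(traceID, spanID, true)
	fmt.Printf("%s=%s\n", traceparentEnv, tp)

	id, err := traceIDFromTraceparent(tp)
	fmt.Println(id, err)
}
```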
Metrics
- sympozium.agent.runs - counter (success/failure labels)
- sympozium.agent.duration_ms - histogram of run duration
- sympozium.errors - error counter
Related: AgentRun | Agent Pod lifecycle | Orchestrator - PodBuilder and Spawner