# AgentRun Lifecycle

#sympozium #agenty #lifecycle

## Full reconciliation flow

`AgentRunReconciler` is the **largest and most important controller** in the system (~900 lines of code). It manages the full lifecycle of a run, from Pending through completion (Succeeded or Failed).

## Phase: Pending → Running

```
reconcilePending():
│
├── 1. validatePolicy()
│   └── Checks SympoziumPolicy:
│       - Sandbox requirements
│       - Tool gating
│       - Feature gates
│       - AgentSandbox policy
│
├── 2. Agent Sandbox check
│   └── If agentSandbox.enabled → reconcilePendingAgentSandbox()
│       (creates a Sandbox CR instead of a Job)
│
├── 3. ensureAgentServiceAccount()
│   └── ServiceAccount "sympozium-agent" in the target namespace
│
├── 4. createInputConfigMap()
│   └── ConfigMap with task, system prompt, memory context
│
├── 5. Lookup SympoziumInstance
│   ├── Memory enabled? → prepend memory instructions
│   ├── Observability config → inject OTel env vars
│   ├── Skills inheritance → copy from instance if empty
│   └── MCP servers → resolve URLs from MCPServer CRs
│
├── 6. ensureMCPConfigMap()
│   └── ConfigMap with the MCP server configuration
│
├── 7. resolveSkillSidecars()
│   └── SkillPack CRDs → resolved sidecar specs
│
├── 8. Server mode check
│   └── mode=server → reconcilePendingServer() (Deployment+Service)
│
├── 9. Filter server-only sidecars (task mode)
│
├── 10. Memory server readiness check
│   └── If memory skill → check that the Deployment exists
│
├── 11. Build Job
│   ├── PodBuilder.BuildAgentContainer()
│   ├── PodBuilder.BuildIPCBridgeContainer()
│   ├── Skill sidecar containers
│   ├── MCP bridge sidecar (if MCP servers)
│   ├── Sandbox sidecar (if enabled)
│   ├── Memory volumes/init containers
│   ├── Secret mirroring (system → run namespace)
│   └── OTel tracing setup
│
├── 12. Create ephemeral RBAC
│   ├── Role + RoleBinding (namespace-scoped, ownerRef)
│   └── ClusterRole + ClusterRoleBinding (label-based)
│
├── 13. NetworkPolicy
│   └── deny-all + allow DNS + allow NATS
│
└── 14. Create Job → Status: Running
```

## Phase: Running

```
reconcileRunning():
│
├── Poll Job status (every 10s via requeue)
│
├── Pod Succeeded → extractResults():
│   ├── Read pod logs
│   ├── Extract result text
│   ├── Extract memory markers (__SYMPOZIUM_MEMORY__)
│   ├── Patch memory ConfigMap
│   ├── Extract token usage
│   └── Set status.result, completedAt, tokenUsage
│   → Status: Succeeded
│
├── Pod Failed →
│   ├── Read pod logs for error
│   ├── Set status.error, exitCode
│   └── Status: Failed
│
└── Timeout → failRun() → Status: Failed
```

## Phase: Succeeded/Failed

```
reconcileCompleted():
│
├── Clean up ephemeral RBAC
│   ├── Delete ClusterRole (label: agentrun=)
│   └── Delete ClusterRoleBinding
│
├── Prune run history
│   └── Keep max 50 runs per instance (DefaultRunHistoryLimit)
│
└── Remove finalizer → AgentRun deletable
```

## Phase: Serving (server mode)

```
reconcileServing():
│
├── Check Deployment health
├── Check Service health
├── Reconcile HTTPRoute (Envoy Gateway)
└── Requeue every 30s
```

## Deletion handling

```
reconcileDelete():
│
├── Delete server-mode resources (Deployment, Service, HTTPRoute)
├── Delete ephemeral RBAC
├── Delete input ConfigMap
├── Delete MCP ConfigMap
├── Remove finalizer
└── AgentRun deleted
```

## OTel Tracing

Every reconciliation phase is traced:

- `agentrun.reconcile` - main span
- `agentrun.create_job` - Job creation
- Traceparent is propagated to the agent pod via an env var
- TraceID is stored in `status.traceID`

## Metrics

- `sympozium.agent.runs` - counter (success/failure labels)
- `sympozium.agent.duration_ms` - histogram of run duration
- `sympozium.errors` - error counter

---

Related: [[AgentRun]] | [[Cykl życia Agent Pod]] | [[Orchestrator - PodBuilder i Spawner]]