# AgentRun lifecycle

#sympozium #agenty #lifecycle

## Full reconciliation flow

`AgentRunReconciler` is the **largest and most important controller** in the system (~900 lines of code). It manages the full lifecycle of a run, from Pending to completion (Succeeded or Failed).
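
The handlers described in the sections below (`reconcilePending`, `reconcileRunning`, `reconcileCompleted`, `reconcileServing`, `reconcileDelete`) suggest a dispatch on the current phase. The following is a minimal sketch of such a dispatch, not the controller's actual code: the `AgentRun` dataclass, its field names, and the rule that an empty phase is treated as Pending are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Handler names come from this note; the dispatch table itself is an assumption.
HANDLERS = {
    "": "reconcilePending",  # assumption: a new run without a phase counts as Pending
    "Pending": "reconcilePending",
    "Running": "reconcileRunning",
    "Serving": "reconcileServing",
    "Succeeded": "reconcileCompleted",
    "Failed": "reconcileCompleted",
}


@dataclass
class AgentRun:
    """Hypothetical, stripped-down view of an AgentRun object."""
    phase: str = ""
    deletion_timestamp: Optional[str] = None  # set by the API server on delete


def pick_handler(run: AgentRun) -> str:
    """Return the name of the handler that should process this run."""
    # Deletion takes priority over every phase, as is usual for
    # finalizer-based Kubernetes controllers.
    if run.deletion_timestamp is not None:
        return "reconcileDelete"
    if run.phase not in HANDLERS:
        raise ValueError(f"unknown phase: {run.phase!r}")
    return HANDLERS[run.phase]
```

For example, `pick_handler(AgentRun(phase="Failed"))` returns `"reconcileCompleted"`, while any run with a deletion timestamp is routed to `"reconcileDelete"` regardless of its phase.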

## Phase: Pending → Running

```
reconcilePending():
│
├── 1. validatePolicy()
│   └── Checks SympoziumPolicy:
│       - Sandbox requirements
│       - Tool gating
│       - Feature gates
│       - AgentSandbox policy
│
├── 2. Agent Sandbox check
│   └── If agentSandbox.enabled → reconcilePendingAgentSandbox()
│       (creates a Sandbox CR instead of a Job)
│
├── 3. ensureAgentServiceAccount()
│   └── ServiceAccount "sympozium-agent" in the target namespace
│
├── 4. createInputConfigMap()
│   └── ConfigMap with task, system prompt, memory context
│
├── 5. Lookup SympoziumInstance
│   ├── Memory enabled? → prepend memory instructions
│   ├── Observability config → inject OTel env vars
│   ├── Skills inheritance → copy from instance if empty
│   └── MCP servers → resolve URLs from MCPServer CRs
│
├── 6. ensureMCPConfigMap()
│   └── ConfigMap with the MCP server configuration
│
├── 7. resolveSkillSidecars()
│   └── SkillPack CRs → resolved sidecar specs
│
├── 8. Server mode check
│   └── mode=server → reconcilePendingServer() (Deployment + Service)
│
├── 9. Filter server-only sidecars (task mode)
│
├── 10. Memory server readiness check
│   └── If the memory skill is used → check that the Deployment exists
│
├── 11. Build Job
│   ├── PodBuilder.BuildAgentContainer()
│   ├── PodBuilder.BuildIPCBridgeContainer()
│   ├── Skill sidecar containers
│   ├── MCP bridge sidecar (if MCP servers are configured)
│   ├── Sandbox sidecar (if enabled)
│   ├── Memory volumes/init containers
│   ├── Secret mirroring (system → run namespace)
│   └── OTel tracing setup
│
├── 12. Create ephemeral RBAC
│   ├── Role + RoleBinding (namespace-scoped, ownerRef)
│   └── ClusterRole + ClusterRoleBinding (label-based)
│
├── 13. NetworkPolicy
│   └── deny-all + allow DNS + allow NATS
│
└── 14. Create Job → Status: Running
```
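
The tree above has two early exits (steps 2 and 8) before the default Job path. As a rough illustration of that branching only, here is a sketch that returns which handler takes over and which workload it creates. The `RunSpec` dataclass and its field names are hypothetical; the order of the checks follows the numbered steps.

```python
from dataclasses import dataclass


@dataclass
class RunSpec:
    """Hypothetical subset of the AgentRun spec used for branching."""
    agent_sandbox_enabled: bool = False  # stands in for agentSandbox.enabled
    mode: str = "task"                   # "task" or "server"


def choose_workload(spec: RunSpec) -> tuple[str, str]:
    """Return (handler, workload) for a Pending run.

    The checks are evaluated in the order of the numbered steps, so the
    Agent Sandbox check (step 2) wins over server mode (step 8).
    """
    if spec.agent_sandbox_enabled:
        return ("reconcilePendingAgentSandbox", "Sandbox CR")
    if spec.mode == "server":
        return ("reconcilePendingServer", "Deployment + Service")
    # Default task-mode path: steps 9-14 build and create the Job.
    return ("reconcilePending", "Job")
```

With default values, `choose_workload(RunSpec())` yields the Job path; setting `mode="server"` switches to the Deployment + Service path unless the Agent Sandbox is enabled.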

## Phase: Running

```
reconcileRunning():
│
├── Poll Job status (every 10s via requeue)
│
├── Pod Succeeded → extractResults():
│   ├── Read pod logs
│   ├── Extract result text
│   ├── Extract memory markers (__SYMPOZIUM_MEMORY__)
│   ├── Patch memory ConfigMap
│   ├── Extract token usage
│   └── Set status.result, completedAt, tokenUsage
│   → Status: Succeeded
│
├── Pod Failed →
│   ├── Read pod logs for error
│   ├── Set status.error, exitCode
│   └── Status: Failed
│
└── Timeout → failRun() → Status: Failed
```
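
The polling loop above boils down to a small decision per reconcile pass. The sketch below models that decision under stated assumptions: the `Decision` type and the parameter names are hypothetical, and checking the timeout only while the pod is still unfinished is a guess about ordering that this note does not confirm.

```python
from dataclasses import dataclass
from typing import Optional

POLL_INTERVAL_S = 10  # "Poll Job status (every 10s via requeue)"


@dataclass
class Decision:
    status: str                     # "Running", "Succeeded" or "Failed"
    requeue_after_s: Optional[int]  # None means no further polling
    action: str                     # function named in the tree, "" if none


def decide(pod_phase: str, elapsed_s: float, timeout_s: float) -> Decision:
    """Map the observed pod phase and elapsed time to the next step."""
    if pod_phase == "Succeeded":
        return Decision("Succeeded", None, "extractResults")
    if pod_phase == "Failed":
        # Error text and exit code are read from the pod logs at this point.
        return Decision("Failed", None, "")
    # Assumption: the timeout applies only to pods that have not finished.
    if elapsed_s >= timeout_s:
        return Decision("Failed", None, "failRun")
    return Decision("Running", POLL_INTERVAL_S, "")
```

A pod that is still running after 5 seconds of a 60-second budget produces a requeue in 10 seconds; the same pod at 60 seconds is failed through `failRun`.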

## Phase: Succeeded/Failed

```
reconcileCompleted():
│
├── Clean up ephemeral RBAC
│   ├── Delete ClusterRole (label: agentrun=<name>)
│   └── Delete ClusterRoleBinding
│
├── Prune run history
│   └── Keep max 50 runs per instance (DefaultRunHistoryLimit)
│
└── Remove finalizer → AgentRun deletable
```
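
The history pruning step can be illustrated with a short sketch. The limit of 50 comes from `DefaultRunHistoryLimit` above; the `RunRecord` type, its `completed_at` field, and the choice to drop the oldest runs first are assumptions, since this note does not say how runs are ordered for pruning.

```python
from dataclasses import dataclass

DEFAULT_RUN_HISTORY_LIMIT = 50  # DefaultRunHistoryLimit from the tree above


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of a finished run belonging to one instance."""
    name: str
    completed_at: float  # Unix timestamp


def runs_to_prune(runs, limit=DEFAULT_RUN_HISTORY_LIMIT):
    """Return the runs that exceed the limit, oldest first.

    Assumption: the newest `limit` runs are kept and older ones are deleted.
    """
    newest_first = sorted(runs, key=lambda r: r.completed_at, reverse=True)
    excess = newest_first[limit:]
    return sorted(excess, key=lambda r: r.completed_at)
```

With 53 finished runs for one instance, the function returns the three oldest; with 50 or fewer it returns an empty list.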

## Phase: Serving (server mode)

```
reconcileServing():
│
├── Check Deployment health
├── Check Service health
├── Reconcile HTTPRoute (Envoy Gateway)
└── Requeue every 30s
```

## Deletion handling

```
reconcileDelete():
│
├── Delete server-mode resources (Deployment, Service, HTTPRoute)
├── Delete ephemeral RBAC
├── Delete input ConfigMap
├── Delete MCP ConfigMap
├── Remove finalizer
└── AgentRun deleted
```
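
Because the finalizer is removed only after the cleanup steps, a failed delete typically means the whole sequence is retried on the next reconcile. The sketch below shows one common way to make such a sequence safe to retry; it is not taken from the controller. The `NotFound` exception and the `run_cleanup` helper are hypothetical stand-ins, and treating "already deleted" as success is an assumption based on the usual Kubernetes finalizer pattern.

```python
from typing import Callable, Iterable


class NotFound(Exception):
    """Stand-in for a Kubernetes 404 response on delete."""


def run_cleanup(steps: Iterable[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run cleanup steps in order and return the names that completed.

    NotFound counts as success, so a retried reconcile is idempotent.
    Any other exception propagates; the caller then keeps the finalizer
    and the deletion is retried later.
    """
    done = []
    for name, delete in steps:
        try:
            delete()
        except NotFound:
            pass  # resource was already removed by an earlier attempt
        done.append(name)
    return done
```

In this model the caller passes the steps in the order of the tree above and removes the finalizer only when `run_cleanup` returns without raising.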

## OTel Tracing

Every reconciliation phase is traced:

- `agentrun.reconcile` - root span
- `agentrun.create_job` - Job creation
- Traceparent is propagated to the agent pod via an env var
- TraceID is stored in `status.traceID`
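
To make the traceparent propagation concrete, here is a small sketch that formats a W3C Trace Context `traceparent` value and wraps it in an environment mapping. The value format (`00-<trace-id>-<parent-id>-<flags>`) follows the W3C specification; the env var name `TRACEPARENT` and both function names are assumptions, since this note does not name the variable the controller actually sets.

```python
import re

_TRACE_ID = re.compile(r"[0-9a-f]{32}")
_SPAN_ID = re.compile(r"[0-9a-f]{16}")


def traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a W3C traceparent value (version 00)."""
    if not _TRACE_ID.fullmatch(trace_id) or trace_id == "0" * 32:
        raise ValueError("trace_id must be 32 lowercase hex chars, not all zeros")
    if not _SPAN_ID.fullmatch(span_id) or span_id == "0" * 16:
        raise ValueError("span_id must be 16 lowercase hex chars, not all zeros")
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def tracing_env(trace_id: str, span_id: str) -> dict[str, str]:
    """Build the env mapping for the agent container.

    The variable name TRACEPARENT is an assumption for this sketch.
    """
    return {"TRACEPARENT": traceparent(trace_id, span_id)}
```

The trace ID passed here would be the same value that ends up in `status.traceID`, which lets spans emitted by the agent pod be joined with the controller's `agentrun.reconcile` span.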

## Metrics

- `sympozium.agent.runs` - counter (success/failure labels)
- `sympozium.agent.duration_ms` - histogram of run duration
- `sympozium.errors` - error counter

---

Related: [[AgentRun]] | [[Cykl życia Agent Pod]] | [[Orchestrator - PodBuilder i Spawner]]