initial commit
150
04-Zarządzanie-Agentami/Cykl życia AgentRun.md
Normal file
@@ -0,0 +1,150 @@
# AgentRun Lifecycle

#sympozium #agenty #lifecycle

## Full reconciliation flow

`AgentRunReconciler` is the **largest and most important controller** in the system (~900 lines of code). It manages the full lifecycle of a run, from Pending through to completion (Succeeded or Failed).
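
As a rough illustration of how one controller can cover the whole lifecycle, the sketch below maps each phase to the reconcile function named in this note. The `Phase` type, the `handlerFor` helper, the treatment of an empty phase as Pending, and the rule that deletion wins over the phase are assumptions made for the example, not details taken from the controller's source.

```go
package main

import "fmt"

// Phase mirrors the AgentRun phases named in this note.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
	PhaseServing   Phase = "Serving"
)

// handlerFor names the reconcile function responsible for a run.
// deleting models a set deletionTimestamp; in this sketch it wins over the phase.
func handlerFor(phase Phase, deleting bool) string {
	if deleting {
		return "reconcileDelete"
	}
	switch phase {
	case "", PhasePending: // assumption: a run without a phase is treated as Pending
		return "reconcilePending"
	case PhaseRunning:
		return "reconcileRunning"
	case PhaseServing:
		return "reconcileServing"
	case PhaseSucceeded, PhaseFailed:
		return "reconcileCompleted"
	default:
		return "" // unknown phase: nothing to do in this sketch
	}
}

func main() {
	for _, p := range []Phase{PhasePending, PhaseRunning, PhaseServing, PhaseFailed} {
		fmt.Printf("%-9s -> %s\n", p, handlerFor(p, false))
	}
	fmt.Println("deleting  ->", handlerFor(PhaseRunning, true))
}
```

Keeping the dispatch in one small function makes it easy to see that both terminal phases share the same cleanup path.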

## Phase: Pending → Running

```
reconcilePending():
│
├── 1. validatePolicy()
│   └── Checks SympoziumPolicy:
│       - Sandbox requirements
│       - Tool gating
│       - Feature gates
│       - AgentSandbox policy
│
├── 2. Agent Sandbox check
│   └── If agentSandbox.enabled → reconcilePendingAgentSandbox()
│       (creates a Sandbox CR instead of a Job)
│
├── 3. ensureAgentServiceAccount()
│   └── ServiceAccount "sympozium-agent" in the target namespace
│
├── 4. createInputConfigMap()
│   └── ConfigMap with the task, system prompt, and memory context
│
├── 5. Lookup SympoziumInstance
│   ├── Memory enabled? → prepend memory instructions
│   ├── Observability config → inject OTel env vars
│   ├── Skills inheritance → copy from instance if empty
│   └── MCP servers → resolve URLs from MCPServer CRs
│
├── 6. ensureMCPConfigMap()
│   └── ConfigMap with the MCP server configuration
│
├── 7. resolveSkillSidecars()
│   └── SkillPack CRDs → resolved sidecar specs
│
├── 8. Server mode check
│   └── mode=server → reconcilePendingServer() (Deployment+Service)
│
├── 9. Filter server-only sidecars (task mode)
│
├── 10. Memory server readiness check
│   └── If memory skill → check that the Deployment exists
│
├── 11. Build Job
│   ├── PodBuilder.BuildAgentContainer()
│   ├── PodBuilder.BuildIPCBridgeContainer()
│   ├── Skill sidecar containers
│   ├── MCP bridge sidecar (if MCP servers are configured)
│   ├── Sandbox sidecar (if enabled)
│   ├── Memory volumes/init containers
│   ├── Secret mirroring (system → run namespace)
│   └── OTel tracing setup
│
├── 12. Create ephemeral RBAC
│   ├── Role + RoleBinding (namespace-scoped, ownerRef)
│   └── ClusterRole + ClusterRoleBinding (label-based)
│
├── 13. NetworkPolicy
│   └── deny-all + allow DNS + allow NATS
│
└── 14. Create Job → Status: Running
```
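
The flow above has two early exits (steps 2 and 8) that decide which kind of workload is created. A minimal sketch of that branching follows; the `RunSpec` and `Plan` types and the `planFor` helper are hypothetical, and the sandbox branch wins over server mode only because step 2 precedes step 8 in the tree.

```go
package main

import "fmt"

// RunSpec holds only the two switches that steer reconcilePending in this note.
// The field names are hypothetical.
type RunSpec struct {
	AgentSandboxEnabled bool   // agentSandbox.enabled
	Mode                string // "task" or "server"
}

// Plan says which function takes over and which resources it creates.
type Plan struct {
	Handler string
	Creates []string
}

// planFor follows the order of the checks in the tree: sandbox first, then server mode.
func planFor(spec RunSpec) Plan {
	switch {
	case spec.AgentSandboxEnabled:
		return Plan{Handler: "reconcilePendingAgentSandbox", Creates: []string{"Sandbox"}}
	case spec.Mode == "server":
		return Plan{Handler: "reconcilePendingServer", Creates: []string{"Deployment", "Service"}}
	default:
		return Plan{Handler: "reconcilePending", Creates: []string{"Job"}}
	}
}

func main() {
	specs := []RunSpec{
		{Mode: "task"},
		{Mode: "server"},
		{AgentSandboxEnabled: true, Mode: "server"},
	}
	for _, s := range specs {
		p := planFor(s)
		fmt.Printf("%+v -> %s creates %v\n", s, p.Handler, p.Creates)
	}
}
```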

## Phase: Running

```
reconcileRunning():
│
├── Poll Job status (every 10s via requeue)
│
├── Pod Succeeded → extractResults():
│   ├── Read pod logs
│   ├── Extract result text
│   ├── Extract memory markers (__SYMPOZIUM_MEMORY__)
│   ├── Patch memory ConfigMap
│   ├── Extract token usage
│   └── Set status.result, completedAt, tokenUsage
│       → Status: Succeeded
│
├── Pod Failed →
│   ├── Read pod logs for error
│   ├── Set status.error, exitCode
│   └── Status: Failed
│
└── Timeout → failRun() → Status: Failed
```

## Phase: Succeeded/Failed

```
reconcileCompleted():
│
├── Clean up ephemeral RBAC
│   ├── Delete ClusterRole (label: agentrun=<name>)
│   └── Delete ClusterRoleBinding
│
├── Prune run history
│   └── Keep max 50 runs per instance (DefaultRunHistoryLimit)
│
└── Remove finalizer → AgentRun deletable
```

## Phase: Serving (server mode)

```
reconcileServing():
│
├── Check Deployment health
├── Check Service health
├── Reconcile HTTPRoute (Envoy Gateway)
└── Requeue every 30s
```

## Deletion handling

```
reconcileDelete():
│
├── Delete server-mode resources (Deployment, Service, HTTPRoute)
├── Delete ephemeral RBAC
├── Delete input ConfigMap
├── Delete MCP ConfigMap
├── Remove finalizer
└── AgentRun deleted
```

## OTel Tracing

Every reconciliation phase is traced:

- `agentrun.reconcile` - the main span
- `agentrun.create_job` - Job creation
- The traceparent is propagated to the agent pod via an env var
- The TraceID is stored in `status.traceID`
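
To make the propagation step concrete, the sketch below builds a W3C Trace Context `traceparent` value of the form `version-traceid-parentid-flags`. The env var name `TRACEPARENT` and the `traceparent` helper are assumptions for illustration; the note does not say which variable name the controller uses. The IDs are the sample values from the W3C specification.

```go
package main

import (
	"fmt"
	"strings"
)

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return false
		}
	}
	return true
}

// traceparent formats a version-00 W3C traceparent value.
// All-zero IDs are rejected because the specification treats them as invalid.
func traceparent(traceID, spanID string, sampled bool) (string, error) {
	if !isLowerHex(traceID, 32) || strings.Trim(traceID, "0") == "" {
		return "", fmt.Errorf("invalid trace id %q", traceID)
	}
	if !isLowerHex(spanID, 16) || strings.Trim(spanID, "0") == "" {
		return "", fmt.Errorf("invalid span id %q", spanID)
	}
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags), nil
}

func main() {
	tp, err := traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	if err != nil {
		panic(err)
	}
	// Hypothetical env var name for handing the context to the agent pod.
	fmt.Println("TRACEPARENT=" + tp)
	// prints: TRACEPARENT=00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
}
```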

## Metrics

- `sympozium.agent.runs` - counter (success/failure labels)
- `sympozium.agent.duration_ms` - run duration histogram
- `sympozium.errors` - error counter

---

Related: [[AgentRun]] | [[Cykl życia Agent Pod]] | [[Orchestrator - PodBuilder i Spawner]]