SwarmAI evaluates itself using its own agent orchestration capabilities. The Framework Value Score tracks quality across releases.

Feature comparison against major multi-agent frameworks:

| Framework | Language | Governance | Budget Tracking | Multi-Tenant | YAML DSL | Built-in Tools |
|---|---|---|---|---|---|---|
| SwarmAI | Java | Yes | Yes | Yes | Yes | 74 |
| LangGraph | Python | No | No | No | No | 0 |
| CrewAI | Python | No | No | No | Yes | 10 |
| AutoGen | Python | No | No | No | No | 0 |
| Semantic Kernel | Java/.NET | No | No | No | No | 5 |

Eval results by scenario:

| Category | Scenario | Score | Status |
|---|---|---|---|
| CORE | Agent Builder Validation | 100 | PASS |
| CORE | Task Builder with Dependencies | 100 | PASS |
| CORE | Memory Store/Retrieve | 100 | PASS |
| CORE | ObservabilityContext Thread Propagation | 100 | PASS |
| CORE | Budget Tracking | 100 | PASS |
| CORE | Typed Exception Hierarchy | 100 | PASS |
| ENTERPRISE | Tenant Context Isolation | 100 | PASS |
| ENTERPRISE | SPI Default Implementations | 100 | PASS |
| ENTERPRISE | Governance Model | 100 | PASS |
| RESILIENCE | Circuit Breaker Initialization | 100 | PASS |
| RESILIENCE | Health Indicators | 100 | PASS |
| RESILIENCE | Configuration Validator | 100 | PASS |
| DSL | YAML Parser Available | 100 | PASS |
| DSL | Swarm Compiler Available | 100 | PASS |
| DSL | All 7 Process Types | 100 | PASS |

The Framework Value Score is computed by the swarmai-eval module, which runs 15 automated scenarios across 4 categories. Each scenario exercises one framework capability end-to-end and is scored against a 0-100 rubric.
Category weights: Core (25%), Orchestration (25%), Enterprise (20%), Resilience (15%), DSL (15%). A Framework Value Score of at least 70 is required to pass the release gate. Regressions of more than 5 points trigger automatic alerts.
The eval swarm runs nightly on the main branch and on every release tag. Results are stored in eval-results/history.json and published here automatically.
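
The weighting, release gate, and regression rule described above can be sketched in a few lines of Java. This is an illustrative sketch, not the swarmai-eval implementation: the class name `FrameworkValueScoreSketch` and its methods are hypothetical, category scores are taken as inputs (how scenario scores roll up into a category score is not specified here), and the sketch assumes that a category with no score, such as Orchestration in the results table above, is skipped and the remaining weights are renormalized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not part of the SwarmAI API.
final class FrameworkValueScoreSketch {

    // Category weights in percent, as listed in this document.
    static final Map<String, Double> WEIGHTS = Map.of(
            "CORE", 25.0,
            "ORCHESTRATION", 25.0,
            "ENTERPRISE", 20.0,
            "RESILIENCE", 15.0,
            "DSL", 15.0);

    static final double PASS_THRESHOLD = 70.0;   // release gate: score >= 70
    static final double REGRESSION_ALERT = 5.0;  // alert when the drop exceeds 5 points

    // Weighted mean of the supplied category scores (each 0-100).
    // Assumption: categories without a score are skipped and the
    // remaining weights are renormalized to sum to 100%.
    static double weightedScore(Map<String, Double> categoryScores) {
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<String, Double> entry : categoryScores.entrySet()) {
            Double weight = WEIGHTS.get(entry.getKey());
            if (weight == null) {
                throw new IllegalArgumentException("Unknown category: " + entry.getKey());
            }
            weighted += weight * entry.getValue();
            totalWeight += weight;
        }
        if (totalWeight == 0.0) {
            throw new IllegalArgumentException("No category scores supplied");
        }
        return weighted / totalWeight;
    }

    // True when the score meets the release gate.
    static boolean passesGate(double score) {
        return score >= PASS_THRESHOLD;
    }

    // True when the score dropped by more than the alert threshold
    // relative to a previous run.
    static boolean isRegression(double previousScore, double currentScore) {
        return previousScore - currentScore > REGRESSION_ALERT;
    }

    public static void main(String[] args) {
        // Made-up category scores, for illustration only.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("CORE", 80.0);
        scores.put("ORCHESTRATION", 60.0);
        scores.put("ENTERPRISE", 70.0);
        scores.put("RESILIENCE", 90.0);
        scores.put("DSL", 50.0);

        double score = weightedScore(scores);
        System.out.println("score=" + score
                + " passesGate=" + passesGate(score)
                + " regressionVs90=" + isRegression(90.0, score));
    }
}
```

Renormalizing over the categories that actually have scores keeps the result on the 0-100 scale even when a category is absent. Treating a missing category as zero would be an equally defensible choice, but it would lower the score for reasons unrelated to quality.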