Core · RAG_PIPELINE · +80 ValueAdd · swarm-ai v1.0.10 · Example code: swarm-ai-examples/rag-knowledge-base-rest-api

RAG Knowledge Base

Cited answer pinned to the corpus vs a confident summary that can't reproduce a single specific default value.

Prompt (both sides)
What are the recommended defaults for chunk size, top-K, and the synthesis prompt when building a RAG pipeline with `RagPipeline`, and which lessons from the IntelliDoc 2026-04-26 evaluation justify those defaults?
Model
SwarmAI


SwarmAI workflow · RAG_PIPELINE

0 tokens · $0.00 (local) · 0 tools · 0.0s / 6.4s
Raw LLM


Raw GPT-4o — no tools, no memory

0 tokens · $0.00 (local) · 0.0s / 9.8s
gpt-4o · in 54 · out 0
Δ tokens (swarm − baseline): -493
Δ cost: -$0.0000
Δ wall-time: -3.4s
Asymmetry: swarm succeeds · baseline cannot reach tools
What you're gaining

What you're looking at

Same question — "what defaults should I use for RagPipeline?" — answered with vs without retrieval. Left: SwarmAI's RagPipeline ingests the project's RAG_LESSONS.md corpus and synthesises a cited answer with the eval-winning defaults baked in. Right: the raw LLM — same model, same prompt, no documents, no retrieval.

On GPT-4o

Baseline

Generic advice

Admits it can't see the documents, falls back to general best practices

  • "I don't have access to specific documents like the IntelliDoc 2026-04-26 evaluation"
  • Generic "use a chunk size around N", "consider hybrid retrieval"
  • No reference to this codebase's defaults at all
  • ~9.8s wall time, 0 citations

On GPT-5.4 mini

Baseline

Hallucinates a number

Invents "chunk size ≈ 500 tokens" — wrong, the eval picked 800

  • States "Chunk size: ~500 tokens" — fabricated, the project's actual default is 800
  • Recommends "grounded, citation-first prompt" — vague, no template
  • Confident tone, but every specific value is the model's prior, not the corpus
  • ~5.8s wall time, 0 citations

What changed under the hood

RagPipeline is the new high-level RAG facade in swarmai-core (1.0.10). It bakes in the configuration that won the IntelliDoc 2026-04-26 evaluation — 7 iterations × 225 document-grounded questions × 3 platforms (IntelliDoc, LangGraph-Python, LangChain4j-Java). The defaults that produced the best chunk-hit + faithfulness + latency tradeoff:

field              default                  reason
chunk size         800                      peer-aligned (500 too small, 1200 lost recall)
chunk overlap      100                      enough to span formula/sentence boundaries
top-K              5                        K=10 added prompt tokens without proportional recall
hybrid retrieval   on                       BM25 + vector → RRF; disabling tanked chunk-hit 14% → 8%
MMR rerank         off                      over-spreads results away from the right doc
temperature        0.2                      0.1 over-refused, 0.3 paraphrased formulas
numPredict         350                      30% latency saving without hurting completeness
synthesis prompt   5 plain-English bullets  dropped refusals 36% → 12% vs the 6-rule prompt
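
To make the chunk size and overlap rows concrete, here is a minimal splitter sketch: fixed-size character chunks with a 100-character overlap. The OverlapChunker name is hypothetical, and this illustrates the 800/100 interplay only; it is not the actual RecursiveCharSplitter, which also recurses on separator boundaries.

import java.util.ArrayList;
import java.util.List;

// Illustration only: fixed-size chunking with overlap. The real splitter
// additionally tries to break on sentence/paragraph separators.
class OverlapChunker {
    static List<String> chunk(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;              // 800 - 100 → 700 fresh chars per chunk
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));  // last 100 chars repeat in the next chunk
            if (end == text.length()) break;         // tail covered, stop
        }
        return chunks;
    }
}

Each step advances by chunkSize − overlap, so a sentence or formula straddling a chunk boundary survives intact in at least one chunk; that is what the 100-character overlap buys.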

Wiring it up is a single builder call:

import java.nio.file.Files;
import java.nio.file.Paths;
// RagPipeline, RagConfig, RagAnswer, Citation come from swarmai-core 1.0.10

RagPipeline rag = RagPipeline.builder()
        .vectorStore(vectorStore)          // embedding store (nomic-embed-text via Ollama in the demo)
        .chatClient(chatClient)
        .config(RagConfig.defaults())      // the eval-winning defaults from the table above
        .build();

rag.ingestText("RAG_LESSONS.md", Files.readString(Paths.get("RAG_LESSONS.md")));
RagAnswer a = rag.query("What are the recommended defaults?");
// a.answer()    → cited reply
// a.citations() → [Citation(RAG_LESSONS.md, 0, ...), ...]
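
A sketch of consuming that RagAnswer follows. The Citation accessor names (file() and chunkIndex()) are assumptions inferred from the Citation(RAG_LESSONS.md, 0, ...) shape above, not confirmed swarmai-core API:

// Assumed accessors, inferred from the Citation(RAG_LESSONS.md, 0, ...) shape.
System.out.println(a.answer());
for (Citation c : a.citations()) {
    // Render in the pipeline's citation format: [source: <filename> #<chunk-index>]
    System.out.printf("[source: %s #%d]%n", c.file(), c.chunkIndex());
}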

Reproduce

cd swarm-ai-examples
./demo-recorder/record-rag-demo.sh gpt-4o gpt-5.4-mini
# requires OPENAI_API_KEY in .env, Ollama running with nomic-embed-text

Outputs land in demos/rag-knowledge-base/runs/<model>/<framework-version>/ and sync to the website under intelliswarm.ai/website/src/assets/demos/.

Workflow YAML
# RagPipeline isn't a SwarmGraph workflow — it's the high-level facade in
# swarmai-core. The "workflow" here is just the builder call + ingest + query.

rag:
  pipeline: RagPipeline
  config:
    chunkSize: 800
    chunkOverlap: 100
    topK: 5
    maxPassageChars: 2400
    hybridRetrieval: true       # vector + BM25 → RRF
    contextualPrefix: true      # "[filename] " prepended before embedding
    mmrRerank: false            # diversity rerank disabled (over-spreads)
    temperature: 0.2
    numPredict: 350
  prompt: RagPrompts.SYSTEM     # five-bullet plain-English prompt (langchain-style)
  retriever: HybridRetriever    # vector + BM25 fused via RRF (K=60)
  splitter: RecursiveCharSplitter
  citationFormat: "[source: <filename> #<chunk-index>]"
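
For context on hybridRetrieval: true, here is a minimal sketch of Reciprocal Rank Fusion as standardly defined, score(doc) = Σ 1/(k + rank) summed over the two rankings, with k = 60 matching the RRF (K=60) note above. It shows the fusion step conceptually and is not HybridRetriever's actual implementation:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual RRF sketch, not HybridRetriever itself.
// score(doc) = sum over rankings of 1 / (k + rank), rank is 1-based.
class Rrf {
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((d1, d2) -> Double.compare(scores.get(d2), scores.get(d1)));
        return fused;
    }
}

// e.g. Rrf.fuse(List.of(bm25Ids, vectorIds), 60) → doc ids ordered by fused score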

Reproducible — model version, temperature, seed, framework git SHA, and hashes of prompt + workflow are embedded in every trace. Re-run to diff against this recording.