Chat with your docs. For real.
Ingestion pipeline that keeps up with your content. Vector DB tuned for retrieval precision. Chat UI wired to your auth, your data, your model. Three weeks. $6,999.
RAG Build is a 3-week engagement. We audit your content sources (Notion, Confluence, Google Drive, SharePoint, Zendesk, PDFs, custom) and ship an ingestion pipeline that keeps up with your changes (nightly batch by default; realtime on specific sources when justified). We tune retrieval with the right vector DB (Pinecone, Chroma, Weaviate, or pgvector) and add a reranker layer (Cohere or cross-encoder) to push precision above 0.85. You get either a React chat UI with citations or a REST / SSE API your team wraps. The build closes with an eval suite of 20–50 golden questions, observability wired, and a runbook for adding sources. $6,999 fixed, three weeks, yours to run.
Five RAG patterns we ship most.
Most RAG builds fit one of these five shapes. The scoping call maps your specific content to the nearest shape and confirms stack choices.
| Situation | Today | What the build ships |
|---|---|---|
| Internal docs sprawl — 4k Notion pages + 2k PDFs no one reads | Full-text search is bad; employees ask in Slack for facts that are documented | RAG with source citations; chat UI wired to your SSO so access is audited |
| Support KB used only by agents, not customers | Customers can't find answers; deflection rate stalls at 20% | Customer-facing RAG chatbot with guardrails, citations, and human-escalation path |
| Compliance / policy / legal archive no one can search meaningfully | Questions answered by 'ask legal' — bottlenecks, slow decisions | RAG with strict source citations + confidence threshold; legal approves Q&A patterns |
| Product documentation fragmented across 5 systems | Engineers ask the same 20 questions in Slack daily | RAG indexing every source nightly; answers pulled live from current docs, not stale copy |
| Prototype built on LangChain `RetrievalQA` that 'sort of works' | No reranker, no eval, hallucinations hidden by confident tone | Reranker added, eval suite shipped with precision/recall numbers, hallucinations quantified and reduced |
The 3-week RAG-build schedule.
Week 1 ends with a working ingestion pipeline hitting one source end-to-end. Week 2 ends with retrieval + reranker tuned against golden questions. Week 3 ends with your chat UI or API live, the eval suite running, and handoff complete.
- Week 1
Content audit + ingestion prototype
Week 1 is the content audit — we inventory every source (Notion, Confluence, Google Drive, SharePoint, S3, custom), assess structure (chunkable? already split? markdown vs. PDF?), identify PII risk, and ship an ingestion prototype hitting one representative source end-to-end. You see real embeddings in a test vector DB by end of week 1.
- Week 2
Retrieval tuning + reranker
Week 2 tunes retrieval — chunk size and overlap tuned against real queries, embedding model selected (OpenAI text-embedding-3, Cohere, Voyage, or open-source), metadata filters added (source, recency, permission), and a reranker layer (Cohere Rerank 3 or a cross-encoder) bolted on to push precision above 0.85 on the top 5. You see the precision/recall numbers on real golden Q&A pairs by end of week 2.
- Week 3
UI / API + evals + handoff
Week 3 ships the chat UI (React component with streaming, citations, and confidence) or the API (REST or SSE endpoint your team wraps themselves), hardens the eval suite (20–50 golden questions we run every time the ingestion changes), wires cost and usage observability, and runs a 90-minute handoff session. Your team leaves with a working RAG system, an eval harness, and a runbook.
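The chunking that weeks 1–2 keep referring to can be sketched roughly as follows — a minimal sliding-window chunker, assuming plain-text input and character-based sizes (real builds tune both numbers per source and often split on headings or sentences instead):

```typescript
// Minimal sliding-window chunker: fixed size, fixed overlap.
// The defaults below are illustrative, not tuned values.
interface Chunk {
  id: number;
  text: string;
  start: number; // character offset into the source document
}

function chunkWithOverlap(
  text: string,
  chunkSize = 800,
  overlap = 200,
): Chunk[] {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0, id = 0; start < text.length; start += step, id++) {
    chunks.push({ id, text: text.slice(start, start + chunkSize), start });
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap exists so a sentence that straddles a chunk boundary stays retrievable from at least one chunk; week 2 then tunes both numbers against real queries.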
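Week 2's retrieve-then-rerank shape can be sketched like this — a hedged toy version where embeddings are plain number arrays and the reranker is a keyword-overlap scorer standing in for Cohere Rerank or a cross-encoder:

```typescript
// Two-stage retrieval: a cheap vector top-K pass, then a more precise
// rerank over only those K candidates. Vectors and the rerank scorer
// here are toy stand-ins for an embedding model and a real reranker.
interface Doc {
  id: string;
  text: string;
  vec: number[]; // would come from an embedding model in a real build
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stand-in reranker: fraction of query terms present in the candidate.
function rerankScore(query: string, doc: Doc): number {
  const terms = query.toLowerCase().split(/\s+/);
  const text = doc.text.toLowerCase();
  return terms.filter((t) => text.includes(t)).length / terms.length;
}

function retrieve(queryVec: number[], query: string, docs: Doc[], k = 5): Doc[] {
  const candidates = [...docs]
    .sort((x, y) => cosine(queryVec, y.vec) - cosine(queryVec, x.vec))
    .slice(0, k);
  return candidates.sort((x, y) => rerankScore(query, y) - rerankScore(query, x));
}
```

The vector stage is recall-oriented and cheap; the rerank stage re-orders only K candidates, so a slower, more accurate model is affordable there — that is what pushes precision on the top 5 past the 0.85 target.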
A sample eval suite we ship.
Eval is the difference between a RAG demo and a RAG system. The golden-set format below runs on every ingestion change. For reference, see the Anthropic contextual retrieval playbook we cross-check against.
```yaml
# eval-suite/golden-set.yaml (trimmed)
# 20 golden Q&A pairs we rerun every time ingestion changes.

- id: q-001
  question: What's our SLA for customer support tickets on Enterprise plan?
  expected_answer_contains: ["4 business hours", "Enterprise"]
  forbidden: ["24 hours", "best effort"]  # wrong answers that have appeared before
  must_cite_sources:
    - "enterprise-sla-2026.pdf"
  threshold_precision: 0.80

- id: q-002
  question: How do I rotate the Stripe webhook signing secret?
  expected_answer_contains: ["Stripe Dashboard", "webhooks", "rotate"]
  must_cite_sources:
    - "runbook-stripe-ops.md"
  threshold_precision: 0.75

# ... 18 more

# Run: pnpm eval:rag -- --set golden
# Output:
# Precision@5: 0.87 (target: 0.85 — PASS)
# Recall@5: 0.92 (target: 0.80 — PASS)
# Hallucination rate (answer contains no citation): 3.2% (target: <5% — PASS)
# Avg latency: 1.8s (p95: 3.1s)
# Per-query cost: $0.018 average (with Anthropic caching on the system prompt)
```

What the build delivers.
Five deliverables. Production RAG. Yours to own and extend.
- Ingestion pipeline (PDFs, Notion, Confluence, Google Drive, SharePoint, or custom)
- Vector DB (Pinecone, Chroma, Weaviate, or pgvector) + embedding strategy
- Reranking layer (Cohere or cross-encoder) for precision
- Chat UI or API — your team picks
- Eval suite — we show you the precision/recall on your content
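A golden-set case like the ones in the YAML above can be scored in a few lines — a sketch that mirrors the YAML field names, assuming each answer arrives as text plus a list of cited source filenames (that answer shape is an assumption, not a fixed API):

```typescript
// Scores one golden Q&A pair against a produced answer.
// Field names mirror the golden-set YAML; the RagAnswer shape is assumed.
interface GoldenCase {
  id: string;
  expectedAnswerContains: string[];
  forbidden?: string[];
  mustCiteSources: string[];
}

interface RagAnswer {
  text: string;
  citations: string[]; // source filenames the answer cited
}

interface CaseResult {
  id: string;
  pass: boolean;
  hallucinated: boolean; // answered with no citation at all
}

function scoreCase(c: GoldenCase, a: RagAnswer): CaseResult {
  const text = a.text.toLowerCase();
  const hasExpected = c.expectedAnswerContains.every((s) =>
    text.includes(s.toLowerCase()),
  );
  const hasForbidden = (c.forbidden ?? []).some((s) =>
    text.includes(s.toLowerCase()),
  );
  const cited = c.mustCiteSources.every((s) => a.citations.includes(s));
  return {
    id: c.id,
    pass: hasExpected && !hasForbidden && cited,
    hallucinated: a.citations.length === 0,
  };
}
```

Aggregating `pass` across the set gives the precision number in the report, and the `hallucinated` fraction gives the hallucination rate.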
Fixed fee. Reranker included. Eval suite from day one.
One content domain per build. Adding a second domain later runs at $4,999 (patterns reused). HIPAA path adds $2,000 for BAA coordination and PHI-aware chunking.
- Turnaround: 3 weeks
- Scope: ingestion · vector DB · reranker · chat UI or API · golden-set eval · handoff
- Guarantee: precision > 0.85 on the golden set · source citations required · eval suite yours to run
RAG Build vs fine-tuning vs vendor RAG vs DIY.
Four dimensions. The last column is what you get when you build RAG right — with eval, with citations, with a reranker, at a fixed price, yours to run.
| Dimension | Fine-tuning | Vendor RAG (Glean, etc.) | DIY (LangChain `RetrievalQA`) | Afterbuild Labs RAG Build |
|---|---|---|---|---|
| Price | $50k–$500k + ongoing retraining | $100k+/yr subscription | Engineering time (often 3–6 months) | $6,999 fixed · 3 weeks · yours to run |
| Freshness | Frozen at training time | Connectors on vendor's cadence | Depends on what you wire | Re-ingest nightly · answers pulled from live docs |
| Eval | Perplexity + vibes | Vendor dashboard | Rare — most DIY RAG has none | Golden-set eval suite · precision/recall you can track |
Who should book the build (and who should skip it).
Book the build if…
- You have 500+ documents that matter and a real 'chat with the docs' use case.
- Your team or customers are asking questions your docs could answer if retrieval worked.
- You want source citations in every answer — not vibes-based summaries.
- You need an eval suite to track precision over time, not a one-shot demo.
- You want to own the system, not rent from Glean / Guru / Moveworks / Notion AI.
Do not book the build if…
- You have fewer than 100 documents — the LLM can just read them all in context, no RAG needed.
- Your docs change by the minute (breaking news, live trading data) — that's a streaming ETL problem, not RAG.
- You want an autonomous agent with 2–3 tools — book AI Agent MVP ($9,499) instead.
- You want to add a chatbot to your product UI with no search backend — book the API Integration Sprint.
- Your corpus is in an exotic format (audio archives, hand-written scans at scale) — confirm fit on the scoping call.
Build a RAG system that ships. Three weeks.
Three weeks. $6,999 fixed. Production RAG with ingestion, vector DB, reranker, chat UI or API, and a golden-set eval suite you can track over time.
Book free diagnostic →