afterbuild/ops
§ S-07/rag-build

Chat with your docs. For real.

Ingestion pipeline that keeps up with your content. Vector DB tuned for retrieval precision. Chat UI wired to your auth, your data, your model. Three weeks. $6,999.

$6,999 · fixed fee
3 weeks · ship date
eval · golden-set precision/recall you can track
Quick verdict

RAG Build is a 3-week engagement: it audits your content sources (Notion, Confluence, Google Drive, SharePoint, Zendesk, PDFs, custom), ships an ingestion pipeline that keeps up with your changes (nightly batch by default; realtime on specific sources when justified), tunes retrieval with the right vector DB (Pinecone, Chroma, Weaviate, or pgvector), adds a reranker layer (Cohere or cross-encoder) to push precision above 0.85, and ships either a React chat UI with citations or a REST/SSE API your team wraps. It closes with an eval suite of 20–50 golden questions, observability wired, and a runbook for adding sources. $6,999 fixed, three weeks, yours to run.

§ 01/rag-patterns

Five RAG patterns we ship most.

Most RAG builds fit one of these five shapes. The scoping call maps your specific content to the nearest shape and confirms stack choices.

RAG Build · 5 shipping patterns · reranker · citations · eval suite
| Situation | Today | What the build ships |
| --- | --- | --- |
| Internal docs sprawl: 4k Notion pages + 2k PDFs no one reads | Full-text search is bad; employees ask in Slack for facts that are documented | RAG with source citations; chat UI wired to your SSO so access is audited |
| Support KB used only by agents, not customers | Customers can't find answers; deflection rate stalls at 20% | Customer-facing RAG chatbot with guardrails, citations, and human-escalation path |
| Compliance / policy / legal archive no one can search meaningfully | Questions answered by 'ask legal': bottlenecks, slow decisions | RAG with strict source citations + confidence threshold; legal approves Q&A patterns |
| Product documentation fragmented across 5 systems | Engineers ask the same 20 questions in Slack daily | RAG indexing every source nightly; answers pulled live from current docs, not stale copy |
| Prototype built on LangChain `RetrievalQA` that 'sort of works' | No reranker, no eval, hallucinations hidden by confident tone | Reranker added, eval suite shipped with precision/recall numbers, hallucinations quantified and reduced |
§ 02/3-week-schedule

The 3-week RAG-build schedule.

Week 1 ends with a working ingestion hitting one source end-to-end. Week 2 ends with retrieval + reranker tuned against golden questions. Week 3 ends with your chat UI or API live, eval suite running, handoff complete.

  1. W1 · week 1

    Content audit + ingestion prototype

    Week 1 is the content audit — we inventory every source (Notion, Confluence, Google Drive, SharePoint, S3, custom), assess structure (chunkable? already split? markdown vs. PDF?), identify PII risk, and ship an ingestion prototype hitting one representative source end-to-end. You see real embeddings in a test vector DB by end of week 1.

  2. W2 · week 2

    Retrieval tuning + reranker

    Week 2 tunes retrieval: chunk size and overlap tuned against real queries, embedding model selected (OpenAI text-embedding-3, Cohere, Voyage, or open-source), metadata filters added (source, recency, permission), and a reranker layer (Cohere rerank-3 or cross-encoder) bolted on to push precision above 0.85 on the top-5. You see the precision/recall numbers on real golden Q&A pairs by end of week 2.

  3. W3 · week 3

    UI / API + evals + handoff

    Week 3 ships the chat UI (React component with streaming, citations, and confidence) or the API (REST or SSE endpoint your team wraps themselves), hardens the eval suite (20–50 golden questions we run every time the ingestion changes), wires cost and usage observability, and runs a 90-minute handoff session. Your team leaves with a working RAG system, an eval harness, and a runbook.
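The week-1 ingestion prototype starts from a plain character-window splitter before any tuning. A minimal sketch, assuming the 800/100 defaults are illustrative (production builds split on headings and sentences first and fall back to this for unstructured text):

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Overlap keeps a sentence that straddles a boundary retrievable
    from either chunk. Chunk size and overlap are the first knobs
    tuned against real queries in week 2.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so windows overlap
    return chunks


# A 2,000-character doc at 800/100 yields three overlapping chunks.
chunks = chunk_text("x" * 2000)
print([len(c) for c in chunks])  # [800, 800, 600]
```

Each chunk then gets embedded and upserted into the vector DB with its source metadata; the splitter itself stays model-agnostic.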

§ 03/sample-eval

A sample eval suite we ship.

Eval is the difference between a RAG demo and a RAG system. The golden-set format below runs on every ingestion change. For reference, see the Anthropic contextual retrieval playbook we cross-check against.

eval-suite/golden-set.yaml

```yaml
# eval-suite/golden-set.yaml (trimmed)
# 20 golden Q&A pairs we rerun every time ingestion changes.

- id: q-001
  question: What's our SLA for customer support tickets on Enterprise plan?
  expected_answer_contains: ["4 business hours", "Enterprise"]
  forbidden: ["24 hours", "best effort"]  # wrong answers that have appeared before
  must_cite_sources:
    - "enterprise-sla-2026.pdf"
  threshold_precision: 0.80

- id: q-002
  question: How do I rotate the Stripe webhook signing secret?
  expected_answer_contains: ["Stripe Dashboard", "webhooks", "rotate"]
  must_cite_sources:
    - "runbook-stripe-ops.md"
  threshold_precision: 0.75

# ... 18 more

# Run:  pnpm eval:rag -- --set golden
# Output:
#   Precision@5: 0.87 (target: 0.85 — PASS)
#   Recall@5:    0.92 (target: 0.80 — PASS)
#   Hallucination rate (answer contains no citation): 3.2% (target: <5% — PASS)
#   Avg latency:  1.8s (p95: 3.1s)
#   Per-query cost: $0.018 average (with Anthropic caching on the system prompt)
```
20–50 questions. Precision, recall, hallucination rate, latency, per-query cost — tracked over time.
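Each golden case reduces to a handful of string and citation checks. A hypothetical runner core, sketched in Python for brevity (the shipped suite runs via `pnpm eval:rag`), assuming the `expected_answer_contains` / `forbidden` / `must_cite_sources` fields from the YAML:

```python
def check_case(case: dict, answer: str, cited: list[str]) -> dict:
    """Evaluate one golden Q&A case against a generated answer."""
    failures = []
    for phrase in case.get("expected_answer_contains", []):
        if phrase.lower() not in answer.lower():
            failures.append(f"missing expected phrase: {phrase!r}")
    for phrase in case.get("forbidden", []):
        if phrase.lower() in answer.lower():
            failures.append(f"contains forbidden phrase: {phrase!r}")
    for src in case.get("must_cite_sources", []):
        if src not in cited:
            failures.append(f"missing citation: {src!r}")
    return {"id": case["id"], "passed": not failures, "failures": failures}


case = {
    "id": "q-001",
    "expected_answer_contains": ["4 business hours", "Enterprise"],
    "forbidden": ["24 hours", "best effort"],
    "must_cite_sources": ["enterprise-sla-2026.pdf"],
}
good = check_case(case, "Enterprise SLA is 4 business hours.",
                  ["enterprise-sla-2026.pdf"])
bad = check_case(case, "We respond within 24 hours, best effort.", [])
print(good["passed"], bad["passed"])  # True False
```

Aggregate pass rate, precision@k, and hallucination rate (answers with no supporting citation) roll up from these per-case results.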
§ 04/ledger

What the build delivers.

Five deliverables. Production RAG. Yours to own and extend.

§ 05/engagement-price

Fixed fee. Reranker included. Eval suite from day one.

One content domain per build. Adding a second domain later runs at $4,999 (patterns reused). HIPAA path adds $2,000 for BAA coordination and PHI-aware chunking.

rag
price
$6,999
turnaround
3 weeks
scope
Ingestion · vector DB · reranker · chat UI or API · golden-set eval · handoff
guarantee
Precision > 0.85 on golden-set. Source citations required. Eval suite yours to run.
book the build
§ 06/vs-alternatives

RAG Build vs fine-tuning vs vendor RAG vs DIY.

Four dimensions. The last column is what you get when you build RAG right: with eval, with citations, with a reranker, at fixed price, yours to run.

RAG Build · reranker · eval suite · citations · yours to extend
| Dimension | Fine-tuning | Vendor RAG | DIY LangChain | Afterbuild Labs RAG |
| --- | --- | --- | --- | --- |
| Approach | Fine-tuning | Vendor RAG (Glean, etc.) | DIY (LangChain RetrievalQA) | Afterbuild Labs RAG Build |
| Price | $50k–$500k + ongoing retraining | $100k+/yr subscription | Engineering time (often 3–6 months) | $6,999 fixed · 3 weeks · yours to run |
| Freshness | Frozen at training time | Connectors on vendor's cadence | Depends on what you wire | Re-ingest nightly · answers pulled from live docs |
| Eval | Perplexity + vibes | Vendor dashboard | Rare: most DIY RAG has none | Golden-set eval suite · precision/recall you can track |
§ 07/fit-check

Who should book the build (and who should skip it).

Book the build if…

  • You have 500+ documents that matter and a real 'chat with the docs' use case.
  • Your team or customers are asking questions your docs could answer if retrieval worked.
  • You want source citations in every answer — not vibes-based summaries.
  • You need an eval suite to track precision over time, not a one-shot demo.
  • You want to own the system, not rent from Glean / Guru / Moveworks / Notion AI.

Do not book the build if…

  • You have fewer than 100 documents — the LLM can just read them all in context, no RAG needed.
  • Your docs change by the minute (breaking news, live trading data) — that's a streaming ETL problem, not RAG.
  • You want an autonomous agent with 2–3 tools — book AI Agent MVP ($9,499) instead.
  • You want to add a chatbot to your product UI with no search backend — book the API Integration Sprint.
  • Your corpus is in an exotic format (audio archives, hand-written scans at scale) — confirm fit on scoping.
§ 08/build-faq

RAG Build — your questions, answered.

FAQ
Which vector DB should we pick?
Depends on scale and stack. Pinecone is the default for SaaS teams (fast, managed, strong filters). Chroma for small-to-medium self-hosted projects. Weaviate when you need hybrid search and strong metadata filtering. pgvector for teams that already run Postgres and want to avoid a new infra dependency — it works well up to ~5M vectors. We make the call on the Day-1 scoping based on your document count, query volume, filter needs, and whether you require self-hosting.
Realtime ingestion or batch?
Batch (nightly) is the default because it's cheaper and simpler. Realtime (within seconds of source update) is worth it when: (1) your content changes throughout the day and answers need to reflect that, (2) you have webhook-friendly sources (Notion, Confluence, Zendesk), (3) the volume justifies streaming infra. For most teams, nightly batch is fine and we add realtime for specific sources only where needed.
How do you prevent hallucinations?
Three layers. (1) Reranker after retrieval pushes precision above 0.85, so the LLM is fed relevant context, not noise. (2) Source citations are required — every claim in the answer links back to the source chunks, and the UI shows them. (3) Confidence threshold — below a tuned threshold, the model returns 'I don't know, here's what I found' instead of fabricating. Eval suite quantifies the hallucination rate (answers with no supporting citation); we target under 5%.
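Layers (1) and (3) compose into a small gate after retrieval. A toy sketch with a stand-in scorer in place of the real cross-encoder (scorer, threshold, and field names here are all illustrative):

```python
def answer_or_abstain(query, candidates, score_fn, top_n=5, threshold=0.4):
    """Rerank retrieved candidates; abstain when the best score is weak.

    candidates: list of (chunk_text, source_id) pairs from dense retrieval.
    score_fn:   query x chunk -> relevance in [0, 1] (cross-encoder in prod).
    """
    reranked = sorted(candidates, key=lambda c: score_fn(query, c[0]),
                      reverse=True)
    top = reranked[:top_n]
    if not top or score_fn(query, top[0][0]) < threshold:
        # Below the confidence threshold: surface what was found, don't fabricate.
        return {"answer": None, "fallback": [src for _, src in top]}
    return {"context": top}  # fed to the LLM with citations required


# Stand-in scorer: token overlap between query and chunk.
def overlap_score(query: str, chunk: str) -> float:
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)


docs = [("rotate the stripe webhook signing secret in the dashboard", "runbook.md"),
        ("enterprise sla is four business hours", "sla.pdf")]
hit = answer_or_abstain("rotate stripe webhook secret", docs, overlap_score)
miss = answer_or_abstain("quarterly revenue forecast", docs, overlap_score)
print("context" in hit, miss["answer"])  # True None
```

The abstain branch is what the 'I don't know, here's what I found' response is built on: the found sources still surface, but no generated claim ships without supporting context.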
Which content sources do you support?
First-class: Notion, Confluence, Google Drive (Docs + PDFs), SharePoint, Dropbox, Zendesk KB, HelpScout, Intercom, raw PDFs, raw markdown, websites (Firecrawl crawl). Custom sources (your internal DB, a proprietary CMS, a Git repo) are straightforward to add — usually a day of work inside the 3-week scope. We confirm source compatibility on the scoping call.
Can this scale to 1M+ docs?
Yes. Above ~1M documents we shift defaults: Pinecone or Weaviate over Chroma/pgvector, hierarchical chunking (parent-child), metadata sharding, and a two-stage retrieval (broad filter → dense retrieval → rerank). Above 10M docs we usually recommend a custom build rather than this fixed-scope sprint — the eval and latency work at that scale is its own engagement. Confirm on scoping.
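The two-stage shape (broad metadata filter, then dense retrieval over the survivors, then rerank) can be sketched in a few lines, assuming chunks carry a metadata dict (field names and the toy 2-d vectors are illustrative):

```python
def dot(a, b):
    """Plain dot product; stands in for the vector DB's similarity op."""
    return sum(x * y for x, y in zip(a, b))


def two_stage(query_vec, chunks, metadata_pred, top_k=50):
    """Stage 1: cheap metadata filter cuts the candidate pool.
    Stage 2: dense similarity over survivors only.
    (Stage 3, reranking, runs on the returned top_k.)
    """
    survivors = [c for c in chunks if metadata_pred(c["meta"])]
    scored = sorted(survivors, key=lambda c: dot(query_vec, c["vec"]),
                    reverse=True)
    return scored[:top_k]


chunks = [
    {"meta": {"source": "notion"},  "vec": [1.0, 0.0]},
    {"meta": {"source": "zendesk"}, "vec": [0.9, 0.1]},
    {"meta": {"source": "notion"},  "vec": [0.0, 1.0]},
]
result = two_stage([1.0, 0.0], chunks,
                   lambda m: m["source"] == "notion", top_k=2)
print([c["meta"]["source"] for c in result])  # ['notion', 'notion']
```

At 1M+ docs the metadata filter runs as an index-level pre-filter inside the vector DB rather than in application code, but the shape is the same.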
HIPAA-compliant RAG path?
Yes, with caveats. BAA coverage: Anthropic (via AWS Bedrock), OpenAI (via Azure), Cohere (for reranking, BAA available), Pinecone (BAA on Enterprise), Postgres with pgvector (self-hosted, yours). No public embedding API calls on PHI — everything stays in your BAA-covered stack. HIPAA RAG adds ~$2,000 to the engagement to cover BAA coordination, PHI-aware chunking (to avoid accidentally embedding PHI across doc boundaries), and audit-log requirements.
What about graph RAG?
Graph RAG (GraphRAG, LightRAG, Neo4j + embeddings) is worth the complexity when your corpus has explicit entity relationships that matter for answering — org charts, legal citations, product SKU hierarchies, scientific literature with cross-references. For most business content (Notion, Confluence, support KB) it's over-engineered. We evaluate fit on the scoping call and quote it as an add-on if it's the right shape.
Next step

Build a RAG system that ships. Three weeks.

Three weeks. $6,999 fixed. Production RAG with ingestion, vector DB, reranker, chat UI or API, and a golden-set eval suite you can track over time.

Book free diagnostic →