A production AI agent. Four weeks. Not a demo.
LangGraph or Claude Agent SDK. 2–3 tools wired to your stack. Guardrails, observability, human-in-the-loop, cost controls. $9,499 fixed. Handoff included.
AI Agent MVP is a 4-week engagement. We pick the framework (LangGraph when the graph should be explicit and observable; Claude Agent SDK when you want Anthropic-native skills, memory, and subagents), ship a production agent with 2–3 tools wired to your stack, and harden it for production: guardrails (max steps, max tokens, tool whitelist, kill-switch, per-session cost ceiling), observability (LangSmith or Helicone, every run traceable, with cost telemetry and alerting), human-in-the-loop for writes that are hard to reverse, and an eval suite of 15–30 golden scenarios. The engagement closes with a runbook, a handoff session, and a production deployment your team owns.
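The per-session cost controls above amount to one budget check before every step. A minimal sketch, assuming hypothetical `RunBudget` / `checkBudget` names (ours, not a framework API):

```typescript
// Illustrative per-run guardrail budget; the names are ours, not a
// LangGraph or Claude Agent SDK API.
type RunBudget = {
  maxSteps: number;       // hard cap on loop iterations
  maxTokens: number;      // total tokens per run
  costCeilingUSD: number; // per-session spend ceiling
};

type RunUsage = { steps: number; tokens: number; costUSD: number };

// Returns the first violated limit, or null if the run may continue.
function checkBudget(budget: RunBudget, usage: RunUsage): string | null {
  if (usage.steps >= budget.maxSteps) return "max_steps";
  if (usage.tokens >= budget.maxTokens) return "max_tokens";
  if (usage.costUSD >= budget.costCeilingUSD) return "cost_ceiling";
  return null;
}

const defaultBudget: RunBudget = {
  maxSteps: 12,
  maxTokens: 8000,
  costCeilingUSD: 0.5,
};
```

The agent loop calls `checkBudget` before each tool call and aborts on the first violation, which is what turns a runaway loop into a bounded cost.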
Five agent shapes we ship most.
Most agent MVPs fit one of these five shapes. The scoping call maps your specific need to the nearest shape and confirms framework, tools, and HITL boundaries.
| Use case | What's missing today | What the agent ships |
|---|---|---|
| Support ops needs an agent to triage + escalate + draft replies | RAG alone can't decide when to escalate; static rules are too brittle | LangGraph agent with classifier → retrieval → draft → confidence gate → human escalation |
| Sales ops needs lead qualification + enrichment + CRM updates | n8n workflow hit a ceiling — logic is dynamic, needs multi-step reasoning | Agent with Apollo / Clearbit / LinkedIn enrichment + CRM write tools + disqualification path |
| Engineering ops wants an agent to triage GitHub issues + assign + draft fix plans | Classifier isn't enough — needs to read the repo, match similar issues, plan a fix | Claude Agent SDK build with repo read, issue search, assignment, and human sign-off tools |
| Internal research agent — synthesize reports from N sources on demand | One-shot LLM call misses; manual orchestration is fragile | Agent with search + fetch + synthesis tools, citations required, guardrails on sources |
| Customer-facing agent that takes actions on their behalf | Every wrong action is a refund or an incident; needs humans in the loop | Agent with kill-switch, human-in-the-loop for writes, full audit log, rollback tools |
The 4-week agent-MVP schedule.
Week 1: design + tool scoping. Week 2: core loop + first tool. Week 3: remaining tools + guardrails. Week 4: observability + evals + handoff.
- Week 1: Agent design + tool scoping
  Week 1 is architecture: we pick the framework (LangGraph when the graph is explicit and observable; Claude Agent SDK when you want Anthropic-native features like skills, memory, and subagents), scope the 2–3 tools, define success criteria, and sketch the eval set. By end of week 1 you sign off on a written architecture spec: states, transitions, tool surface, guardrails.
- Week 2: Core loop + first tool
  Week 2 ships the core agent loop — plan → act → observe → decide — with the first tool wired. For LangGraph that's a working graph with interrupts. For Claude Agent SDK that's a working agent session with one skill and one tool. The first end-to-end run against a staging input lands by end of week 2.
- Week 3: Remaining tools + guardrails
  Week 3 adds the remaining tools and locks in guardrails: max steps (so a runaway loop costs cents instead of $500), max tokens per run, tool whitelist, per-session cost ceiling, kill-switch endpoint, idempotent retries on tool calls, and human-in-the-loop checkpoints for writes. The eval suite runs at end of week 3 — we tune until the agent passes.
- Week 4: Observability + evals + handoff
  Week 4 wires observability (LangSmith traces for LangGraph; Helicone or custom OpenTelemetry for Agent SDK), cost and token telemetry, alerting on repeated failures, and the runbook. The handoff session covers how to add a tool, tune the prompt, interpret traces, and roll back. Your team leaves with a production agent, an eval suite, and a written spec.
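The week-2 core loop can be sketched in a few lines, assuming a synchronous `Tool` signature and a `decide` callback standing in for the LLM planner (both illustrative):

```typescript
// Minimal shape of the plan -> act -> observe -> decide loop.
// Real tools are async LLM/API calls; this sketch keeps them synchronous.
type Tool = (input: string) => string;

type Decision =
  | { kind: "act"; tool: string; input: string }
  | { kind: "done"; answer: string };

function runAgent(
  decide: (observations: string[]) => Decision, // stands in for the LLM planner
  tools: Record<string, Tool>,
  maxSteps: number,
): string {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const d = decide(observations);          // plan / decide
    if (d.kind === "done") return d.answer;  // exit the loop
    const tool = tools[d.tool];
    if (!tool) throw new Error(`tool not whitelisted: ${d.tool}`); // tool whitelist
    observations.push(tool(d.input));        // act + observe
  }
  throw new Error("max_steps exceeded");     // guardrail, not an infinite loop
}
```

The `maxSteps` throw is the same guardrail week 3 hardens: a planner that never says "done" fails fast instead of burning budget.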
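One week-3 guardrail worth spelling out is idempotent retries: a retried tool write must land exactly once. A minimal sketch, assuming an in-memory result cache (production would use a durable store keyed by run + step):

```typescript
// Sketch of an idempotent, retried tool call. The cache keyed by
// idempotencyKey means a retried write is applied exactly once.
// Names (callWithRetry, seen) are illustrative, not a framework API.
async function callWithRetry<T>(
  idempotencyKey: string,
  call: () => Promise<T>,
  seen: Map<string, T>, // in production: a durable store, not in-memory
  maxAttempts = 3,
): Promise<T> {
  const cached = seen.get(idempotencyKey);
  if (cached !== undefined) return cached; // already applied: skip the write
  let lastErr: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await call();
      seen.set(idempotencyKey, result); // record success before returning
      return result;
    } catch (err) {
      lastErr = err; // transient failure: retry
    }
  }
  throw lastErr; // surfaced to the agent loop and alerting
}
```

Wrapping every write tool this way is what lets the agent retry a flaky CRM or ticketing API without double-posting.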
A sample LangGraph we shipped.
This is the shape of a support-triage agent — explicit graph, interrupts for low confidence, guardrails on recursion / tokens / cost, LangSmith traces on by default. For reference, see the LangGraph docs.
```typescript
// agent/graph.ts — support-triage agent (trimmed)
import { StateGraph, START, END, interrupt } from "@langchain/langgraph";
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";
import { classify, retrieveKB, draftReply, escalate } from "./tools";

type TicketState = {
  ticket: Ticket;
  classification?: "billing" | "technical" | "other";
  retrieved?: KBArticle[];
  draft?: string;
  confidence?: number;
  needsHuman?: boolean;
};

const graph = new StateGraph<TicketState>({ channels: { /* ... */ } });

graph
  .addNode("classify", classify)    // LLM call, cache_control on system prompt
  .addNode("retrieve", retrieveKB)  // Pinecone search, top-5 with rerank
  .addNode("draft", draftReply)     // LLM call, Sonnet 4.6, max_tokens: 600
  .addNode("human_review", () => interrupt({ reason: "low_confidence" }))
  .addNode("escalate", escalate)    // Zendesk update + Slack notify

  .addEdge(START, "classify")
  .addEdge("classify", "retrieve")
  .addEdge("retrieve", "draft")
  .addConditionalEdges("draft", (s) =>
    s.confidence! < 0.7 ? "human_review" : "escalate"
  )
  .addEdge("human_review", "escalate")
  .addEdge("escalate", END);

export const app = graph.compile({
  checkpointer: new PostgresSaver({ /* persist state between interrupts */ }),
});

// Guardrails — every run respects these. recursionLimit is standard per-run
// config; the token and cost ceilings are enforced in our tool wrappers:
export const runLimits = {
  recursionLimit: 12,    // passed via config on each app.invoke
  maxTokensPerRun: 8000,
  costCeilingUSD: 0.50,
};

// Kill switch — set KILL_AGENT_RUNS=true to drop all new sessions
// Traces ship to LangSmith automatically when LANGCHAIN_TRACING_V2=true
```

What the MVP delivers.
Five deliverables. Production agent. Yours to run.
1. Agent architecture (LangGraph or Claude Agent SDK, your choice)
2. 2–3 tools wired to your stack (MCP, native Python/TS, or API)
3. Guardrails: max steps, max tokens, tool whitelist, kill switch
4. Observability: LangSmith, Helicone, or custom — every run traceable
5. Human-in-the-loop escalation path for risky actions
Fixed fee. 2–3 tools. HITL built in.
One agent per MVP. Additional tools beyond three are $1,500 each. Multi-agent (3+ personas) is a separate engagement that usually starts after this one has been in production for 3+ months.
- Turnaround: 4 weeks
- Scope: LangGraph or Claude Agent SDK · 2–3 tools · guardrails · observability · HITL · eval suite
- Guarantee: Eval suite passes before handoff. Runbook covers add-a-tool, tune-the-prompt, roll-back.
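The eval-suite guarantee is easy to picture as a golden-scenario runner. The `Golden` shape and the scenarios below are illustrative, not our actual suite:

```typescript
// A golden scenario pins an input to the behavior we expect; here a simple
// pass/fail predicate per scenario (real evals also score draft quality).
type Golden = {
  name: string;
  input: string;
  expect: (output: string) => boolean; // assertion over the agent's output
};

function runEvals(agent: (input: string) => string, suite: Golden[]): string[] {
  const failures: string[] = [];
  for (const g of suite) {
    if (!g.expect(agent(g.input))) failures.push(g.name); // collect, don't stop
  }
  return failures; // empty list = suite passes = ready for handoff
}

const suite: Golden[] = [
  { name: "billing dispute escalates", input: "I was double charged", expect: (o) => o.includes("escalate") },
  { name: "password reset self-serves", input: "reset my password", expect: (o) => !o.includes("escalate") },
];
```

Handoff is gated on `runEvals` returning an empty list against the full 15–30 scenario suite.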
AI Agent MVP vs chat wrapper vs multi-agent vs in-house.
Three dimensions. The final column is what you get with a bounded, observable, HITL agent built in four weeks.

| Dimension | Chat wrapper ("ChatGPT for X") | Multi-agent from day one | In-house hire (3 months) | Afterbuild Labs MVP |
|---|---|---|---|---|
| Autonomy | None — user drives every step | Too much — debugging nightmare | Depends on the hire | Bounded · interrupts · kill-switch |
| Observability | Logs at best | Tangle of agent-to-agent traces | Whatever gets wired | LangSmith or Helicone · every run traceable |
| Cost ceiling | None — each user session unbounded | None — agents spawn agents | Varies | Per-session cost cap · kill-switch · token budget |
Who should book the MVP (and who should skip it).
Book the MVP if…
- You have a workflow that needs multi-step reasoning, not just a single LLM call.
- You've outgrown n8n or a simple workflow — dynamic decisions per step are now in scope.
- You need tool use (read CRM, write ticket, draft reply, enrich lead) with clear HITL boundaries.
- You want observability from day one — LangSmith or Helicone traces, not print statements.
- You want to ship in four weeks, not three months — with guardrails that keep cost bounded.
Do not book the MVP if…
- You just need a chat feature — book API Integration Sprint ($1,999) instead.
- You need retrieval-augmented Q&A, not multi-step action — book RAG Build ($6,999) instead.
- You need simple workflow automation — book AI Automation Sprint ($3,999) instead.
- Your use case needs 5+ autonomous personas collaborating — that's multi-agent, a separate engagement.
- You need voice-first intake — book Voice Agent Launch ($4,999) instead.
AI Agent MVP — your questions, answered.
Ship a production agent in four weeks.
Four weeks. $9,499 fixed. LangGraph or Claude Agent SDK. Guardrails, observability, human-in-the-loop, eval suite. Not a demo — a production agent your team owns.
Book free diagnostic →