A production AI agent. Four weeks. Not a demo.
LangGraph or Claude Agent SDK. 2–3 tools wired to your stack. Guardrails, observability, human-in-the-loop, cost controls. $9,499 fixed. Handoff included.
AI Agent MVP is a 4-week engagement. We pick the framework (LangGraph when the graph should be explicit and observable; Claude Agent SDK when you want Anthropic-native skills, memory, and subagents), ship a production agent with 2–3 tools wired to your stack, and harden it for production: guardrails (max steps, max tokens, tool whitelist, kill-switch, per-session cost ceiling), observability (LangSmith or Helicone, every run traceable, with cost telemetry and alerting), human-in-the-loop for writes that are hard to reverse, and an eval suite of 15–30 golden scenarios. The engagement closes with a runbook, a handoff session, and a production deployment your team owns.
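The per-session cost controls above amount to one budget check before every step. A minimal sketch, assuming hypothetical `RunBudget` / `checkBudget` names (ours, not a framework API):

```typescript
// Illustrative per-run guardrail budget; the names are ours, not a
// LangGraph or Claude Agent SDK API.
type RunBudget = {
  maxSteps: number;       // hard cap on loop iterations
  maxTokens: number;      // total tokens per run
  costCeilingUSD: number; // per-session spend ceiling
};

type RunUsage = { steps: number; tokens: number; costUSD: number };

// Returns the first violated limit, or null if the run may continue.
function checkBudget(budget: RunBudget, usage: RunUsage): string | null {
  if (usage.steps >= budget.maxSteps) return "max_steps";
  if (usage.tokens >= budget.maxTokens) return "max_tokens";
  if (usage.costUSD >= budget.costCeilingUSD) return "cost_ceiling";
  return null;
}

const defaultBudget: RunBudget = {
  maxSteps: 12,
  maxTokens: 8000,
  costCeilingUSD: 0.5,
};
```

The agent loop calls `checkBudget` before each tool call and aborts on the first violation, which is what turns a runaway loop into a bounded cost.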
Five agent shapes we ship most.
Most agent MVPs fit one of these five shapes. The scoping call maps your specific need to the nearest shape and confirms framework, tools, and HITL boundaries.
| Use case | What's missing today | What the agent ships |
|---|---|---|
| Support ops needs an agent to triage + escalate + draft replies | RAG alone can't decide when to escalate; static rules are too brittle | LangGraph agent with classifier → retrieval → draft → confidence gate → human escalation |
| Sales ops needs lead qualification + enrichment + CRM updates | n8n workflow hit a ceiling — logic is dynamic, needs multi-step reasoning | Agent with Apollo / Clearbit / LinkedIn enrichment + CRM write tools + disqualification path |
| Engineering ops wants an agent to triage GitHub issues + assign + draft fix plans | Classifier isn't enough — needs to read the repo, match similar issues, plan a fix | Claude Agent SDK build with repo read, issue search, assignment, and human sign-off tools |
| Internal research agent — synthesize reports from N sources on demand | One-shot LLM call misses; manual orchestration is fragile | Agent with search + fetch + synthesis tools, citations required, guardrails on sources |
| Customer-facing agent that takes actions on their behalf | Every wrong action is a refund or an incident; needs humans in the loop | Agent with kill-switch, human-in-the-loop for writes, full audit log, rollback tools |
The 4-week agent-MVP schedule.
Week 1: design + tool scoping. Week 2: core loop + first tool. Week 3: remaining tools + guardrails. Week 4: observability + evals + handoff.
- Week 1: Agent design + tool scoping
  Week 1 is architecture: we pick the framework (LangGraph when the graph is explicit and observable; Claude Agent SDK when you want Anthropic-native features like skills, memory, and subagents), scope the 2–3 tools, define success criteria, and sketch the eval set. By end of week 1 you sign off on a written architecture spec: states, transitions, tool surface, guardrails.
- Week 2: Core loop + first tool
  Week 2 ships the core agent loop — plan → act → observe → decide — with the first tool wired. For LangGraph that's a working graph with interrupts. For Claude Agent SDK that's a working agent session with one skill and one tool. The first end-to-end run against a staging input lands by end of week 2.
- Week 3: Remaining tools + guardrails
  Week 3 adds the remaining tools and locks in guardrails: max steps (so a runaway loop costs cents instead of $500), max tokens per run, tool whitelist, per-session cost ceiling, kill-switch endpoint, idempotent retries on tool calls, and human-in-the-loop checkpoints for writes. The eval suite runs at end of week 3 — we tune until the agent passes.
- Week 4: Observability + evals + handoff
  Week 4 wires observability (LangSmith traces for LangGraph; Helicone or custom OpenTelemetry for Agent SDK), cost and token telemetry, alerting on repeated failures, and the runbook. The handoff session covers how to add a tool, tune the prompt, interpret traces, and roll back. Your team leaves with a production agent, an eval suite, and a written spec.
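The week-2 core loop can be sketched in a few lines, assuming a synchronous `Tool` signature and a `decide` callback standing in for the LLM planner (both illustrative):

```typescript
// Minimal shape of the plan -> act -> observe -> decide loop.
// Real tools are async LLM/API calls; this sketch keeps them synchronous.
type Tool = (input: string) => string;

type Decision =
  | { kind: "act"; tool: string; input: string }
  | { kind: "done"; answer: string };

function runAgent(
  decide: (observations: string[]) => Decision, // stands in for the LLM planner
  tools: Record<string, Tool>,
  maxSteps: number,
): string {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const d = decide(observations);          // plan / decide
    if (d.kind === "done") return d.answer;  // exit the loop
    const tool = tools[d.tool];
    if (!tool) throw new Error(`tool not whitelisted: ${d.tool}`); // tool whitelist
    observations.push(tool(d.input));        // act + observe
  }
  throw new Error("max_steps exceeded");     // guardrail, not an infinite loop
}
```

The `maxSteps` throw is the same guardrail week 3 hardens: a planner that never says "done" fails fast instead of burning budget.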
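One week-3 guardrail worth spelling out is idempotent retries: a retried tool write must land exactly once. A minimal sketch, assuming an in-memory result cache (production would use a durable store keyed by run + step):

```typescript
// Sketch of an idempotent, retried tool call. The cache keyed by
// idempotencyKey means a retried write is applied exactly once.
// Names (callWithRetry, seen) are illustrative, not a framework API.
async function callWithRetry<T>(
  idempotencyKey: string,
  call: () => Promise<T>,
  seen: Map<string, T>, // in production: a durable store, not in-memory
  maxAttempts = 3,
): Promise<T> {
  const cached = seen.get(idempotencyKey);
  if (cached !== undefined) return cached; // already applied: skip the write
  let lastErr: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await call();
      seen.set(idempotencyKey, result); // record success before returning
      return result;
    } catch (err) {
      lastErr = err; // transient failure: retry
    }
  }
  throw lastErr; // surfaced to the agent loop and alerting
}
```

Wrapping every write tool this way is what lets the agent retry a flaky CRM or ticketing API without double-posting.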
A sample LangGraph we shipped.
This is the shape of a support-triage agent — explicit graph, interrupts for low confidence, guardrails on recursion / tokens / cost, LangSmith traces on by default. For reference, see the LangGraph docs.
```typescript
// agent/graph.ts — support-triage agent (trimmed)
import { StateGraph, START, END, interrupt } from "@langchain/langgraph";
import { PostgresSaver } from "@langchain/langgraph-checkpoint-postgres";
import { classify, retrieveKB, draftReply, escalate } from "./tools";

type TicketState = {
  ticket: Ticket;
  classification?: "billing" | "technical" | "other";
  retrieved?: KBArticle[];
  draft?: string;
  confidence?: number;
  needsHuman?: boolean;
};

const graph = new StateGraph<TicketState>({ channels: { /* ... */ } });

graph
  .addNode("classify", classify)    // LLM call, cache_control on system prompt
  .addNode("retrieve", retrieveKB)  // Pinecone search, top-5 with rerank
  .addNode("draft", draftReply)     // LLM call, Sonnet 4.6, max_tokens: 600
  .addNode("human_review", () => interrupt({ reason: "low_confidence" }))
  .addNode("escalate", escalate)    // Zendesk update + Slack notify

  .addEdge(START, "classify")
  .addEdge("classify", "retrieve")
  .addEdge("retrieve", "draft")
  .addConditionalEdges("draft", (s) =>
    s.confidence! < 0.7 ? "human_review" : "escalate"
  )
  .addEdge("human_review", "escalate")
  .addEdge("escalate", END);

export const app = graph.compile({
  checkpointer: new PostgresSaver({ /* persist state between interrupts */ }),
});

// Guardrails — every run respects these. recursionLimit is standard per-run
// config; the token and cost ceilings are enforced in our tool wrappers:
export const runLimits = {
  recursionLimit: 12,    // passed via config on each app.invoke
  maxTokensPerRun: 8000,
  costCeilingUSD: 0.50,
};

// Kill switch — set KILL_AGENT_RUNS=true to drop all new sessions
// Traces ship to LangSmith automatically when LANGCHAIN_TRACING_V2=true
```

What the MVP delivers.
Five deliverables. Production agent. Yours to run.
1. Agent architecture (LangGraph or Claude Agent SDK, your choice)
2. 2–3 tools wired to your stack (MCP, native Python/TS, or API)
3. Guardrails: max steps, max tokens, tool whitelist, kill switch
4. Observability: LangSmith, Helicone, or custom — every run traceable
5. Human-in-the-loop escalation path for risky actions
Fixed fee. 2–3 tools. HITL built in.
One agent per MVP. Additional tools beyond three are $1,500 each. Multi-agent (3+ personas) is a separate engagement that usually starts after this one has been in production for 3+ months.
- Turnaround: 4 weeks
- Scope: LangGraph or Claude Agent SDK · 2–3 tools · guardrails · observability · HITL · eval suite
- Guarantee: Eval suite passes before handoff. Runbook covers add-a-tool, tune-the-prompt, roll-back.
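The eval-suite guarantee is easy to picture as a golden-scenario runner. The `Golden` shape and the scenarios below are illustrative, not our actual suite:

```typescript
// A golden scenario pins an input to the behavior we expect; here a simple
// pass/fail predicate per scenario (real evals also score draft quality).
type Golden = {
  name: string;
  input: string;
  expect: (output: string) => boolean; // assertion over the agent's output
};

function runEvals(agent: (input: string) => string, suite: Golden[]): string[] {
  const failures: string[] = [];
  for (const g of suite) {
    if (!g.expect(agent(g.input))) failures.push(g.name); // collect, don't stop
  }
  return failures; // empty list = suite passes = ready for handoff
}

const suite: Golden[] = [
  { name: "billing dispute escalates", input: "I was double charged", expect: (o) => o.includes("escalate") },
  { name: "password reset self-serves", input: "reset my password", expect: (o) => !o.includes("escalate") },
];
```

Handoff is gated on `runEvals` returning an empty list against the full 15–30 scenario suite.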
AI Agent MVP vs chat wrapper vs multi-agent vs in-house.
Three dimensions. The final column is what you get with a bounded, observable, HITL agent built in four weeks.

| Dimension | Chat wrapper ("ChatGPT for X") | Multi-agent from day one | In-house hire (3 months) | Afterbuild Labs MVP |
|---|---|---|---|---|
| Autonomy | None — user drives every step | Too much — debugging nightmare | Depends on the hire | Bounded · interrupts · kill-switch |
| Observability | Logs at best | Tangle of agent-to-agent traces | Whatever gets wired | LangSmith or Helicone · every run traceable |
| Cost ceiling | None — each user session unbounded | None — agents spawn agents | Varies | Per-session cost cap · kill-switch · token budget |
Who should book the MVP (and who should skip it).
Book the MVP if…
- You have a workflow that needs multi-step reasoning, not just a single LLM call.
- You've outgrown n8n or a simple workflow — dynamic decisions per step are now in scope.
- You need tool use (read CRM, write ticket, draft reply, enrich lead) with clear HITL boundaries.
- You want observability from day one — LangSmith or Helicone traces, not print statements.
- You want to ship in four weeks, not three months — with guardrails that keep cost bounded.
Do not book the MVP if…
- You just need a chat feature — book API Integration Sprint ($1,999) instead.
- You need retrieval-augmented Q&A, not multi-step action — book RAG Build ($6,999) instead.
- You need simple workflow automation — book AI Automation Sprint ($3,999) instead.
- Your use case needs 5+ autonomous personas collaborating — that's multi-agent, a separate engagement.
- You need voice-first intake — book Voice Agent Launch ($4,999) instead.
AI Agent MVP — your questions, answered.
Ship a production agent in four weeks.
Four weeks. $9,499 fixed. LangGraph or Claude Agent SDK. Guardrails, observability, human-in-the-loop, eval suite. Not a demo — a production agent your team owns.
Book free diagnostic →