Which Claude model should I use — Opus 4.7, Sonnet 4.6, or Haiku 4.5?
Three tiers. Opus 4.7 for the hardest reasoning: complex code, long-chain analysis, research-grade writing, legal / medical / technical deep dives. Sonnet 4.6 is the default — roughly 80% of tasks (chat, drafting, tool use, most extraction, most summarization). Haiku 4.5 for cheap bulk work — classifiers, short extractors, routing decisions, eval graders. Most production builds we ship use a router: Haiku classifies the request, then routes it to Sonnet or Opus based on complexity. Typical routed cost is 40–70% below running Sonnet on everything.
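The router pattern above can be sketched in a few lines. Everything here is an assumption for illustration: the model ID strings, the keyword heuristic standing in for a real Haiku classification call, and the three-label scheme.

```python
# Hypothetical model IDs -- substitute the identifiers your account exposes.
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-7"

def classify_complexity(request_text: str) -> str:
    """Stand-in for the real classifier: in production this is a cheap
    Haiku call that returns 'simple', 'standard', or 'hard'."""
    hard_markers = ("prove", "legal", "architecture", "debug this")
    if any(m in request_text.lower() for m in hard_markers):
        return "hard"
    return "simple" if len(request_text) < 80 else "standard"

def route(request_text: str) -> str:
    """Map the classifier's label to a model tier."""
    label = classify_complexity(request_text)
    return {"simple": HAIKU, "standard": SONNET, "hard": OPUS}[label]
```

The design point is that the classifier itself runs on the cheapest tier, so the routing overhead is a small fraction of the cost it avoids.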
How much does prompt caching actually save?
Cache reads cost 10% of the input-token list price — a 90% reduction. Cache writes cost 125% of the input-token rate, so caching pays for itself after the first read within the cache TTL: one write plus one read costs 1.35× the base rate, versus 2× for two uncached sends. For most production workloads the cached prefix repeats dozens to thousands of times per cache window, so the effective cost on cached content approaches ~10% of uncached. Single highest-leverage lever on any Claude bill we audit.
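The arithmetic is easy to sanity-check. A minimal sketch, assuming the 125% write and 10% read multipliers above and an illustrative $3-per-million-input-tokens base rate (a placeholder, not a quoted price):

```python
def caching_cost(base_per_mtok: float, prefix_mtok: float, reads: int) -> float:
    """Total cost of a cached prefix: one write at 125% of the input rate,
    then each subsequent read at 10%."""
    write = 1.25 * base_per_mtok * prefix_mtok
    read = 0.10 * base_per_mtok * prefix_mtok * reads
    return write + read

def uncached_cost(base_per_mtok: float, prefix_mtok: float, uses: int) -> float:
    """Same prefix sent uncached on every call."""
    return base_per_mtok * prefix_mtok * uses

base = 3.00          # hypothetical $/MTok input rate
prefix = 0.01        # a 10k-token system prompt, in MTok

# One write + one read (~$0.0405) already beats two uncached sends ($0.06);
# at 100 reads the cached path costs roughly an eighth of the uncached one.
saving_ratio = caching_cost(base, prefix, 100) / uncached_cost(base, prefix, 101)
```

At high read counts the write premium amortizes to nothing and the ratio converges toward the 10% read rate.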
When should we use the Message Batches API?
When the work is non-urgent and can tolerate up to 24-hour turnaround — eval suites, nightly content generation, research-style retrospective analysis, bulk enrichment. 50% off list price. Not for user-facing synchronous calls. We move eval and nightly cron work to Batches on Day 2 of most engagements; the migration is usually a few dozen lines of code.
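The migration is small because a batch is just a list of ordinary Messages API calls, each tagged with a caller-chosen `custom_id` so results can be matched back up. A sketch of building that payload, assuming the Anthropic Python SDK's request shape; the model ID and ID scheme are placeholders:

```python
def build_batch_requests(prompts: dict[str, str],
                         model: str = "claude-haiku-4-5") -> list[dict]:
    """Pair each caller-chosen custom_id with normal Messages API params;
    batch results come back keyed by custom_id, not in order."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]

batch = build_batch_requests({
    "eval-001": "Grade this answer against the rubric: ...",
    "eval-002": "Grade this answer against the rubric: ...",
})
# Submission (not shown) goes through the SDK's Message Batches endpoint
# and requires an API key; results are polled or fetched when complete.
```

Because results are keyed rather than ordered, the same payload builder works whether you submit ten items or ten thousand.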
Direct Anthropic API vs. Bedrock vs. Vertex — when does each win?
Direct API: fastest to new models and features (prompt caching arrived weeks ahead of Bedrock), simplest billing, best pricing for most workloads. Bedrock: the AWS path — if you're committed to AWS, need a HIPAA BAA through AWS, or want Bedrock's Guardrails layer. Vertex: the GCP path — same logic for GCP-first teams. Both cloud paths lag the direct API on feature parity by weeks. We pick on the scoping call based on compliance, infra, and procurement constraints.
Does Claude work with MCP out of the box?
Yes. Claude Desktop is the most mature MCP client. Claude Agent SDK has first-class MCP support — you can register MCP servers as tools directly and Claude picks them up. The sweet spot is: your team builds MCP servers (see our MCP Server Build service), Claude consumes them via Agent SDK or Desktop, and the same servers work with Cursor, Windsurf, ChatGPT — one integration, every client.
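For Claude Desktop, wiring up a server comes down to an `mcpServers` entry in `claude_desktop_config.json`. The server name, command, and path below are placeholders — substitute your own build:

```json
{
  "mcpServers": {
    "internal-docs": {
      "command": "node",
      "args": ["/path/to/your/mcp-server/index.js"]
    }
  }
}
```

The same server binary then gets registered with any other MCP client — Agent SDK, Cursor, Windsurf — via that client's equivalent config, which is what makes the one-integration-every-client economics work.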
What does a typical Claude engagement look like?
Most engagements route through one of four fixed-fee services: API Integration Sprint ($1,999 / 1 week) for a focused feature, AI Cost Audit ($2,499 / 3 days) for a caching-and-routing pass on an existing build, AI Agent MVP ($9,499 / 4 weeks) for a production agent, and RAG Build ($6,999 / 3 weeks) for retrieval-augmented generation. A scoping call on Day 1 routes you to the right one.