Which Claude model should I use — Opus 4.7, Sonnet 4.6, or Haiku 4.5?
Three tiers. Opus 4.7 for the hardest reasoning: complex code, long-chain analysis, research-grade writing, legal / medical / technical deep dives. Sonnet 4.6 is the default — roughly 80% of tasks (chat, drafting, tool use, most extraction, most summarization). Haiku 4.5 for cheap bulk work — classifiers, short extractors, routing decisions, eval graders. Most production builds we ship use a router: Haiku classifies the request, then routes it to Sonnet or Opus based on complexity. Typical routed cost is 40–70% below running Sonnet on everything.
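The router pattern above can be sketched in a few lines. Everything here is an assumption for illustration: the model ID strings, the keyword heuristic standing in for a real Haiku classification call, and the three-label scheme.

```python
# Hypothetical model IDs -- substitute the identifiers your account exposes.
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-7"

def classify_complexity(request_text: str) -> str:
    """Stand-in for the real classifier: in production this is a cheap
    Haiku call that returns 'simple', 'standard', or 'hard'."""
    hard_markers = ("prove", "legal", "architecture", "debug this")
    if any(m in request_text.lower() for m in hard_markers):
        return "hard"
    return "simple" if len(request_text) < 80 else "standard"

def route(request_text: str) -> str:
    """Map the classifier's label to a model tier."""
    label = classify_complexity(request_text)
    return {"simple": HAIKU, "standard": SONNET, "hard": OPUS}[label]
```

The design point is that the classifier itself runs on the cheapest tier, so the routing overhead is a small fraction of the cost it avoids.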
How much does prompt caching actually save?
Cache reads cost 10% of the input-token list price — a 90% reduction. Cache writes cost 125% of the input-token rate, so caching pays for itself after the first read within the cache TTL: one write plus one read costs 1.35× the base rate, versus 2× for two uncached sends. For most production workloads the cached prefix repeats dozens to thousands of times per cache window, so the effective cost on cached content approaches ~10% of uncached. Single highest-leverage lever on any Claude bill we audit.
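The arithmetic is easy to sanity-check. A minimal sketch, assuming the 125% write and 10% read multipliers above and an illustrative $3-per-million-input-tokens base rate (a placeholder, not a quoted price):

```python
def caching_cost(base_per_mtok: float, prefix_mtok: float, reads: int) -> float:
    """Total cost of a cached prefix: one write at 125% of the input rate,
    then each subsequent read at 10%."""
    write = 1.25 * base_per_mtok * prefix_mtok
    read = 0.10 * base_per_mtok * prefix_mtok * reads
    return write + read

def uncached_cost(base_per_mtok: float, prefix_mtok: float, uses: int) -> float:
    """Same prefix sent uncached on every call."""
    return base_per_mtok * prefix_mtok * uses

base = 3.00          # hypothetical $/MTok input rate
prefix = 0.01        # a 10k-token system prompt, in MTok

# One write + one read (~$0.0405) already beats two uncached sends ($0.06);
# at 100 reads the cached path costs roughly an eighth of the uncached one.
saving_ratio = caching_cost(base, prefix, 100) / uncached_cost(base, prefix, 101)
```

At high read counts the write premium amortizes to nothing and the ratio converges toward the 10% read rate.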
When should we use the Message Batches API?
When the work is non-urgent and can tolerate up to 24-hour turnaround — eval suites, nightly content generation, research-style retrospective analysis, bulk enrichment. 50% off list price. Not for user-facing synchronous calls. We move eval and nightly cron work to Batches on Day 2 of most engagements; the migration is usually a few dozen lines of code.
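The migration is small because a batch is just a list of ordinary Messages API calls, each tagged with a caller-chosen `custom_id` so results can be matched back up. A sketch of building that payload, assuming the Anthropic Python SDK's request shape; the model ID and ID scheme are placeholders:

```python
def build_batch_requests(prompts: dict[str, str],
                         model: str = "claude-haiku-4-5") -> list[dict]:
    """Pair each caller-chosen custom_id with normal Messages API params;
    batch results come back keyed by custom_id, not in order."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]

batch = build_batch_requests({
    "eval-001": "Grade this answer against the rubric: ...",
    "eval-002": "Grade this answer against the rubric: ...",
})
# Submission (not shown) goes through the SDK's Message Batches endpoint
# and requires an API key; results are polled or fetched when complete.
```

Because results are keyed rather than ordered, the same payload builder works whether you submit ten items or ten thousand.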
Direct Anthropic API vs. Bedrock vs. Vertex — when does each win?
Direct API: fastest to new models and features (prompt caching arrived weeks ahead of Bedrock), simplest billing, best pricing for most workloads. Bedrock: the AWS path — if you're committed to AWS, need a HIPAA BAA through AWS, or want Bedrock's Guardrails layer. Vertex: the GCP path — same logic for GCP-first teams. Both cloud paths lag the direct API on feature parity by weeks. We pick on the scoping call based on compliance, infra, and procurement constraints.
Does Claude work with MCP out of the box?
Yes. Claude Desktop is the most mature MCP client. Claude Agent SDK has first-class MCP support — you can register MCP servers as tools directly and Claude picks them up. The sweet spot is: your team builds MCP servers (see our MCP Server Build service), Claude consumes them via Agent SDK or Desktop, and the same servers work with Cursor, Windsurf, ChatGPT — one integration, every client.
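For Claude Desktop, wiring up a server comes down to an `mcpServers` entry in `claude_desktop_config.json`. The server name, command, and path below are placeholders — substitute your own build:

```json
{
  "mcpServers": {
    "internal-docs": {
      "command": "node",
      "args": ["/path/to/your/mcp-server/index.js"]
    }
  }
}
```

The same server binary then gets registered with any other MCP client — Agent SDK, Cursor, Windsurf — via that client's equivalent config, which is what makes the one-integration-every-client economics work.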
What does a typical Claude engagement look like?
Most engagements route through one of four fixed-fee services: API Integration Sprint ($1,999 / 1 week) for a focused feature, AI Cost Audit ($2,499 / 3 days) for a caching-and-routing pass on an existing build, AI Agent MVP ($9,499 / 4 weeks) for a production agent, and RAG Build ($6,999 / 3 weeks) for retrieval-augmented generation. A scoping call on Day 1 routes you to the right one.