Claude
Plain Explanation
Claude lets teams add high‑quality language and coding capabilities to apps via an HTTP API instead of hosting models themselves. You call the model with the Messages API for turn‑by‑turn control, use prompt caching to reuse large repeated context at lower cost/latency, and govern usage with token‑based pricing, token‑bucket rate limits, and monthly spend limits.
Examples & Analogies
- Regulatory report summarization: cache the shared instructions and structure, then run nightly batches across thousands of documents at lower cost.
- Long‑horizon refactors: use Managed Agents sessions to carry multi‑step changes with event history.
- Internal Q&A bot: persist large policy documents via cache writes and send only queries at request time.
At a Glance
| Claude API (first‑party) | Amazon Bedrock | Vertex AI | |
|---|---|---|---|
| Billing | Token‑based (input/output, cache pricing) | CCU‑based | Google Cloud billing |
| Endpoints | Global by default | Global/Regional | Global/Multi‑region/Regional |
| Model access | Per Claude docs | Per Bedrock catalog | Per Vertex catalog |
| Rate limits | Token bucket (RPM/ITPM/OTPM) | Similar semantics; platform quotas | Platform quotas/limits |
Where and Why It Matters
- Standardized model selection: choose among Opus/Sonnet/Haiku to balance capability, latency, and cost, and up/down‑shift as needed.
- Prompt caching benefit: cache reads are priced separately and, for most models, excluded from ITPM—boosting effective throughput.
- Spend/limit governance: workspace limits and tiers keep monthly exposure bounded.
- Batch vs interactive: use /v1/messages for interactive paths and /v1/messages/batches for large, non‑urgent workloads.
Common Misconceptions
- ❌ “Claude is a single model.” → ✅ It’s a family (e.g., Opus/Sonnet/Haiku) with distinct trade‑offs.
- ❌ “All input tokens count the same for limits.” → ✅ For most models, cache reads don’t count toward ITPM.
- ❌ “You can only use the first‑party API.” → ✅ Amazon Bedrock and Vertex AI also host Claude with different billing/endpoint policies.
How It Sounds in Conversation
- "Put the system prompt in cache and preflight with count_tokens before sending."
- "On 429s, honor retry‑after and tune concurrency by RPM/ITPM/OTPM."
- "Keep interactive on Messages and push backfills to Batches to smooth demand."
Related Reading
References
- API overview - Claude API Docs
Entry point for available APIs, endpoints, authentication, and availability.
- Choosing the right model - Claude API Docs
How to pick among Opus, Sonnet, and Haiku based on capability, speed, and cost.
- Release notes - Claude Platform
Timeline of model launches (e.g., Opus 4.7), features, and platform changes.
- Pricing - Claude API Docs
Per-model input/output prices, cache pricing, CCU details, and cloud endpoint notes.
- Rate limits - Claude API Docs
Token-bucket limits, spend tiers, and cache-aware ITPM behavior.