Products & Platforms LLM & Generative AI Deep Learning

Claude

Difficulty

Plain Explanation

Claude lets teams add high‑quality language and coding capabilities to apps via an HTTP API instead of hosting models themselves. You call the model with the Messages API for turn‑by‑turn control, use prompt caching to reuse large repeated context at lower cost/latency, and govern usage with token‑based pricing, token‑bucket rate limits, and monthly spend limits.

Examples & Analogies

Regulatory report summarization: cache the shared instructions and structure, then run nightly batches across thousands of documents at lower cost.
Long‑horizon refactors: use Managed Agents sessions to carry multi‑step changes with event history.
Internal Q&A bot: persist large policy documents via cache writes and send only queries at request time.

At a Glance

	Claude API (first‑party)	Amazon Bedrock	Vertex AI
Billing	Token‑based (input/output, cache pricing)	CCU‑based	Google Cloud billing
Endpoints	Global by default	Global/Regional	Global/Multi‑region/Regional
Model access	Per Claude docs	Per Bedrock catalog	Per Vertex catalog
Rate limits	Token bucket (RPM/ITPM/OTPM)	Similar semantics; platform quotas	Platform quotas/limits

Where and Why It Matters

Standardized model selection: choose among Opus/Sonnet/Haiku to balance capability, latency, and cost, and up/down‑shift as needed.
Prompt caching benefit: cache reads are priced separately and, for most models, excluded from ITPM—boosting effective throughput.
Spend/limit governance: workspace limits and tiers keep monthly exposure bounded.
Batch vs interactive: use /v1/messages for interactive paths and /v1/messages/batches for large, non‑urgent workloads.

Common Misconceptions

❌ “Claude is a single model.” → ✅ It’s a family (e.g., Opus/Sonnet/Haiku) with distinct trade‑offs.
❌ “All input tokens count the same for limits.” → ✅ For most models, cache reads don’t count toward ITPM.
❌ “You can only use the first‑party API.” → ✅ Amazon Bedrock and Vertex AI also host Claude with different billing/endpoint policies.

How It Sounds in Conversation

"Put the system prompt in cache and preflight with count_tokens before sending."
"On 429s, honor retry‑after and tune concurrency by RPM/ITPM/OTPM."
"Keep interactive on Messages and push backfills to Batches to smooth demand."

References

★Docs
API overview - Claude API Docs
Entry point for available APIs, endpoints, authentication, and availability.
★Docs
Choosing the right model - Claude API Docs
How to pick among Opus, Sonnet, and Haiku based on capability, speed, and cost.
★Docs
Release notes - Claude Platform
Timeline of model launches (e.g., Opus 4.7), features, and platform changes.
★Docs
Pricing - Claude API Docs
Per-model input/output prices, cache pricing, CCU details, and cloud endpoint notes.
★Docs
Rate limits - Claude API Docs
Token-bucket limits, spend tiers, and cache-aware ITPM behavior.

Helpful?

0to1log Weekly

AI Glossary