Vol.01 · No.10 CS · AI · Infra May 30, 2026

AI Glossary

GlossaryReferenceLearn
Products & Platforms LLM & Generative AI Deep Learning

Claude

Difficulty

Plain Explanation

Claude lets teams add high‑quality language and coding capabilities to apps via an HTTP API instead of hosting models themselves. You call the model with the Messages API for turn‑by‑turn control, use prompt caching to reuse large repeated context at lower cost/latency, and govern usage with token‑based pricing, token‑bucket rate limits, and monthly spend limits.

Examples & Analogies

  • Regulatory report summarization: cache the shared instructions and structure, then run nightly batches across thousands of documents at lower cost.
  • Long‑horizon refactors: use Managed Agents sessions to carry multi‑step changes with event history.
  • Internal Q&A bot: persist large policy documents via cache writes and send only queries at request time.

At a Glance

Claude API (first‑party)Amazon BedrockVertex AI
BillingToken‑based (input/output, cache pricing)CCU‑basedGoogle Cloud billing
EndpointsGlobal by defaultGlobal/RegionalGlobal/Multi‑region/Regional
Model accessPer Claude docsPer Bedrock catalogPer Vertex catalog
Rate limitsToken bucket (RPM/ITPM/OTPM)Similar semantics; platform quotasPlatform quotas/limits

Where and Why It Matters

  • Standardized model selection: choose among Opus/Sonnet/Haiku to balance capability, latency, and cost, and up/down‑shift as needed.
  • Prompt caching benefit: cache reads are priced separately and, for most models, excluded from ITPM—boosting effective throughput.
  • Spend/limit governance: workspace limits and tiers keep monthly exposure bounded.
  • Batch vs interactive: use /v1/messages for interactive paths and /v1/messages/batches for large, non‑urgent workloads.

Common Misconceptions

  • ❌ “Claude is a single model.” → ✅ It’s a family (e.g., Opus/Sonnet/Haiku) with distinct trade‑offs.
  • ❌ “All input tokens count the same for limits.” → ✅ For most models, cache reads don’t count toward ITPM.
  • ❌ “You can only use the first‑party API.” → ✅ Amazon Bedrock and Vertex AI also host Claude with different billing/endpoint policies.

How It Sounds in Conversation

  • "Put the system prompt in cache and preflight with count_tokens before sending."
  • "On 429s, honor retry‑after and tune concurrency by RPM/ITPM/OTPM."
  • "Keep interactive on Messages and push backfills to Batches to smooth demand."

Related Reading

References

Helpful?