Anthropic API
Plain Explanation
The Anthropic API lets you use Claude by sending a message and receiving a response, without hosting models yourself. The service meters usage by tokens and applies spend tiers and rate limits so you can scale predictably. When your prompts repeat large blocks (instructions, shared docs), prompt caching can reuse them to cut latency and cost; on supported models, cached reads may not count toward ITPM, improving effective throughput.
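A minimal sketch of what a cache-aware request body looks like. The field names (`system`, `cache_control`, `messages`) follow the Claude API docs; the model ID, manual text, and question are placeholders, not recommendations.

```python
# Build a Messages API request body that marks a large, reusable system
# prompt for prompt caching. Everything up to the cache_control breakpoint
# can be written to and later read from the cache.
MANUAL_TEXT = "…large shared instructions or reference document…"

def build_cached_request(question: str) -> dict:
    return {
        "model": "claude-sonnet-4-20250514",  # pin a snapshot ID in production
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": MANUAL_TEXT,
                # cache breakpoint: the repeated prefix is cached across calls
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }

req = build_cached_request("What is the refund policy?")
```

Only the request shape is shown here; sending it (via the official SDK or HTTP) returns usage fields that report cache writes and cache reads separately from regular input tokens.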
Examples & Analogies
- Internal knowledge Q&A: cache a long manual once, then ask short questions against it.
- Weekly report drafting: cache shared system instructions to cut input cost and smooth recurring spikes.
- Data residency rollout: prototype on first‑party API, deploy on partner clouds with regional endpoints.
At a Glance
- First‑party API: direct control of token pricing, limits, and caching effects on a single surface
- Partner clouds: regional/multi‑region endpoints and provider billing/governance
- Common: access to Claude models, pinned snapshot IDs, documented pricing and limits
Where and Why It Matters
- Cache‑aware operations lower input cost and help ITPM headroom
- Long‑context tasks like document analysis and summarization become practical
- Spend tiers and token‑bucket limits enable predictable scaling and 429 handling
- Multiple access surfaces allow governance and procurement fit
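The 429-handling point above can be sketched as a retry loop that honors the server's retry-after hint and falls back to jittered exponential backoff. `send_request` is a hypothetical transport callable, shown here with a stub; real code would wrap an HTTP client or SDK call.

```python
import random
import time

def call_with_backoff(send_request, max_retries: int = 5):
    """Retry on 429s, preferring the retry-after header when present.

    send_request() is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_retries):
        status, headers, body = send_request()
        if status != 429:
            return body
        # Honor the server's hint; otherwise use jittered exponential backoff
        # so concurrent workers don't retry in lockstep.
        delay = float(headers.get("retry-after", 2 ** attempt + random.random()))
        time.sleep(delay)
    raise RuntimeError("rate limited after retries")

# Stub transport: rate-limited twice, then succeeds.
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    if calls["n"] < 3:
        return 429, {"retry-after": "0"}, None
    return 200, {}, "ok"

result = call_with_backoff(fake_send)
```

Capping client-side concurrency alongside this loop keeps retries from amplifying a spike.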
Common Misconceptions
- "Cached reads always count toward ITPM" → on supported models, cached reads may be excluded.
- "Claude is only on the first‑party API" → also available via partner cloud surfaces.
- "Only input/output unit prices matter" → cache write/read rates and surface‑specific policies also apply.
How It Sounds in Conversation
- "Pin the system prompt in cache to improve ITPM headroom and latency."
- "Honor retry-after and cap concurrency to smooth out 429s during spikes."
- "Separate first‑party vs partner‑cloud pricing and limits in our cost pipeline."
References
- Messages API - Claude API Docs
Official request/response contract for calling Claude models from applications.
- Models overview - Claude API Docs
Current model IDs, aliases, context windows, output limits, and cloud availability.
- Rate limits - Claude API Docs
RPM/ITPM/OTPM limits, retry-after behavior, and cache-aware ITPM accounting.
- Pricing - Claude API Docs
Model pricing, prompt caching rates, batch discounts, and cloud pricing notes.
- Prompt caching - Claude API Docs
How cache breakpoints and cache usage fields work in Claude API requests.