LiteLLM

One OpenAI-style API for 100+ LLMs with spend controls

Freemium · Some setup needed · API · Web
Platform workflow · #llm #api-gateway #observability #rate-limiting #governance

About

Route all model calls through one OpenAI-compatible endpoint and switch providers without rewriting prompts. Platform teams use it to give developers access to OpenAI, Azure, Bedrock, Anthropic, and Gemini with fallbacks, budgets, and rate limits. It is open source, with 240M+ Docker pulls and more than 1B requests served.

Editor's Take

We recommend LiteLLM if your team needs one OpenAI-compatible gateway to manage multiple provider keys, fallbacks, and per-team budgets; be prepared to operate Redis and logging infrastructure for production-scale use.

Key Features

  • Connect OpenAI, Azure, Bedrock, Anthropic, and Gemini keys → route calls through one OpenAI-compatible API with automatic LLM fallbacks (see the Router sketch after this list)
  • Set budgets and RPM/TPM limits per key/team → prevent overruns and track spend by user/team/org with logs to S3/GCS
  • Swap models across 100+ providers → keep the same request schema without prompt reformatting
  • Deploy the Gateway via Docker → run an image with 240M+ pulls and community support from 1,005+ contributors
  • Enable Langfuse, LangSmith, or OTEL logging → get request traces and cost attribution for every call
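
A minimal sketch of the fallback behavior above, using the Python SDK's Router (the model names, alias, and API keys are placeholders):

    from litellm import Router

    # Two deployments behind aliases; if the primary model group fails,
    # LiteLLM retries the request on the fallback group.
    router = Router(
        model_list=[
            {
                "model_name": "gpt-4o",  # alias apps request
                "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-..."},
            },
            {
                "model_name": "claude-3-5-sonnet",
                "litellm_params": {
                    "model": "anthropic/claude-3-5-sonnet-20240620",
                    "api_key": "sk-ant-...",
                },
            },
        ],
        fallbacks=[{"gpt-4o": ["claude-3-5-sonnet"]}],
    )

    response = router.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)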

Use Cases

  • A platform engineer centralizing LLM access for 20+ internal apps with rate limits and fallbacks
  • A FinOps lead attributing AI spend to teams and enforcing monthly budgets across OpenAI and Bedrock
  • A backend developer swapping from GPT-4 to Claude 3.5 in staging without changing request code

Try It Like This

  1. Centralize LLM calls behind one API

    Create a LiteLLM gateway container → configure provider keys for OpenAI, Azure, Bedrock, Anthropic, and Gemini in the gateway → point internal apps at the OpenAI-compatible endpoint and verify fallbacks and key-specific rate limits.
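
    For example, once the gateway is running (assumed here at http://localhost:4000 with a virtual key, both placeholders), any standard OpenAI client can point at it; a minimal sketch:

        from openai import OpenAI

        # Point the stock OpenAI client at the LiteLLM gateway;
        # base_url and the virtual key are placeholder values.
        client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-litellm-...")

        resp = client.chat.completions.create(
            model="gpt-4o",  # resolved by the gateway's routing config
            messages=[{"role": "user", "content": "ping"}],
        )
        print(resp.choices[0].message.content)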

  2. Enforce team budgets and rate limits

    Enable per-key/team RPM and TPM limits in the gateway config → attach budgets and connect usage logs to S3/GCS → monitor spend and throttle requests automatically when limits are hit.
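
    One way to attach these limits programmatically is the proxy's /key/generate endpoint; a sketch assuming a gateway at localhost:4000, a master key, and a team name, all placeholders:

        import requests

        # Issue a virtual key with a monthly budget and RPM/TPM caps.
        # The URL, master key, and team_id are placeholder values.
        resp = requests.post(
            "http://localhost:4000/key/generate",
            headers={"Authorization": "Bearer sk-master-..."},
            json={
                "team_id": "search-team",
                "max_budget": 100.0,       # USD per budget window
                "budget_duration": "30d",  # resets monthly
                "rpm_limit": 100,
                "tpm_limit": 100_000,
            },
        )
        print(resp.json()["key"])  # hand this virtual key to the team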

  3. Swap models without changing prompts

    Update the gateway routing to prefer Claude 3.5 or Gemini for a staging key → keep the same OpenAI-style request schema from your app → run integration tests to confirm responses match expectations without prompt rewrites.
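
    The swap can be invisible to application code; a sketch where only the gateway-side alias mapping changes (the alias and key below are illustrative):

        from openai import OpenAI

        client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-staging-...")

        # The app keeps requesting the same alias; the gateway config decides
        # whether "staging-chat" maps to GPT-4, Claude 3.5, or Gemini.
        resp = client.chat.completions.create(
            model="staging-chat",
            messages=[{"role": "user", "content": "Summarize this ticket: ..."}],
        )
        print(resp.choices[0].message.content)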

  4. Enable request tracing and cost attribution

    Enable Langfuse, LangSmith, or OTEL integration in the gateway → route logs and traces to your observability stack and S3/GCS → query traces to attribute cost and latency to specific teams or endpoints.
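
    With the Python SDK, the equivalent is a one-line callback setting; shown here for Langfuse, with credentials as placeholder environment variables:

        import os
        import litellm

        # Langfuse credentials are read from the environment; values are placeholders.
        os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
        os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."

        # Emit a trace to Langfuse for every successful completion.
        litellm.success_callback = ["langfuse"]

        litellm.completion(
            model="gpt-4o",
            messages=[{"role": "user", "content": "traced call"}],
        )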

  5. Quick local dev with Python SDK

    Run pip install litellm in your local environment → point the SDK at a provider key or local gateway → make an OpenAI-style request and confirm provider-specific schemas are normalized during early development.
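
    A minimal local run, assuming an OpenAI key in the environment (the key and model are placeholders):

        # pip install litellm
        import os
        from litellm import completion

        os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder

        # The same OpenAI-style schema works regardless of provider.
        resp = completion(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello from local dev"}],
        )
        print(resp.choices[0].message.content)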

Pros & Cons

Pros

  • Routes OpenAI, Azure, Bedrock, Anthropic, and Gemini through a single OpenAI-compatible API so apps don't need prompt reformatting.
  • Supports per-key/team budgets and RPM/TPM limits plus logs to S3/GCS for spend tracking and governance.
  • Deployable via Docker with a large community footprint (240M+ pulls, 1,005+ contributors) and reported high throughput (~350 RPS on 1 vCPU, ~10ms latency under load).

Cons

  • The gateway introduces state and operational components (Redis, logging DB) that add complexity and hidden infrastructure costs compared with using only an SDK.
  • Running a high-throughput proxy can add serialization overhead and potential latency compared to direct provider calls in some setups.
  • Some features or lower-total-cost claims may require commercial licensing—check LiteLLM’s enterprise terms to understand additional costs.

Getting Started

  1. Deploy the LiteLLM Gateway from the docs (Docker) or install the Python SDK via pip
  2. Configure provider API keys and set budgets/rate limits; enable logging to Langfuse or OTEL
  3. Send a POST to /v1/chat/completions and see a model response with spend recorded (see the sketch after this list)
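
  Step 3 can be verified with a raw HTTP call; a sketch assuming the gateway runs at localhost:4000 with a virtual key (both placeholders):

      import requests

      resp = requests.post(
          "http://localhost:4000/v1/chat/completions",
          headers={"Authorization": "Bearer sk-litellm-..."},  # virtual key
          json={
              "model": "gpt-4o",
              "messages": [{"role": "user", "content": "hello"}],
          },
      )
      print(resp.json()["choices"][0]["message"]["content"])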

Pricing

Plan         Price           Includes
Free         $0              Open Source; 100+ LLM Provider Integrations; Virtual Keys; Budgets; Teams; Load Balancing; RPM/TPM limits; LLM Guardrails
Enterprise   Get In Touch    Everything in OSS; Enterprise Support + Custom SLAs; JWT Auth, SSO, Audit Logs; All Enterprise Features (see docs)

FAQ

Is LiteLLM free?

It offers both free and paid plans.

What platforms is LiteLLM available on?

It is available via API and on the web.

Does LiteLLM support Korean?

Korean is not currently supported.
