LangSmith

Trace, evaluate, and deploy LLM apps across any framework

Some setup needed · Web · API · Desktop

About

Instrument your app to capture traces, run evals, and manage deployments from one console. Teams building agents and LLM features use it to debug prompts, compare outputs, and monitor production runs. It works with LangChain through a built-in integration, or without LangChain through manual instrumentation and API integrations.

Editor's Take

Worth trying if your team needs unified tracing, evals, and deployment controls for LLM features. Best suited to projects where observability and pre-deployment comparisons matter and the team can add a small amount of instrumentation code.

Key Features

Use the LangChain integration or add manual hooks → capture the same tracing, evaluation, and deployment data across frameworks
Select a Tracing Project → view structured request and response traces for each run
Open Dashboards → track runs and evaluation results in one place
Edit a prompt and rerun tests → compare outputs across runs before deploying
Create an API key and point your app to LangSmith → manage deployments from the console

Use Cases

A platform engineer monitoring an LLM agent in production and triaging failures after a spike in errors
A prompt engineer iterating on prompts and running evaluations before a feature launch
A data scientist comparing model outputs across versions for a customer support assistant

Try It Like This

1
Trace a failing agent run
Create a Tracing Project in LangSmith → add the LangChain integration or manual hooks to capture request/response pairs → open the trace for the failing run and inspect its structured inputs, intermediate steps, and final outputs to diagnose the error.
2
Iterate on a prompt with A/B tests
Create two prompt variants and run each one separately → evaluate both against the same test dataset and scoring logic → compare evaluation metrics and side-by-side outputs in the dashboard before choosing the better prompt.
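The A/B comparison in step 2 comes down to holding the dataset and scorer fixed and varying only the prompt. A minimal sketch, assuming a toy dataset, a deterministic stand-in `fake_llm` in place of a real model call, and an exact-match scorer; all of these names are hypothetical, and none of this is LangSmith's evaluation API.

```python
from statistics import mean

# Hypothetical shared test dataset: one input and one reference answer per example.
DATASET = [
    {"question": "capital of France", "expected": "Paris"},
    {"question": "capital of Japan", "expected": "Tokyo"},
    {"question": "capital of Italy", "expected": "Rome"},
]

# The two prompt variants under test; only the template differs.
PROMPTS = {
    "variant_a": "Answer briefly: {question}",
    "variant_b": "Reply with only the city name: {question}",
}


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a model call (hypothetical)."""
    capitals = {"France": "Paris", "Japan": "Tokyo", "Italy": "Rome"}
    city = capitals[prompt.rsplit(" ", 1)[-1]]
    # Pretend the stricter prompt yields a bare answer and the other a sentence.
    return city if "only the city" in prompt else f"The capital is {city}."


def exact_match(output: str, expected: str) -> float:
    """Scoring logic shared by both variants."""
    return 1.0 if output.strip() == expected else 0.0


def run_eval(template: str) -> dict:
    rows = []
    for example in DATASET:
        output = fake_llm(template.format(question=example["question"]))
        rows.append({
            "input": example["question"],
            "output": output,
            "score": exact_match(output, example["expected"]),
        })
    return {"mean_score": mean(r["score"] for r in rows), "rows": rows}


results = {name: run_eval(template) for name, template in PROMPTS.items()}

# Side-by-side view: same input, one output per variant.
for a, b in zip(results["variant_a"]["rows"], results["variant_b"]["rows"]):
    print(f"{a['input']!r}: A={a['output']!r} B={b['output']!r}")
print({name: r["mean_score"] for name, r in results.items()})
# {'variant_a': 0.0, 'variant_b': 1.0}
```

Here variant_a scores 0.0 only because exact match penalizes its wordier answers, which is why the side-by-side outputs matter alongside the metric: a strict scorer can make a reasonable prompt look broken.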
3
Compare model outputs across versions
Instrument your app to tag runs with model version metadata → run the same input dataset against both model versions via LangSmith tracing → use the evaluation and comparison views to quantify differences and regressions.
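Once each run carries model-version metadata, finding regressions is a per-example diff between versions on the same inputs. The sketch assumes a simplified, hypothetical run record (`example_id`, `metadata`, `score`) filled with made-up scores; it is not the schema LangSmith actually stores, and `compare_versions` is an illustrative helper.

```python
from collections import defaultdict

# Made-up runs; each was tagged with its model version when traced.
RUNS = [
    {"example_id": "q1", "metadata": {"model_version": "v1"}, "score": 0.9},
    {"example_id": "q2", "metadata": {"model_version": "v1"}, "score": 0.7},
    {"example_id": "q3", "metadata": {"model_version": "v1"}, "score": 0.8},
    {"example_id": "q1", "metadata": {"model_version": "v2"}, "score": 0.95},
    {"example_id": "q2", "metadata": {"model_version": "v2"}, "score": 0.5},
    {"example_id": "q3", "metadata": {"model_version": "v2"}, "score": 0.8},
]


def compare_versions(runs: list[dict], baseline: str, candidate: str,
                     tolerance: float = 0.0) -> dict:
    """Per-example score deltas (candidate - baseline) on the shared inputs."""
    scores: dict[str, dict[str, float]] = defaultdict(dict)
    for run in runs:
        scores[run["metadata"]["model_version"]][run["example_id"]] = run["score"]
    base, cand = scores[baseline], scores[candidate]
    shared = sorted(base.keys() & cand.keys())  # only compare identical inputs
    if not shared:
        raise ValueError("no examples were run against both versions")
    deltas = {ex: cand[ex] - base[ex] for ex in shared}
    return {
        "mean_delta": sum(deltas.values()) / len(deltas),
        # A regression is any example whose score dropped by more than tolerance.
        "regressions": [ex for ex, d in deltas.items() if d < -tolerance],
    }


comparison = compare_versions(RUNS, baseline="v1", candidate="v2")
print(comparison["regressions"])  # ['q2']
```

Comparing per example rather than only on the average matters: v2 improves on q1 but regresses on q2, and a mean delta of about -0.05 on its own would not tell you which input broke.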
4
Deploy and manage a model endpoint
Create an API key in LangSmith and register your deployment configuration → push the deployment from the console (or point your app to the deployment endpoint) → monitor live runs, logs, and evaluation results to verify behavior after rollout.
5
Run batch evaluations for a release check
Assemble a test dataset and scoring logic (custom or built-in evaluators) in an eval project → trigger batch evals against the candidate prompt or model → review aggregate metrics and failure cases in Dashboards to make a go/no-go release decision.
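The go/no-go decision in step 5 can be framed as a threshold on aggregate metrics plus a list of failure cases to review. A minimal sketch with hypothetical names and default thresholds (`go_no_go`, `pass_threshold`, `min_pass_rate`) and made-up scores; real cutoffs depend on your scoring logic and release bar.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    example_id: str
    score: float  # 0.0 to 1.0, from a custom or built-in evaluator


def go_no_go(results: list[EvalResult], pass_threshold: float = 0.5,
             min_pass_rate: float = 0.9) -> dict:
    """Aggregate batch-eval scores into a release decision."""
    if not results:
        raise ValueError("no eval results to aggregate")
    # Failure cases: examples scoring below the per-example bar.
    failures = [r.example_id for r in results if r.score < pass_threshold]
    pass_rate = (len(results) - len(failures)) / len(results)
    mean_score = sum(r.score for r in results) / len(results)
    return {
        "pass_rate": pass_rate,
        "mean_score": mean_score,
        "failures": failures,
        "go": pass_rate >= min_pass_rate,
    }


# Made-up scores for a candidate prompt/model.
candidate = [
    EvalResult("q1", 1.0),
    EvalResult("q2", 0.8),
    EvalResult("q3", 0.2),
    EvalResult("q4", 0.9),
]
report = go_no_go(candidate)
print(report["pass_rate"], report["failures"], report["go"])  # 0.75 ['q3'] False
```

Returning the failing example IDs alongside the verdict keeps the decision reviewable: a no-go points straight at the cases to open in the dashboard.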

Pros & Cons

Pros

Keeps tracing, evaluation, and deployment data in one console, including structured request and response traces for each run.
Works with or without LangChain: LangChain apps get automatic tracing through the integration, and other apps get the same observability features through manual instrumentation.
Includes dashboards and rerun/edit workflows to compare outputs and evaluation results before deploying changes.

Cons

Manual instrumentation takes more code than LangChain's automatic tracing, so non-LangChain apps need more integration effort.

Getting Started

1 Sign up at smith.langchain.com and create an API key in Settings.
2 Install the integration for your framework or add manual instrumentation, then set the API key (for example, as an environment variable).
3 Run your app and open your Tracing Project to see captured runs and basic evals within minutes.
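For step 2, the Python SDK reads its configuration from environment variables. The sketch below assumes a recent `langsmith` release, which reads `LANGSMITH_TRACING`, `LANGSMITH_API_KEY`, and `LANGSMITH_PROJECT`; older releases and some LangChain versions use `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, and `LANGCHAIN_PROJECT` instead, so check the docs for your version. `configure_langsmith` is a hypothetical helper, not part of the SDK.

```python
import os


def configure_langsmith(api_key: str, project: str) -> dict[str, str]:
    """Hypothetical helper: export the env vars the langsmith SDK is assumed to read."""
    settings = {
        "LANGSMITH_TRACING": "true",   # enable tracing for instrumented code
        "LANGSMITH_API_KEY": api_key,  # key created under Settings
        "LANGSMITH_PROJECT": project,  # Tracing Project that receives the runs
    }
    os.environ.update(settings)
    return settings
```

Call it once at startup, before instrumented code runs, and pass the key in from a secret store rather than hard-coding it. Exporting the same variables from your shell or a `.env` file works just as well.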