PyTorch

Define models in Python, deploy with TorchServe and TorchScript

Technical setup Web · iOS · Android · Desktop · API

coding research #deep-learning#model-training#distributed-training

About

Write models as plain Python, train on GPUs, and export to TorchScript for production. Researchers and ML engineers use it to prototype with dynamic graphs, run distributed training, and serve models over REST with TorchServe. Its built-in TorchScript and torch.distributed keep the path from notebook to multi-node training and deployment in one framework.

Editor's Take

We recommend PyTorch for researchers and ML engineers who need full control over model code, custom training loops, and multi-GPU scaling; it's worth trying if you plan to move a prototype from notebook to production without rewriting model logic.

Key Features

Export a trained model with TorchScript → run the same model in production with graph optimizations
Package a model in TorchServe → get scalable REST endpoints for inference without rewriting your model code
Spin up torch.distributed across multiple GPUs/nodes → scale training while keeping your training loop intact
Pick CUDA 11.8/12.6/12.8, ROCm 6.3, or CPU at install → match your hardware with optimized kernels
Prototype on desktop and target iOS/Android (experimental) → validate on-device inference builds early

Use Cases

An ML researcher iterating on a new attention variant and training across 8 GPUs with torch.distributed
A machine learning engineer serving an image classifier behind a REST API using TorchServe
A mobile engineer prototyping on-device inference from a TorchScript model on Android (experimental)

Try It Like This

1
Quick model prototype in Python
Create a new Python file and define a nn.Module subclass for your model → write a training loop that uses autograd and torch.optim to run a few epochs on CPU or a single GPU → inspect tensors and gradients interactively to iterate on the architecture and loss until validation improves.
2
Scale training across 8 GPUs
Wrap your training script with torch.distributed (launch or torchrun) and choose a backend (NCCL for NVIDIA GPUs) → keep your existing training loop and add DistributedDataSampler plus per-process device placement → monitor per-rank logs and validate that loss/metrics match single-GPU runs as you scale.
3
Export model for production with TorchScript
After finishing training, trace or script the nn.Module using torch.jit.trace / torch.jit.script to produce a TorchScript artifact → run the script module with example inputs to verify numerical parity and apply graph optimizations → package the saved model for deployment or embedding in a mobile app.
4
Serve a model behind a REST endpoint
Package your trained and exported model into a TorchServe model archive (MAR) with required handlers and config → start TorchServe and register the model to expose a scalable REST inference endpoint → test latency and throughput with sample requests and adjust workers/Batching settings for production traffic.
5
Prototype on-device inference (Android/iOS experimental)
Export the model to TorchScript and convert to a mobile-compatible bundle → integrate the bundled model into the Android or iOS app using the PyTorch mobile runtime → run inference on-device to validate latency and numeric results, noting that tooling and APIs are marked experimental.

Pros & Cons

Pros

Pythonic API with dynamic computation graphs makes prototyping and debugging models straightforward and intuitive.
Built-in TorchScript lets you export the exact trained model for production with graph optimizations so you can run the same model outside Python.
First-class support for distributed training (torch.distributed) and CUDA/ROCm selection enables scaling across multiple GPUs and nodes.

Cons

Mobile inference support for iOS and Android is listed as experimental, so on-device tooling and stability can be limited.
Production serving requires additional components (e.g., packaging for TorchServe) and some deployment engineering beyond the core library.

Getting Started

1 Visit pytorch.org/get-started, choose your OS, package, and compute platform (CUDA/ROCm/CPU), then run the provided pip install command (e.g., pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118).
2 Open Python and verify the install and GPU with: import torch; print(torch.__version__, torch.cuda.is_available()).
3 Define a small nn.Module, script it with torch.jit.trace or torch.jit.script, and save it — you now have a production-ready TorchScript artifact.