Fine-tuning
Plain Explanation
Fine-tuning means training an already pretrained model further for a narrower goal. If the base model learned broad language, code, and world knowledge, fine-tuning teaches it a company style, a legal classification scheme, or a consistent support-answer format. It is usually more about stabilizing behavior than simply adding facts.
Examples & Analogies
- Employee onboarding: a capable person learns company process and tone.
- Support replies: the model learns to answer policy questions in a fixed format.
- Code style: the model adapts to a repository's naming, formatting, and review conventions.
At a Glance
| Method | Main problem solved | Strength | Watch out |
|---|---|---|---|
| Prompting | guide behavior with instructions | fast and cheap | consistency limits |
| RAG | retrieve external knowledge | current/private knowledge | depends on retrieval quality |
| Fine-tuning | learn repeated behavior or format | consistency, domain adaptation | needs clean data and evals |
| LoRA/PEFT | train small adapters | lower cost | constrained by the base model |
Where and Why It Matters
Fine-tuning is useful when the repeated behavior matters more than one-off knowledge. It can help with stable JSON output, a specific label taxonomy, or a company writing style. If the goal is to inject fresh documents or private facts, RAG is often a better first choice.
Common Misconceptions
- “Fine-tuning is how you add knowledge” → RAG may be better for current or private facts.
- “More data is always better” → low-quality examples teach bad behavior.
- “Low training loss means success” → the real test is performance on a separate eval set.
- “Small data always means low cost” → cleaning, evals, and reruns may dominate cost.
How It Sounds in Conversation
- “This is not a knowledge retrieval issue; it is an output consistency issue, so fine-tuning is a candidate.”
- “Let's build a base model plus prompt plus RAG baseline first, then tune only what evals show is failing.”
- “If train and test examples overlap, the improvement is probably inflated.”
- “Start with LoRA, then consider full fine-tuning if the adapter is not enough.”
Related Reading
References
- Supervised fine-tuning
Official API docs covering SFT workflow, datasets, training jobs, and eval-first practice.
- Fine-tuning best practices
Best-practice docs for train/test sets, prompt consistency, and evaluation considerations.
- Fine-tuning
Explains continuing training from pretrained models on task or domain datasets.
- PEFT
Documents parameter-efficient fine-tuning methods such as LoRA and adapter-style training.
- SFTTrainer
Practical reference for LLM supervised fine-tuning loops and dataset formats.