Learn · AI Engineering
RAG vs fine-tuning vs agents
Three approaches to making LLMs useful for your domain. Each solves a different problem. Most production systems combine two or all three; the question is which to start with and when to layer in the others.
Updated · 2026-05-02 · 10 min read
Approach 01
RAG (Retrieval-Augmented Generation)
Retrieve relevant documents at query time and pass them into the LLM's context window alongside the user's question.
Best for
- Knowledge bases that change frequently (docs, policies, product catalogs)
- When you need citations and source attribution
- When you can't afford to retrain or fine-tune on every update
- Multi-tenant systems where each customer has different data
Tradeoffs
- Retrieval quality is the ceiling. Bad retrieval = bad answers, regardless of model quality.
- Context window limits constrain how much you can retrieve. Chunking strategy matters enormously.
- Latency adds up: embedding + vector search + generation.
- No behavioral change in the model. It still reasons the same way, just with different inputs.
Cost profile: Low to moderate. No training compute. Main costs are embedding, vector DB, and inference.
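To make the retrieve-then-generate loop concrete, here is a minimal sketch. It assumes the OpenAI Python SDK with an `OPENAI_API_KEY` in the environment; the model names, hard-coded chunks, and in-memory cosine-similarity "vector store" are illustrative stand-ins for a real chunking pipeline, embedding model, and vector database.

```python
# Minimal RAG loop: index chunks, retrieve by similarity, generate with context.
# Assumes the OpenAI Python SDK and numpy; model names and the in-memory
# "vector store" are illustrative only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Index: chunk documents and embed each chunk once, offline.
chunks = [
    "Refunds are available within 30 days of purchase.",
    "Enterprise plans include SSO and audit logs.",
    "Support hours are 9am-6pm UTC, Monday to Friday.",
]
chunk_vectors = embed(chunks)

def retrieve(question: str, k: int = 2) -> list[str]:
    # 2. Retrieve: embed the query and rank chunks by cosine similarity.
    q = embed([question])[0]
    scores = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question: str) -> str:
    # 3. Generate: pass retrieved chunks into the prompt alongside the question.
    context = "\n".join(retrieve(question))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context and cite the snippet you used."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("What is the refund window?"))
```

Most production effort goes into the retrieval step (chunking, hybrid search, reranking), which is exactly the ceiling the tradeoffs above describe.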
Approach 02
Fine-tuning
Train the base model on your domain-specific data to change its behavior, style, or domain knowledge permanently.
Best for
- Consistent output style or format (legal briefs, medical notes, code in your framework)
- Domain-specific reasoning patterns the base model doesn't have
- Reducing prompt length by baking instructions into the model
- Classification or extraction tasks with well-defined schemas
Tradeoffs
- Training data quality is everything. Garbage in, confidently wrong garbage out.
- Knowledge cutoff: the model knows what it learned. New information requires retraining.
- Catastrophic forgetting: aggressive fine-tuning can degrade general capabilities.
- Cost: GPU hours for training + ongoing retraining as data evolves.
Cost profile: Moderate to high. Training compute + data curation + evaluation infrastructure.
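Because training-data quality sets the ceiling here, the most useful thing to sketch is data preparation rather than the training run itself. A minimal sketch, assuming the chat-style JSONL format that several hosted fine-tuning services accept; the claims domain, example records, and output schema are invented for illustration.

```python
# Sketch of supervised fine-tuning data preparation in chat-style JSONL.
# The domain, records, and JSON output schema are illustrative; the point is
# that the target behavior is demonstrated in every example, not described
# in a long prompt.
import json

STYLE = "You are a claims analyst. Respond with a JSON object: {decision, reason}."

examples = [
    {
        "input": "Water damage from a burst pipe, policy active, filed within 10 days.",
        "output": '{"decision": "approve", "reason": "Covered peril, timely filing."}',
    },
    {
        "input": "Flood damage, policy excludes flood, filed within 5 days.",
        "output": '{"decision": "deny", "reason": "Flood is an excluded peril."}',
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": STYLE},
                {"role": "user", "content": ex["input"]},
                {"role": "assistant", "content": ex["output"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

This is also where the prompt-length saving comes from: once the format and policy are baked into the weights, the system prompt can shrink to a single line.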
Approach 03
Agentic AI (tool-using agents)
Give the LLM access to tools (APIs, databases, code execution) and let it plan multi-step actions to accomplish a goal.
Best for
- Complex workflows that require multiple steps and decision branches
- Tasks that need real-time data (APIs, databases, live systems)
- When the answer requires computation, not just text generation
- Processes where the next step depends on the result of the previous step
Tradeoffs
- Reliability: agents can loop, hallucinate tool calls, or take unexpected paths.
- Latency: multi-step execution means multiple LLM calls. 5-step agent = 5x inference cost minimum.
- Debugging is hard. The agent's reasoning trace is your only window into failures.
- Safety: tool access means the agent can take real actions. Guardrails are non-negotiable.
Cost profile: High. Multiple inference calls per task + tool infrastructure + eval harness + guardrails.
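A minimal agent loop, sketched on top of the OpenAI tool-calling API; the single tool, its schema, the model name, and the step cap are all illustrative. A production agent wraps the same skeleton in timeouts, permission checks, and trace logging.

```python
# Minimal tool-using agent: the model plans, requests tool calls, sees the
# results, and decides the next step. Assumes the OpenAI Python SDK; the
# order-status tool is a stand-in for a real API or database call.
import json
from openai import OpenAI

client = OpenAI()

def get_order_status(order_id: str) -> str:
    # Stand-in for a real system of record.
    return json.dumps({"order_id": order_id, "status": "shipped", "eta": "2 days"})

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the live status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def run_agent(goal: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):  # hard cap so the agent cannot loop forever
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=TOOLS
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # the model decided it is done
        messages.append(msg)  # keep the model's tool requests in context
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            # Only one tool here; a real agent dispatches on call.function.name.
            result = get_order_status(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    return "Stopped: step limit reached."

print(run_agent("Where is order 4412 and when will it arrive?"))
```

The `max_steps` cap and the growing `messages` list are the two levers behind the tradeoffs above: the cap bounds looping and cost, and the message history is the reasoning trace you will be debugging.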
Decision framework
Five questions to pick your starting approach; a small sketch encoding them follows the list.
- 01
Does the knowledge change weekly or faster?
Yes → RAG · No → Consider fine-tuning or agents
- 02
Do you need citations and source attribution?
Yes → RAG · No → Fine-tuning or agents
- 03
Do you need the model to behave differently, not just know more?
Yes → Fine-tuning · No → RAG
- 04
Does the task require multiple steps with real-time data?
Yes → Agents · No → RAG or fine-tuning
- 05
Is reliability more important than capability?
Yes → RAG (simplest) or fine-tuning · No → Agents (most capable)
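One way to encode the five questions as a first-pass recommendation, purely for illustration: ties are resolved in question order, and a real decision also weighs team skills, budget, and latency targets.

```python
# The decision framework above as code. Question order doubles as priority:
# the first "yes" wins. Illustrative only.
def recommend(changes_weekly: bool, needs_citations: bool,
              needs_behavior_change: bool, multi_step_realtime: bool,
              reliability_over_capability: bool) -> str:
    if changes_weekly or needs_citations:   # Q1, Q2
        return "RAG"
    if needs_behavior_change:               # Q3
        return "fine-tuning"
    if multi_step_realtime:                 # Q4
        return "agents"
    # Q5: prefer the simpler options when reliability matters most.
    return "RAG or fine-tuning" if reliability_over_capability else "agents"

# Example: docs change weekly and answers need citations -> start with RAG.
print(recommend(True, True, False, False, True))
```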
Combination patterns
In production, you usually combine them.
The most capable production systems layer multiple approaches. A fine-tuned model with RAG retrieval gives you domain-specific reasoning plus up-to-date knowledge. An agent that uses RAG as a tool gets both multi-step capability and grounded answers.
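As a sketch of the second pattern, an agent can treat retrieval as just another tool. The `search_docs` function and schema below are illustrative: the function stands in for the retriever from the RAG sketch earlier, and in the agent loop above it would be registered alongside the other tools.

```python
# RAG as a tool inside an agent. search_docs stands in for the retrieve()
# helper from the RAG sketch; add SEARCH_DOCS_TOOL to the agent's TOOLS list
# and dispatch "search_docs" calls to it.
def search_docs(query: str) -> str:
    # Stand-in: embed the query, rank chunks, return the top-k as context.
    return "Refunds are available within 30 days of purchase."

SEARCH_DOCS_TOOL = {
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Retrieve the most relevant documentation chunks for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}
```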
Start with the simplest approach that could work. RAG is usually the right first step: lowest cost, fastest to ship, and easiest to evaluate. Add fine-tuning when you need behavioral change. Add agents when you need multi-step execution.