Learn · AI Engineering

    RAG vs fine-tuning vs agents

    Three approaches to making LLMs useful for your domain. Each solves a different problem. Most production systems combine two or all three. The question is which to start with and when to layer.

    Updated · 2026-05-02 · 10 min read

    Approach 01

    RAG (Retrieval-Augmented Generation)

    Retrieve relevant documents at query time and pass them into the LLM's context window alongside the user's question.

    Best for

    • Knowledge bases that change frequently (docs, policies, product catalogs)
    • When you need citations and source attribution
    • When you can't afford to retrain or fine-tune on every update
    • Multi-tenant systems where each customer has different data

    Tradeoffs

    • Retrieval quality is the ceiling. Bad retrieval = bad answers, regardless of model quality.
    • Context window limits constrain how much you can retrieve. Chunking strategy matters enormously.
    • Latency adds up: embedding + vector search + generation.
    • No behavioral change in the model. It still reasons the same way, just with different inputs.

    Cost profile: Low to moderate. No training compute. Main costs are embedding, vector DB, and inference.
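The RAG flow above can be sketched in a few lines. This is a toy, self-contained version: `embed` is a stand-in bag-of-letters embedding and the "vector DB" is a plain list — in production you would call a real embedding model and a vector store — but the shape of the pipeline (embed query → rank documents → stuff top-k into the prompt) is the same.

```python
import math

def embed(text: str) -> list[float]:
    # Toy letter-frequency embedding -- a placeholder for a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank every document against the query; a vector DB does this at scale.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Numbered sources make citation and attribution possible in the answer.
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```

Note that nothing here touches the model itself — exactly the tradeoff called out above: the model reasons the same way, just over different inputs, and retrieval quality sets the ceiling.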

    Approach 02

    Fine-tuning

    Train a base model on your domain-specific data to permanently change its behavior, style, or domain knowledge.

    Best for

    • Consistent output style or format (legal briefs, medical notes, code in your framework)
    • Domain-specific reasoning patterns the base model doesn't have
    • Reducing prompt length by baking instructions into the model
    • Classification or extraction tasks with well-defined schemas

    Tradeoffs

    • Training data quality is everything. Garbage in, confidently wrong garbage out.
    • Knowledge cutoff: the model knows what it learned. New information requires retraining.
    • Catastrophic forgetting: aggressive fine-tuning can degrade general capabilities.
    • Cost: GPU hours for training + ongoing retraining as data evolves.

    Cost profile: Moderate to high. Training compute + data curation + evaluation infrastructure.
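Most of the fine-tuning cost lives in data curation, and the data itself is usually simple: chat-style examples serialized as JSONL, one example per line, in the format several hosted fine-tuning APIs accept. A minimal sketch (the field names follow the common chat format; exact requirements vary by provider):

```python
import json

def make_example(instruction: str, input_text: str, output_text: str) -> dict:
    # One training example: the system prompt you want to "bake in",
    # a representative user input, and the exact output you want.
    return {
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": input_text},
            {"role": "assistant", "content": output_text},
        ]
    }

def to_jsonl(examples: list[dict]) -> str:
    # JSONL: one JSON object per line, the standard fine-tuning upload format.
    return "\n".join(json.dumps(e) for e in examples)
```

Baking the system prompt into hundreds of examples like this is what lets you shorten prompts at inference time — the instruction becomes behavior instead of input.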

    Approach 03

    Agentic AI (tool-using agents)

    Give the LLM access to tools (APIs, databases, code execution) and let it plan multi-step actions to accomplish a goal.

    Best for

    • Complex workflows that require multiple steps and decision branches
    • Tasks that need real-time data (APIs, databases, live systems)
    • When the answer requires computation, not just text generation
    • Processes where the next step depends on the result of the previous step

    Tradeoffs

    • Reliability: agents can loop, hallucinate tool calls, or take unexpected paths.
    • Latency: multi-step execution means multiple LLM calls. 5-step agent = 5x inference cost minimum.
    • Debugging is hard. The agent's reasoning trace is your only window into failures.
    • Safety: tool access means the agent can take real actions. Guardrails are non-negotiable.

    Cost profile: High. Multiple inference calls per task + tool infrastructure + eval harness + guardrails.
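At its core, an agent is a loop: the model picks the next action, a tool runs it, and the result is fed back in until the model declares it is done. The sketch below uses a stubbed `model` callable and an illustrative tool registry rather than a real LLM API, but it shows the loop shape — including the hard step cap, which is the simplest of the guardrails mentioned above.

```python
# Illustrative tool registry: names and implementations are assumptions.
TOOLS = {
    "add": lambda args: str(args["a"] + args["b"]),
}

def run_agent(model, goal: str, max_steps: int = 5) -> str:
    # model(history) returns either {"type": "tool", ...} or {"type": "final", ...}.
    history = [goal]
    for _ in range(max_steps):
        action = model(history)
        if action["type"] == "final":
            return action["answer"]
        result = TOOLS[action["tool"]](action["args"])
        history.append(result)  # feed the tool result back into the next step
    # Guardrail: a hard step budget stops runaway loops.
    return "step budget exhausted"
```

Every pass through the loop is another inference call — which is where the "5-step agent = 5x inference cost minimum" figure comes from — and `history` is the reasoning trace you will be reading when something goes wrong.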

    Decision framework

    Five questions to pick your starting approach.

    01

      Does the knowledge change weekly or faster?

      Yes → RAG · No → Consider fine-tuning or agents

    02

      Do you need citations and source attribution?

      Yes → RAG · No → Fine-tuning or agents

    03

      Do you need the model to behave differently, not just know more?

      Yes → Fine-tuning · No → RAG

    04

      Does the task require multiple steps with real-time data?

      Yes → Agents · No → RAG or fine-tuning

    05

      Is reliability more important than capability?

      Yes → RAG (simplest) or fine-tuning · No → Agents (most capable)
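The five questions above reduce to a few lines of branching logic. One way to encode them (the ordering below is one reasonable interpretation — in practice the answers interact, and "start with RAG" is the default fallback):

```python
def pick_starting_approach(knowledge_changes_weekly: bool,
                           needs_citations: bool,
                           needs_behavior_change: bool,
                           multistep_realtime: bool,
                           reliability_first: bool) -> str:
    # Q1/Q2: fast-changing knowledge or required citations point straight at RAG.
    if knowledge_changes_weekly or needs_citations:
        return "RAG"
    # Q4/Q5: multi-step work on real-time data favors agents,
    # unless reliability outweighs capability.
    if multistep_realtime and not reliability_first:
        return "agents"
    # Q3: behavioral change without fresh-knowledge needs points at fine-tuning.
    if needs_behavior_change:
        return "fine-tuning"
    # Default: the simplest approach that could work.
    return "RAG"
```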

    Combination patterns

    In production, you usually combine them.

    The most capable production systems layer multiple approaches. A fine-tuned model with RAG retrieval gives you domain-specific reasoning plus up-to-date knowledge. An agent that uses RAG as a tool gets both multi-step capability and grounded answers.

    Start with the simplest approach that could work. RAG is usually the right first step: lowest cost, fastest to ship, and easiest to evaluate. Add fine-tuning when you need behavioral change. Add agents when you need multi-step execution.
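The "agent that uses RAG as a tool" pattern is mostly a registration step: retrieval becomes one entry in the agent's tool registry alongside everything else. A toy sketch (the `retrieve` stub and tool names are illustrative; in practice `search_docs` would call your vector store):

```python
def retrieve(query: str) -> str:
    # Stub retrieval over a one-entry "corpus" -- a real version would
    # embed the query and search a vector DB.
    corpus = {"refunds": "Refunds are processed within 5 business days."}
    return corpus.get(query.split()[-1].lower(), "no match")

# RAG registered as just another tool the agent can choose to call.
TOOLS = {
    "search_docs": retrieve,
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}
```

The agent keeps its multi-step planning; grounding comes from the fact that one of its available actions is "look it up" rather than "make it up".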

    Next step

    Not sure which approach fits your use case?

    A 30-minute discovery call. We'll map your data, constraints, and goals to the right architecture.