RAG vs Fine Tuning Generative AI: An Executive Guide

The real issue for most South African executives is not whether a large language model can write fluent answers. It can. The issue is whether it can answer using your organisation’s current, approved, governed information without exposing customer data, inventing policy, or creating operational risk.

That is where the debate around RAG vs fine tuning generative AI matters.

Many boards and executive committees are being asked to approve GenAI pilots with unclear language: “train the model on our data”, “connect the LLM to our documents”, “build an internal knowledge assistant”, or “fine-tune a model for our business”. These phrases are often used loosely. They describe different technical approaches, with different cost, risk, governance and maintenance implications.

This article explains the distinction in business terms for executives evaluating generative AI, LLM and large language model initiatives in Johannesburg, Cape Town and across South Africa.

It is part of Zorinthia’s Generative AI & LLM hub.

The basic executive distinction

A large language model is a general-purpose model that predicts and generates language. On its own, it does not know your latest pricing rules, HR policies, customer contracts, warehouse exceptions, underwriting manuals or board-approved delegations of authority.

There are two broad ways to make an LLM more useful in your business:

  1. Retrieval-augmented generation, or RAG: the model retrieves relevant information from an approved knowledge base at the time a question is asked, then uses that information to draft an answer.
  2. Fine-tuning: the model is further trained so that its behaviour, style or task performance changes based on examples you provide.

In plain English: RAG helps the model look things up. Fine-tuning helps the model behave differently.

For many enterprise use cases, especially internal knowledge, policy interpretation, product support and document-heavy workflows, RAG is often the more practical starting point. Fine-tuning has its place, but it is frequently proposed too early, before the organisation has resolved data quality, ownership, governance and evaluation.

For a wider view of where GenAI fits into enterprise decision-making, see Zorinthia’s AI advisory work.

What RAG actually does

RAG stands for retrieval-augmented generation. It combines search with generative AI.

A typical RAG system has four business components:

  • A knowledge base: approved documents, records, policies, manuals, FAQs, product information, contracts or other sources the organisation wants the system to use.
  • Embeddings: numerical representations of text that let the system compare similarity in meaning, not just exact keyword matches.
  • Vector search: a search method that finds relevant passages based on those embeddings.
  • A prompt to the LLM: instructions that tell the model how to use the retrieved information and how to format the answer.
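
These four components can be sketched in a few lines of Python. This is a minimal illustration only: the passages, the crude word-count "embedding" and the prompt wording are all invented for the example, and a real system would use learned embedding vectors and a proper vector index.

```python
from collections import Counter
import math

# Toy knowledge base: approved passages (invented examples).
KNOWLEDGE_BASE = [
    "Refunds on promotional items must be approved by the store manager.",
    "Billing code B12 covers after-hours consultations.",
    "Escalate unresolved complaints to the regional office within 48 hours.",
]

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding'; real systems use learned vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Similarity between two embeddings (Counter returns 0 for missing words)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    """Vector search step: rank passages by similarity to the question."""
    q = embed(question)
    return sorted(KNOWLEDGE_BASE, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Prompt step: instruct the LLM to answer only from retrieved passages."""
    passages = "\n".join(f"- {p}" for p in retrieve(question))
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say you do not know.\n"
        f"Passages:\n{passages}\n\nQuestion: {question}"
    )

print(build_prompt("Which billing code applies to after-hours consultations?"))
```

The key point for executives is the last step: the model is handed the retrieved passages at question time, rather than having the documents baked into its weights.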

Consider a Cape Town healthcare group that wants a clinical administration assistant for staff. The assistant should answer questions about appointment protocols, billing codes, internal escalation paths and medical aid administration rules. With RAG, the system does not need to “memorise” every document. Instead, when a staff member asks a question, it searches the approved knowledge base, retrieves the relevant passages, and asks the LLM to produce an answer based on those passages.

This matters because healthcare policies change, medical aid rules are updated, and internal processes are revised. If the knowledge base is maintained properly, the assistant can reflect current information without retraining the model every time a document changes.

What a knowledge base LLM is — and is not

Executives often ask for “a company ChatGPT”. A better description is usually a knowledge base LLM: a large language model interface connected to governed organisational content.

That does not mean the LLM becomes the system of record. Your ERP, CRM, HR platform, document repository and data warehouse remain the formal systems. The LLM sits on top as an interface that helps people find, summarise and interpret information.

This distinction is important in South African organisations where data maturity varies across divisions. A retailer may have accurate product master data but inconsistent store operations documents. A logistics business may have strong fleet telemetry but poorly maintained depot SOPs. A financial services firm may have well-governed customer records but scattered internal policy notes.

If the underlying knowledge base is weak, the AI will expose that weakness faster. RAG does not fix document ownership, contradictory policies or stale content. It makes these problems more visible.

Before approving a RAG initiative, executives should ask: who owns the knowledge base, who approves changes, and which source wins when documents conflict?

That question sits close to AI readiness. If the organisation has not clarified decision rights and data accountability, the technology will not compensate for it. See AI readiness for a broader assessment lens.

RAG vs fine-tuning generative AI

The phrase RAG vs fine tuning generative AI should not be treated as a technical preference. It is a business architecture decision.

RAG is usually suitable when the answer depends on changing organisational knowledge. Examples include:

  • employee policy questions;
  • customer service knowledge articles;
  • product specifications;
  • insurance wording;
  • maintenance manuals;
  • procurement rules;
  • internal operating procedures.

Fine-tuning is more relevant when the organisation needs the model to perform a specialised pattern consistently. Examples include classifying complaint types, extracting fields from a highly standardised document, rewriting content in a regulated tone, or following a specialised format that general prompting cannot achieve reliably.
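
Fine-tuning of this kind is driven by labelled examples. As an illustration only, complaint-classification training data might be prepared as below; the categories, field names and JSONL layout are invented for this sketch, and each provider specifies its own exact schema.

```python
import json

# Hypothetical labelled examples for complaint classification.
# The taxonomy and record schema are illustrative; fine-tuning APIs differ.
examples = [
    {"input": "My parcel arrived two weeks late.", "label": "delivery_delay"},
    {"input": "I was charged twice for one order.", "label": "billing_error"},
    {"input": "The agent was rude on the phone.", "label": "service_conduct"},
]

# One JSON record per line ("JSONL") is a common training-file convention.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The quality point in the trade-off list below applies here directly: the model learns whatever these examples teach it, so mislabelled or inconsistent examples become mislabelled behaviour.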

The trade-off is straightforward:

  • RAG is easier to update because the knowledge base can be changed without retraining the model.
  • Fine-tuning may improve consistency for a narrow task, but it creates a model maintenance burden.
  • RAG gives better source traceability if the system returns references to the documents used.
  • Fine-tuning does not automatically make the model factual about your current business rules.
  • RAG quality depends on retrieval quality; poor search produces weak answers.
  • Fine-tuning quality depends on training examples; poor examples teach the model the wrong behaviour.

A Johannesburg manufacturer, for example, may want maintenance technicians to ask questions about machinery faults. If the goal is to retrieve the latest approved maintenance procedure, RAG is the better fit. If the goal is to classify thousands of fault descriptions into a fixed taxonomy for analytics, fine-tuning or another supervised method may be appropriate.

The decision should follow the business problem, not the enthusiasm of the implementation team.

Prompt engineering is not a governance model

Prompt engineering means writing instructions that guide the model’s output. It can be useful. A prompt can tell the LLM to answer only from retrieved sources, refuse unsupported answers, use plain English, include confidence levels, or escalate uncertain cases.
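
As an illustration of what such instructions look like in practice, a system prompt might read as follows. The wording is invented for this sketch and would need compliance review before production use; it is one control among several, not a guarantee.

```python
# Illustrative system prompt for a RAG assistant; the wording is invented
# and should be reviewed by legal/compliance before production use.
SYSTEM_PROMPT = """\
You are an internal assistant. Follow these rules:
1. Answer only from the retrieved passages provided.
2. If the passages do not support an answer, reply: "I do not know.
   Please contact the policy owner."
3. Quote the source document title for every factual claim.
4. If the question involves personal information about a specific
   employee or customer, refuse and direct the user to HR or the
   responsible business owner.
"""

print(SYSTEM_PROMPT)
```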

But prompts are not enough.

A well-written prompt cannot repair missing documents, prevent all hallucinations, override poor access control, or guarantee compliance with POPIA. It is one control among several.

For example, an HR assistant that answers employee questions may process personal information if employees ask about leave, disciplinary processes, medical certificates or benefits. If the system connects to employee records, POPIA obligations become central: purpose limitation, access control, retention, security safeguards and transparency to the data subject all need attention.

The same applies to a CRM-connected sales assistant. Customer names, contact details, purchase history, complaints and credit-related notes are personal information. A generative AI system that retrieves or summarises that data must be governed as part of the organisation’s information processing environment, not treated as an experimental chatbot.

This is why GenAI initiatives need a practical AI governance framework before production deployment.

Evaluation before production deployment

An impressive demo is not evidence that a RAG system is ready for business use.

Executives should require structured evaluation before production deployment. This does not need to be academic, but it must be explicit. The organisation should test the system against real questions, known edge cases and high-risk scenarios.

For a retail group, evaluation might include questions about returns, warranties, promotions, loyalty benefits and store escalation rules. For a bank, it might include product eligibility, complaint handling, fee explanations and vulnerable customer treatment. For a property business, it might include lease clauses, maintenance obligations and tenant communication templates.

The evaluation should measure at least four things:

  • Answer accuracy: is the response correct?
  • Source grounding: does the answer rely on approved retrieved material?
  • Refusal behaviour: does the system say “I do not know” when the knowledge base does not support an answer?
  • Operational usefulness: does the output help the employee or customer complete the task?
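
A first pass at the first three measures can be as simple as a scored question set. Everything below is invented for illustration: the questions, the expected sources, and the `ask` function standing in for the deployed assistant. Real evaluation also needs subject-matter experts reviewing answers, which no script replaces.

```python
# Minimal evaluation harness: run known questions through the assistant
# and score grounding and refusal behaviour. All names are illustrative.
test_cases = [
    {"question": "What is the returns window for sale items?",
     "expected_source": "Returns Policy v7",
     "should_refuse": False},
    {"question": "What is our policy on crypto payments?",
     "expected_source": None,
     "should_refuse": True},  # deliberately outside the knowledge base
]

def ask(question: str) -> tuple[str, list[str]]:
    """Stand-in for the real assistant; returns (answer, cited sources)."""
    if "returns window" in question:
        return "Sale items may be returned within 14 days.", ["Returns Policy v7"]
    return "I do not know. Please contact the policy owner.", []

def score(answer: str, sources: list[str], case: dict) -> dict:
    refused = "do not know" in answer.lower()
    return {
        "question": case["question"],
        "grounded": case["expected_source"] in sources if case["expected_source"] else True,
        "refusal_correct": refused == case["should_refuse"],
    }

results = [score(*ask(c["question"]), c) for c in test_cases]
print(results)
```

Even a small scored set like this makes the difference between "the demo looked good" and "the system passed the questions we care about".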

This is also where RAG can fail quietly. The model may produce a polished answer based on the wrong retrieved paragraph. Without evaluation, business users may trust fluency instead of correctness.

Monitoring after go-live

Production deployment is not the end of the project. It is the start of operational accountability.

A RAG system should be monitored for usage, failure patterns, unanswered questions, retrieval quality, user feedback, security incidents and content gaps. If a call centre assistant repeatedly fails on a new product query, that may indicate a knowledge base issue rather than an LLM issue. If employees keep asking questions outside the approved scope, the organisation may need clearer boundaries.
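
Spotting that pattern does not require sophisticated tooling. As a sketch (the log fields, topics and threshold are invented for the example), a simple count of repeatedly unanswered topics can surface knowledge base gaps:

```python
from collections import Counter

# Hypothetical interaction logs from a deployed assistant; the fields
# and topic names are invented for illustration.
logs = [
    {"topic": "new_product_x", "answered": False},
    {"topic": "new_product_x", "answered": False},
    {"topic": "returns", "answered": True},
    {"topic": "new_product_x", "answered": False},
]

# Flag topics the assistant repeatedly could not answer: often a
# knowledge base gap rather than a model failure.
unanswered = Counter(entry["topic"] for entry in logs if not entry["answered"])
gaps = [topic for topic, count in unanswered.items() if count >= 3]
print(gaps)  # → ['new_product_x']
```

The output of a check like this belongs with the content owners identified earlier, not only with the technical team.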

South African operating conditions also matter. Load-shedding, connectivity interruptions and branch-level infrastructure constraints can affect system availability. A warehouse in Ekurhuleni or a regional clinic in the Eastern Cape may not experience the same reliability as a head office in Sandton. If the AI assistant becomes embedded in daily operations, business continuity planning must include it.

Monitoring should also include governance triggers. For instance: when should the system be paused, who can approve a new data source, and what happens if an answer creates customer harm?

The executive buying questions

When a vendor, internal team or consultant proposes a GenAI initiative, executives do not need to inspect code. They do need to ask sharper questions.

Start with these:

  1. Is this a retrieval problem, a behaviour problem, or both?
    If the main issue is access to current company knowledge, RAG is likely central. If the issue is consistent classification or formatting, fine-tuning may be relevant.

  2. What information will the system retrieve?
    Identify the knowledge base, document owners, update process and excluded content.

  3. Will personal information be processed?
    If employees, customers, patients, tenants or CRM records are involved, POPIA must be addressed before go-live.

  4. How will we test the system?
    Ask for an evaluation plan, not just a demonstration.

  5. What happens when it is wrong?
    Define escalation, audit logs, human review and stop conditions.

  6. Who owns it after launch?
    A RAG system needs business ownership, not only technical support.

These questions help separate a useful AI capability from a polished prototype.

For organisations moving from exploration into implementation, independent support is often useful at the design, governance and evaluation stages. Zorinthia’s AI consulting work is designed around those executive decision points.

The next decision

RAG is not a magic layer that makes corporate knowledge trustworthy. Fine-tuning is not a shortcut to business understanding. Both can be valuable, but only when matched to the right problem and governed properly.

The next executive question should be simple:

Which business decision or workflow are we trying to improve, and what trusted information must the AI use to support it?

If that cannot be answered clearly, the organisation is not yet choosing between RAG and fine-tuning. It is still defining the problem.