
RAG

Retrieval-Augmented Generation: read, then write.

RAG patches the limits of a language model by giving it a library card. Before answering, the system looks something up and writes its reply on top of what it found.

In plain language

A plain language model answers from memory. That memory is huge, but it was frozen on the day the model finished training, so the model has no idea what's in your company's wiki or last week's report. If you ask about something fresh or private, it tends to guess, and to guess confidently.

RAG fixes that with a small extra step. Before the model writes anything, the system goes and reads. It searches a pile of documents you choose — help articles, PDFs, a Notion workspace, a database — pulls back the passages that look most relevant, and hands them to the model along with your question. The model then writes its answer using those passages as the source of truth, instead of leaning only on what it happens to remember.

That is really the whole trick: retrieve first, write second. Most "chatbot over our docs" products you've ever used are some version of this.
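
The retrieve-then-write order can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not how any particular product works: the `retrieve` and `build_prompt` helpers are hypothetical, the three sample documents are made up, and relevance is scored by plain word overlap where a real system would more likely use embeddings or a search index. The sketch stops at the prompt, because the model call itself depends on whichever provider you use.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    # Drop documents with no overlap at all, then rank the rest by score.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the passages below.\n"
        f"Passages:\n{context}\n"
        f"Question: {question}"
    )


# Made-up documents standing in for "a pile of documents you choose".
documents = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "To reset your password, open Settings and choose Security.",
]

question = "How do I reset my password?"
passages = retrieve(question, documents)   # step 1: read
prompt = build_prompt(question, passages)  # step 2: hand the page to the model
print(prompt)
```

The model would receive only `prompt`. The other two documents never reach it, which is the point of retrieving first.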

FIG. 1. Retrieval first, writing second. The model never sees your library; it sees the page you handed it.

An everyday picture

Think of an open-book exam. The student isn't smarter than yesterday — they just get to peek at the textbook while answering. Their job changes from "remember everything" to "find the right page and explain it in your own words."

A language model without RAG is a closed-book exam. A language model with RAG is the same student, with the textbook open to the right page.

Where it shows up

You'll see RAG quietly powering a lot of what looks like "AI that knows our stuff." Customer-support chatbots that cite a help center article. Internal Q&A bots that read a team's Notion or Confluence. Coding assistants that look at the open file before suggesting a change. Legal and medical assistants that have to point back to an actual source instead of guessing.

Anywhere the answer needs to be true *for this specific organisation, at this specific moment*, RAG is usually somewhere in the pipeline.

A small example

An employee asks an internal HR bot, "How many paid days off do I have left?" Without RAG, the model has never seen this company's policy or this employee's record, so it makes up a plausible-sounding number.

With RAG, the system first searches HR documents and pulls back two things: the company's leave policy page, and the employee's own usage record from last quarter. Both are quietly tucked into the prompt. The model then writes a short, specific answer — "12 days, based on your hire date and the 4 days you used in April" — using those snippets, not its imagination.
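
To make "tucked into the prompt" concrete, here is a rough sketch of what the assembled prompt for this question might look like. Everything in it is an assumption for illustration: the `Snippet` type, the `build_hr_prompt` helper, the source labels, and the snippet wording, including the 16-day allowance, which was chosen only so that it is consistent with the "12 days" answer above.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str  # hypothetical label saying where the text came from
    text: str    # the passage that the retrieval step copied out


def build_hr_prompt(question: str, snippets: list[Snippet]) -> str:
    """Assemble one prompt: instructions, labelled sources, then the question."""
    lines = ["Use only the sources below. Name the source you rely on.", ""]
    for snippet in snippets:
        lines.append(f"[{snippet.source}] {snippet.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Made-up retrieval results: one policy passage and one usage record.
snippets = [
    Snippet("leave-policy", "Employees hired before 2023 receive 16 paid days off per year."),
    Snippet("usage-record", "Paid days off used this year: 4 (April)."),
]

prompt = build_hr_prompt("How many paid days off do I have left?", snippets)
print(prompt)
```

Labelling each snippet with its source is a design choice that lets the answer point back to where a number came from, rather than asking the reader to trust it.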

Common misunderstanding

MYTH
People often say the model "reads the documents." It doesn't. The retrieval step reads, copies the relevant text into the prompt, and the model only ever sees that prompt. If retrieval misses the right passage, the model can't recover.
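
A small sketch shows why a retrieval miss cannot be repaired downstream. It assumes a deliberately naive keyword retriever (the `retrieve` helper and the single policy sentence are hypothetical), so that a question phrased with different words finds nothing. The guard in `build_prompt` is one possible way to handle that case, not a standard API.

```python
import re
from typing import Optional


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, documents: list[str]) -> list[str]:
    """Keep documents that share at least two words with the question."""
    question_words = words(question)
    return [doc for doc in documents if len(question_words & words(doc)) >= 2]


def build_prompt(question: str, passages: list[str]) -> Optional[str]:
    """Return a grounded prompt, or None when there is nothing to ground on."""
    if not passages:
        return None  # the caller should answer "I don't know" instead of guessing
    context = "\n".join(passages)
    return f"Passages:\n{context}\nQuestion: {question}"


# Made-up policy text for illustration.
documents = ["Paid leave accrues at 1.5 days per month."]

hit = retrieve("How does paid leave accrue?", documents)
miss = retrieve("How much vacation do I get?", documents)

print(build_prompt("How does paid leave accrue?", hit))
print(build_prompt("How much vacation do I get?", miss))
```

The second question is about the same policy, but it says "vacation" where the document says "paid leave", so the passage never reaches the prompt. No amount of prompt wording can put it back.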

One line to take with you

Retrieval comes first. Writing comes second. Fix the retrieval and the writing gets easier on its own.