Quit your lying: Can a technology called RAG keep AI models from making stuff up?

The framework pulls in external sources to enhance accuracy. Does it live up to the hype?

Chris Stokel-Walker, Jun 6, 2024 11:00 am UTC

We've been living through the generative AI boom for nearly a year and a half now, following the late 2022 release of OpenAI's ChatGPT. But despite transformative effects on companies' share prices, generative AI tools powered by large language models (LLMs) still have major drawbacks that have kept them from being as useful as many would like them to be. Retrieval augmented generation, or RAG, aims to fix some of those drawbacks.

Perhaps the most prominent drawback of LLMs is their tendency toward confabulation (also called "hallucination"), a statistical gap-filling phenomenon that occurs when AI language models are tasked with reproducing knowledge that wasn't present in their training data. They generate plausible-sounding text that can veer toward accuracy when the training data is solid but may otherwise be completely made up.

Relying on confabulating AI models gets people and companies in trouble, as we've covered in the past. In 2023, we saw two instances of lawyers citing legal cases, confabulated by AI, that didn't exist. We've covered claims against OpenAI in which ChatGPT confabulated and accused innocent people of doing terrible things. In February, we wrote about Air Canada's customer service chatbot inventing a refund policy, and in March, a New York City chatbot was caught confabulating city regulations.

So if generative AI aims to be the technology that propels humanity into the future, someone needs to iron out the confabulation kinks along the way. That's where RAG comes in. Its proponents hope the technique will help turn generative AI technology into reliable assistants that can supercharge productivity without requiring a human to double-check or second-guess the answers.

"RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts," according to Noah Giansiracusa, associate professor of mathematics at Bentley University.
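To make that description concrete, here is a minimal Python sketch of the retrieve-then-generate loop, built on loud assumptions: the three documents, the `retrieve` and `build_prompt` helpers, and the prompt wording are all hypothetical, and simple word overlap stands in for the embedding-based vector search that production RAG systems typically use. The final call to an actual LLM is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical knowledge base; a real system might index web pages or internal documents.
DOCUMENTS = [
    "RAG stands for retrieval augmented generation.",
    "The RAG paper was published in 2020 by researchers at Facebook AI Research, UCL, and NYU.",
    "Bard made an error about the James Webb Space Telescope in February 2023.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query.

    Word-overlap scoring is an assumed stand-in for real vector search.
    """
    query_words = Counter(tokenize(query))

    def score(doc: str) -> int:
        # Counter & Counter keeps the minimum count of each shared word.
        return sum((query_words & Counter(tokenize(doc))).values())

    return sorted(docs, key=score, reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the user's question with the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


question = "When was the RAG paper published?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

The design choice worth noticing is that the model is never asked to recall the date from its training data; the fact arrives in the prompt, and the instruction asks the model to stay within it.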

Let's take a closer look at how it works and what its limitations are.

A framework for enhancing AI accuracy

Although RAG is now seen as a technique to help fix issues with generative AI, it actually predates ChatGPT. The term was coined in a 2020 academic paper by researchers at Facebook AI Research (FAIR, now Meta AI Research), University College London, and New York University.

As we've mentioned, LLMs struggle with facts. Google's entry into the generative AI race, Bard, made an embarrassing error about the James Webb Space Telescope during its first public demonstration in February 2023. The error wiped around $100 billion off the value of parent company Alphabet. LLMs produce the most statistically likely response based on their training data and don't understand anything they output, meaning they can present false information that seems accurate if you don't have expert knowledge on a subject.
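The idea of a "most statistically likely response" can be illustrated with a deliberately tiny toy, sketched below in Python. It is a heavy simplification and an assumption on our part, not how real LLMs work: they use neural networks over subword tokens and often sample rather than always taking the top choice. Here a hypothetical bigram model only counts which word follows which in a three-sentence corpus, then greedily appends the most frequent next word. Nothing in it checks whether the output is true.

```python
from collections import Counter, defaultdict

# Hypothetical three-sentence training corpus, already split into words.
CORPUS = (
    "the telescope took the sharpest picture . "
    "the first picture of a planet was blurry . "
    "the first picture of a planet was faint ."
).split()


def train_bigrams(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate_greedy(counts: dict[str, Counter], start: str, max_words: int = 8) -> str:
    """Repeatedly append the single most frequent next word.

    There is no notion of truth here: the loop only follows word-pair counts.
    """
    words = [start]
    while len(words) < max_words and counts.get(words[-1]):
        nxt = counts[words[-1]].most_common(1)[0][0]
        if nxt == ".":  # end of sentence
            break
        words.append(nxt)
    return " ".join(words)


model = train_bigrams(CORPUS)
print(generate_greedy(model, "telescope"))
```

With these counts the loop produces "telescope took the first picture of a planet", a fluent claim that appears nowhere in the corpus: each word pair is individually common, but the stitched-together statement is unsupported by anything the model was trained on.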

LLMs also lack up-to-date knowledge and the ability to identify gaps in their knowledge. "When a human tries to answer a question, they can rely on their memory and come up with a response on the fly, or they could do something like Google it or peruse Wikipedia and then try to piece an answer together from what they find there, still filtering that info through their internal knowledge of the matter," said Giansiracusa.

But LLMs aren't humans, of course. Their training data can age quickly, particularly for time-sensitive queries. In addition, the LLM often can't distinguish specific sources of its knowledge, as all its training data is blended together into a kind of soup.
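One practical consequence is provenance. Because a RAG system retrieves discrete passages at query time, it can keep each passage tagged with the document it came from, something the "soup" of training data cannot offer. The Python sketch below assumes a hypothetical knowledge base for a fictional company, Acme, with made-up file names and helper functions; word overlap again stands in for real vector search.

```python
import re

# Hypothetical knowledge base: (source ID, passage) pairs for a fictional company.
KNOWLEDGE_BASE = [
    ("returns-policy.md", "Acme accepts returns of unopened items within 30 days."),
    ("shipping-faq.md", "Acme ships to the US and Canada only."),
    ("warranty.md", "Acme widgets carry a one-year limited warranty."),
]


def words(text: str) -> set[str]:
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_with_sources(
    query: str, kb: list[tuple[str, str]], k: int = 1
) -> list[tuple[str, str]]:
    """Return the k best-matching (source, passage) pairs; the source travels with the text."""
    q = words(query)
    return sorted(kb, key=lambda item: len(q & words(item[1])), reverse=True)[:k]


def format_context(hits: list[tuple[str, str]]) -> str:
    """Label each passage with its source so an answer can cite it."""
    return "\n".join(f"[{source}] {passage}" for source, passage in hits)


hits = retrieve_with_sources("What is the returns policy for unopened items?", KNOWLEDGE_BASE)
print(format_context(hits))
```

The prompt can then ask the model to cite the bracketed source for each claim, which gives a human reviewer something concrete to check. Whether the model actually honors such an instruction is not guaranteed.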

In theory, RAG should make keeping AI models up to date far cheaper and easier. "The beauty of RAG is that when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information," said Peterson. "This reduces LLM development time and cost while enhancing the model's scalability."
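The point about skipping retraining can be sketched in a few lines of Python. The `KnowledgeBase` class and its documents are hypothetical, and word overlap is once more a stand-in for real retrieval; the sketch only shows that updating what the model sees means appending to an external store, while the model itself is untouched (and not even shown).

```python
import re


class KnowledgeBase:
    """A hypothetical, minimal external document store for a RAG system."""

    def __init__(self) -> None:
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        """Updating knowledge is just appending a document; no model weights change."""
        self.docs.append(text)

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k documents sharing the most words with the query."""
        q = set(re.findall(r"[a-z0-9]+", query.lower()))

        def overlap(doc: str) -> int:
            return len(q & set(re.findall(r"[a-z0-9]+", doc.lower())))

        return sorted(self.docs, key=overlap, reverse=True)[:k]


kb = KnowledgeBase()
kb.add("The 2023 product catalog lists 12 models.")
question = "How many models are in the 2024 catalog?"
before = kb.search(question)  # only the stale 2023 document is available

kb.add("The 2024 product catalog lists 15 models.")
after = kb.search(question)  # the new document now outranks the old one
print(before, after)
```

In a typical vector-search setup, the new document would also need to be chunked and embedded before it becomes searchable, but that step is generally much lighter than retraining an LLM.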