8 common problems and solutions for RAG system implementation


For a beginner-level explanation of what RAG is, you can click here to read my earlier primer.

This article focuses on the concrete problems, and their solutions, that come up when putting RAG into production.

Here are the 8 most frequently asked questions:

  1. It ignores the “knowledge base” and makes up its own answers
  2. The same question doesn’t get an accurate answer every time; often it is simply wrong
  3. The answer is incomplete, even though a complete answer clearly exists in the “knowledge base”
  4. Correct and incorrect answers coexist
  5. Replies are either vague with no details, or exhaustive and buried in detail
  6. Answers don’t show pictures
  7. Responses are very slow, or the system even crashes outright
  8. Answers are one-sided; it fails to “select” the appropriate supporting materials

First things first

To build a RAG product or system, 90% of the effort should go into processing the knowledge-base data.

“Even a clever woman cannot cook a meal without rice,” as the saying goes.

In a RAG system, the AI large model only steps in at the very last moment. Even this “clever woman” cannot turn “rotten vegetable leaves” into a satisfying dinner.

Of the first 7 common problems, only 1 and 5 are down to the “clever woman”; the other 5 exist because we fed her rotten vegetable leaves.

1. Making up its own answers

The root cause of this problem is a failure to understand the essence of RAG.

In the RAG system, the large model is only responsible for two things:

1. Judge whether the question can be answered

2. Answer it, i.e., edit the supporting materials into a reply

The reason the “editor” develops opinions of its own is that, most of the time, the large model’s responsibility is never clearly defined.

Choosing RAG means we no longer trust the answers the large model generates on its own.

And if you don’t trust it, follow through to the end: don’t distrust it while still handing it important responsibilities.

In a RAG system, the standard way to describe the role and the task is:

Role: a role with no subjective initiative, such as an editing assistant.
Task: receive the question and the supporting materials, and edit them into the output text.
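To make this concrete, here is a minimal sketch of such a prompt in Python. The wording, refusal behavior, and function names are my own illustrations, not a fixed standard:

```python
# A minimal sketch of a "no subjective initiative" RAG prompt.
# The wording and placeholder names are illustrative, not a fixed standard.
SYSTEM_PROMPT = """You are a text editor with no opinions of your own.

Task: you will receive a user question and a set of supporting materials.
1. Judge whether the materials can answer the question.
2. If they can, edit the materials into an answer. If they cannot,
   reply exactly: "I cannot answer this based on the available materials."
Never use knowledge that is not in the materials."""

def build_messages(question: str, materials: list[str]) -> list[dict]:
    """Assemble the chat messages for the final answering step."""
    context = "\n\n".join(f"[Material {i + 1}]\n{m}" for i, m in enumerate(materials))
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Supporting materials:\n{context}\n\nQuestion: {question}"},
    ]
```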

Any attempt to restrain the large model with “prayer-style” prompts like the ones below is an amateur move:

1. Don’t make it up

2. Don’t generate answers that don’t exist

3. Make sure your answers are accurate (respect the facts)

4. ……

2. The answer is unstable

There are two core reasons:

1. The question itself is problematic, so the right material cannot be retrieved stably

2. Retrieval and ranking are problematic, so the right material cannot be recalled stably

Yes: a problem with the “user’s question” is not the user’s problem. It is your problem.

We cannot require our users to be professional users, because professional users will most likely not use our products……

Unlike us, most users do not start from the “knowledge base” content and write complete, comprehensive descriptions when they ask questions.

Most of the time, their questions will look something like this:

1. Is there any product description?

2. The login button is unresponsive

3. Can Apple do it?

LightRAG’s 17K stars are no accident: its query optimization alone makes it worth studying, and I highly recommend a look at its engineering choices.
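LightRAG’s own implementation is far more involved; purely to make the idea of query optimization concrete, here is a minimal sketch of LLM-based query rewriting before retrieval (the model name and prompt wording are assumptions):

```python
# Minimal sketch: rewrite a terse user question into retrieval-friendly
# queries before hitting the vector store. Not LightRAG's implementation;
# the model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rewrite_query(question: str) -> list[str]:
    """Expand a vague question like "Can Apple do it?" into explicit queries."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": (
                "Rewrite the user's question into 3 self-contained search queries "
                "for a product knowledge base. Output one query per line.")},
            {"role": "user", "content": question},
        ],
    )
    return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]
```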

Assuming you have clarified the large model’s responsibility (editing), the key factor that shapes its answer is which reference materials we hand it.

Which supporting materials the large model receives depends on two factors:

1. Whether the information that can answer the user’s question is retrieved at all

2. Whether that information is ranked at the top of the retrieved results

Because data comes in myriad forms, no current solution can guarantee that the right data is retrieved 100% of the time.

However, you can use the following three schemes to optimize:

1. Preprocess the data effectively so that segmentation is reasonable (at the very least, complete answers are not shredded)

2. Do secondary processing on the data, such as extracting keywords and the questions each segment could answer (see the sketch after this list)

3. Choose a high-dimensional embedding model to strengthen semantic recognition
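For scheme 2, here is a minimal sketch of what that secondary processing might look like, assuming an OpenAI-compatible API; the prompt wording and field names are illustrative:

```python
# Minimal sketch of "secondary processing": for each segment, pre-extract
# keywords and the questions it could answer, and index those alongside
# the raw text. Prompt wording and field names are my own assumptions.
import json
from openai import OpenAI

client = OpenAI()

def enrich_segment(segment: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": (
                "Given a knowledge-base segment, return JSON with two keys: "
                '"keywords" (a list of strings) and "questions" '
                "(a list of questions this segment can answer).")},
            {"role": "user", "content": segment},
        ],
        response_format={"type": "json_object"},
    )
    meta = json.loads(resp.choices[0].message.content)
    # Index the keywords/questions together with the text to improve recall.
    return {"text": segment, **meta}
```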

As for the ranking problem (factor 2), blindly spending money on an expensive rerank model is not the only solution. You will often get more out of tuning the hybrid-search weights and the score threshold, and taking a close look at the characteristics of the recalled passages.
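A minimal sketch of what those tuning knobs look like in code, assuming both retrievers return scores normalized to [0, 1]; the weight and threshold values are illustrative starting points, not recommendations:

```python
# Blend vector and keyword scores with a weight, then cut by a threshold,
# instead of reaching for an expensive rerank model first.
def fuse_scores(vector_hits: dict[str, float],
                keyword_hits: dict[str, float],
                weight: float = 0.7,          # weight of the semantic score
                threshold: float = 0.55) -> list[tuple[str, float]]:
    """Return (chunk_id, score) pairs above the threshold, best first.
    Assumes both score dicts are normalized to the [0, 1] range."""
    ids = set(vector_hits) | set(keyword_hits)
    fused = {
        cid: weight * vector_hits.get(cid, 0.0)
             + (1 - weight) * keyword_hits.get(cid, 0.0)
        for cid in ids
    }
    return sorted(
        ((cid, s) for cid, s in fused.items() if s >= threshold),
        key=lambda x: x[1], reverse=True,
    )
```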

3. The answer is incomplete

There is basically only one cause of this problem: the document segmentation is unreasonable, so a complete answer gets cut apart and only one fragment is recalled.

And there is only one solution: find the recalled paragraph and re-segment it.

Don’t be lazy about custom segmentation, and don’t crudely segment by character count alone.

One more detail to note: when segmenting the knowledge base, the “by identifier” rule and the “maximum segment length” setting take effect at the same time.
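A minimal sketch of segmentation that honors both rules at once, assuming Markdown “## ” headings as the identifier and using character count as a crude stand-in for token length:

```python
# Split on an identifier first; only fall back to length-based splitting
# inside oversized sections, so complete answers are not shredded by default.
def segment(markdown: str, identifier: str = "\n## ",
            max_len: int = 2000) -> list[str]:
    sections = []
    for block in markdown.split(identifier):
        block = block.strip()
        if not block:
            continue
        if len(block) <= max_len:
            sections.append(block)
        else:
            # Only shred a section when it truly exceeds the limit.
            sections.extend(block[i:i + max_len]
                            for i in range(0, len(block), max_len))
    return sections
```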

Even more importantly, those incomplete answers are a major source of large-model hallucinations!

4. Contains incorrect answers

Two reasons:

1. Irrelevant passages included in the recall led the large model to hallucinate

2. The prompt that asks the large model to generate the answer does not constrain it enough

The higher the top-K in your recall strategy, the more paragraphs are recalled, and the more irrelevant passages slip in.

If you have no good fix at this stage, you can only work hard on the final “fallback” prompt: tell the large model how to judge which passages are valid and how to discard the irrelevant ones.

Normally, I would add this sentence to the prompt that generates the final answer:

“Examine how relevant each supporting material is to the user’s question. Some materials may have been placed there by mistake and do not actually answer the question; you may choose to ignore them.”

5. No details / all details

This is a response-formatting problem, the same one as “I can’t get the RAG system to answer in the format I specified.”

Essentially, you either didn’t state the requirement clearly in the final prompt, or the instruction has stopped taking effect.

There are only two solutions to this problem:

1. Show examples of the desired response instead of describing requirements

2. Put the constraints at the end of the prompt; restating them in the user prompt is worth it (see the sketch below)
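A minimal sketch of both solutions combined; the example format is a placeholder you would replace with your own:

```python
# Show the model an example answer instead of describing the format,
# and repeat the constraint at the very end of the user prompt.
FORMAT_EXAMPLE = """Example answer format:

**Conclusion:** one sentence.
**Steps:**
1. ...
2. ...
**Source:** material number(s)."""

def build_final_prompt(question: str, context: str) -> str:
    return (
        f"Supporting materials:\n{context}\n\n"
        f"Question: {question}\n\n"
        f"{FORMAT_EXAMPLE}\n\n"
        # Constraints go last, where they are least likely to be ignored.
        "Answer strictly in the example format, using only the materials above."
    )
```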

6. There are no pictures

All knowledge should be converted to Markdown before being segmented.

Formats such as Word and PDF are for people to see.

What you see and what the large model eventually receives can look completely different.

This is especially true of image-related content.

Make sure you have a solid grasp of RAG fundamentals before asking why images aren’t displayed; otherwise the solution below won’t really solve the problem.

Tell the large model to correctly output images marked up in Markdown or with <img> tags, and preferably add annotations to the images so the model can pick the right one.
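A minimal sketch of the idea: when converting documents to Markdown, keep every image as an explicit tag whose caption doubles as the annotation. The URL and caption here are placeholders:

```python
# Keep images as explicit Markdown tags so they survive segmentation
# and the model can decide when to show them.
def image_to_markdown(url: str, caption: str) -> str:
    # The caption doubles as the annotation the model uses to pick images.
    return f"![{caption}]({url})"

# Example: embed into a segment so retrieval carries the image along.
segment = (
    "To reset the password, open Settings > Account.\n"
    + image_to_markdown("https://example.com/reset.png",
                        "Screenshot: the password reset screen")
)
```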

7. Slow response

Besides the quality of the model itself, context length is the other major factor.

Even if you are not worried about token fees, trim and segment the supporting materials for the sake of response speed.

You can treat 2,000 tokens as the upper limit for segment length; beyond that, the time to first token can exceed 1 second, and without streaming output the overall response time may exceed 10 seconds.

By the way, get the experienced engineers on your team to set a timeout on the API responses……
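A minimal sketch of streaming plus an explicit timeout, using the OpenAI Python client as an example; the model name and the 30-second value are assumptions, not recommendations from this article:

```python
# Stream the answer so users see the first token quickly,
# and set an explicit request timeout instead of hanging the UI.
from openai import OpenAI

client = OpenAI()

def answer(messages: list[dict]) -> str:
    stream = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model
        messages=messages,
        stream=True,           # show tokens as they arrive
        timeout=30,            # illustrative value: fail fast on a stuck request
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)
```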

8. Not systematic

The biggest problem caused by segmentation is the fragmentation of knowledge.

“Fragmented knowledge” mainly hurts the comprehensiveness of recalled answers: background or related information that cannot directly answer the question is rarely retrieved alongside it.

There are two popular solutions at present:

1. Knowledge graph enhancement

2. Agentic enhancements

Personally, I am more optimistic about knowledge-graph enhancement at this moment; the agentic route involves much more engineering optimization and prompt discipline (mainly because domestic models’ agentic capabilities are not there yet).

Microsoft’s GraphRAG project is recommended.
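GraphRAG does far more than this, but a toy sketch can show the core idea: link segments that mention the same entity, then pull in graph neighbors as background material when expanding a recall set. Entity extraction is stubbed out here, and all names are my own:

```python
# Toy knowledge-graph enhancement: segments sharing an entity are linked,
# and recalled segments drag their neighbours along as background context.
import networkx as nx

def build_graph(entities: dict[str, list[str]]) -> nx.Graph:
    """entities maps segment_id -> entity names found in that segment."""
    g = nx.Graph()
    by_entity: dict[str, list[str]] = {}
    for seg_id, ents in entities.items():
        g.add_node(seg_id)
        for e in ents:
            by_entity.setdefault(e, []).append(seg_id)
    # Connect every pair of segments that share an entity.
    for seg_ids in by_entity.values():
        for a in seg_ids:
            for b in seg_ids:
                if a < b:
                    g.add_edge(a, b)
    return g

def expand_recall(g: nx.Graph, recalled: list[str]) -> set[str]:
    """Add the neighbours of recalled segments as background material."""
    result = set(recalled)
    for seg_id in recalled:
        if seg_id in g:
            result.update(g.neighbors(seg_id))
    return result
```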

Come improve with us in the AI Learning Circle

I have been running the “AI Learning Action Circle” with Qidian Classroom since 2023. It now covers 1,500+ topics, and for 490+ days I have discussed and exchanged practical AI applications every day with nearly 4,000 AI-minded practitioners.

The study circle currently has three core “venues” for learning and exchange:

1. Knowledge Planet: the core channel where knowledge and materials accumulate, available to consult at any time

2. WeChat groups: currently 6 of them, where members exchange and share their AI experience every day

3. The “Chit-Chat Bureau” livestream: 19:30–21:30 on weekday evenings, one AI application theme per session

Venue 1: Knowledge Planet

On the Planet I mainly maintain three tags: “Hands-on Sharing”, “Toolbox”, and “Intelligence Bureau”.

“Hands-on Sharing” collects prompts and efficiency tools that can be applied directly in daily work and life. The Step-Back prompt in the screenshot above is very useful, comparable to o4. The prompts behind every AI demo on the official account and in the livestreams are also under this tag.

“Toolbox” and “Intelligence Bureau” share both little-known and popular AI tools and news. I have filtered out anything too technical or too over-hyped; what remains under these tags is all fun you can put to use directly!

The Planet also has a “column” system; its current positioning is similar to the tags.

If you are looking for a place where you can be among the first to learn the latest, most practical AI news and skills, and a circle where you can consult, discuss, and exchange with peers whenever you hit an AI application problem, this is it.

Venue 2: WeChat groups

We provide WeChat groups for circle members, and the 6 groups are now almost full.

The groups get an AI morning report every day, plus “newspaper-reading time” in the morning and afternoon, along with my daily hands-on notes on AI tools, prompt write-ups and reflections, and interpretations of industry news.

You can also discuss any AI-related tool or application in the groups, and you can almost always find an answer.

Scan the code now to receive a 50 yuan instant discount

Venue 3: the “Chit-Chat Bureau” livestream

Over the past year I have done 130 livestreams for the study circle on AI applications, hands-on practice, and hot-topic interpretation, 257 hours in total!

More than a dozen of these were closed-door livestreams, available only to study-circle members, with an average watch time of over 1 hour, sometimes approaching 2 hours!

Without real substance, the average watch time could never reach that level.


End of text