A realistic path to deploying large models in the enterprise: why is RAG unavoidable?

In real deployments, general-purpose large models run into several obstacles in enterprise settings, including data security, hallucinated output, and missing business context, which make them hard to apply to business directly. This article examines how the RAG (Retrieval-Augmented Generation) architecture offers a realistic way to put large models to work in the enterprise.

Over the past few years, large model technology has advanced rapidly. From GPT-4 to Claude, Tongyi Qianwen, and Wenxin Yiyan, the industry has entered the era of “language model +”. It can seem as if a large model fits everywhere, and enterprises expect AI to become their next lever for efficiency.

But the deeper we go into real enterprise deployments, the clearer one fact becomes:

General-purpose large models are still a long way from being “business-ready”.

Over the past two years, we have worked with hundreds of large and medium-sized enterprises to explore how to deploy large models, and one conclusion has been confirmed again and again: RAG (Retrieval-Augmented Generation) is one of the most pragmatic and controllable large model architectures, and among the most likely to be launched and maintained successfully.

This article addresses the following questions:

What challenges do enterprises typically face when deploying large models?
Which business problems does RAG solve?
How should the effectiveness of a RAG system be evaluated?
What pitfalls do enterprises encounter when implementing RAG?

01 Enterprises deploying large models tend to share the same pain points

Across a large number of deployment projects, we found that enterprises in different industries face remarkably similar problems when deploying large models:

1. Data security is difficult to guarantee

For many enterprises, data is a core asset. When internal documents covering compliance, finance, customers, products, and systems are involved, calling a cloud-hosted large model API directly is extremely risky. Industries such as finance, healthcare, and government are particularly sensitive to this.

2. Frequent hallucinations

Even GPT-4 can be “confidently wrong” in vertical domains. Such hallucinations may be merely a nuisance in consumer (To C) scenarios, but in enterprise scenarios they can lead directly to poor decisions, customer complaints, and even compliance risks.

3. Generic models lack business context

Every enterprise has its own terminology, organizational structure, and business processes. Without that context, a general-purpose model struggles to give accurate, practical answers and can easily answer a different question from the one that was asked.

Despite these challenges, enterprises remain enthusiastic about large model technology. They want to use its capabilities safely and at a reasonable cost, with high accuracy and reliability, and with the flexibility to adapt to their own business scenarios.

02 RAG: a realistic path in which the model “looks things up, then answers”

The core idea of RAG is simple: retrieve relevant knowledge first, then generate the answer from the retrieved content. The process is roughly as follows:

User question → convert to vector → search the knowledge base → combine question + retrieved content → send to the large model → output answer
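
The flow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the bag-of-words `embed` function stands in for a real embedding model, `KNOWLEDGE_BASE` is a hypothetical in-memory list rather than a vector store, and `llm` is any callable that takes a prompt and returns text.

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical in-memory knowledge base; a real system would use a vector store.
KNOWLEDGE_BASE = [
    "Refunds for annual plans are processed within 14 business days.",
    "Password resets require verification through the registered email.",
    "Enterprise accounts include a dedicated support manager.",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base passages most similar to the question."""
    query = embed(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(query, embed(doc)), reverse=True)
    return ranked[:k]


def answer(question: str, llm: Callable[[str], str], k: int = 2) -> str:
    """Retrieve passages, combine them with the question, and call the model."""
    context = "\n".join(retrieve(question, k))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)
```

Passing `llm=lambda prompt: prompt` echoes the assembled prompt, which is a convenient way to inspect what the model would receive before wiring in a real model.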

Compared with calling a large model directly, RAG has the following advantages:

1. Higher accuracy and lower hallucination risk

Retrieval from the knowledge base grounds answers in facts and sharply reduces the model’s room to improvise. In practice, accuracy can often be raised from about 70% to more than 90%.

2. Data stays on premises, making it more secure and controllable

The knowledge base, the corpus, and the large model can all be deployed locally, which meets the compliance requirements of high-security industries.

3. Lower training costs and more flexibility

Large-scale fine-tuning is not required. Managing the content of the knowledge base is enough to keep improving answer quality.

4. Immediate knowledge updates

Once a document is updated, the Q&A system reflects the change right away, with no need to retrain or iterate on the model.
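
One way to picture why no retraining is needed: an update only touches the index, never the model. The sketch below is an assumption-laden illustration; `KnowledgeIndex` and its methods are hypothetical names, and a real system would also re-chunk and re-embed the changed document at the marked step.

```python
import hashlib
from typing import Optional


class KnowledgeIndex:
    """Hypothetical document index keyed by document ID."""

    def __init__(self) -> None:
        # doc_id -> (content digest, text)
        self._entries: dict[str, tuple[str, str]] = {}

    def upsert(self, doc_id: str, text: str) -> bool:
        """Store or replace a document; return True if the index changed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        current = self._entries.get(doc_id)
        if current is not None and current[0] == digest:
            return False  # unchanged content, nothing to re-index
        # A real system would re-chunk and re-embed the document here.
        self._entries[doc_id] = (digest, text)
        return True

    def get(self, doc_id: str) -> Optional[str]:
        """Return the current text for a document, or None if it is absent."""
        entry = self._entries.get(doc_id)
        return entry[1] if entry else None
```

Comparing content digests keeps repeated syncs cheap: only documents whose text actually changed trigger re-indexing.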

03 Enterprise RAG in practice: two real cases

Case 1: Bank customer service automation

A national bank loaded its business documents, process manuals, FAQs, and similar material into a knowledge base and used a RAG system to answer customer questions automatically. With optimized intent recognition and prompt engineering, answer accuracy rose from 60% to 96%. At the same time, the workload on human agents fell by 40%, and customer waiting time was cut by nearly half.

Case 2: Manufacturing technical document query

Technicians at an industrial equipment company frequently need to consult hundreds of thousands of pages of engineering documents. We built a RAG-based intelligent Q&A platform and developed a segmentation strategy and a dedicated extraction algorithm for technical documentation. After launch, average search time fell by 60%, and the onboarding period for new employees shortened by about 30%.

Together, these two cases show that RAG is not only usable but also delivers real efficiency gains, making it a deployment path for large models with clear ROI.

04 Accuracy is the core metric that decides whether RAG can enter production

A key threshold for launching an enterprise RAG system is whether its accuracy can exceed 95%. Accuracy here is not a single metric; it consists of the following three parts:

1. Retrieval accuracy (recall and precision)

Finding the most relevant content in the knowledge base is a prerequisite for generating a correct answer. A system with high retrieval accuracy reliably identifies the passages most relevant to the user’s question. This is usually measured with recall and precision: recall reflects how much of the relevant information the system finds, while precision reflects how much of what it finds is actually relevant.
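
As a concrete illustration of the two definitions, the sketch below computes precision and recall for a single query. The function name and the document IDs are made up for the example; a real evaluation would average these values over a labeled set of queries.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    retrieved: IDs of the passages the system returned.
    relevant:  IDs of the passages a human judged relevant.
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 passages retrieved, 3 passages actually relevant, 2 in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d7"})
# precision = 2/4 (half of what was found is relevant)
# recall    = 2/3 (two of the three relevant passages were found)
```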

2. Generation accuracy

This measures whether the large model, given the retrieved content, can understand the question correctly, reason soundly from the evidence, and express the answer clearly.

3. Refusal ability

When a question cannot be answered, the system should say clearly “I don’t know” rather than fabricate an answer with a straight face.
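
One simple way to approximate this behavior is to refuse whenever retrieval finds nothing sufficiently similar to the question. The sketch below assumes each retrieved passage comes with a similarity score; the function name and the `min_score` value of 0.35 are illustrative assumptions, and a real threshold would need tuning on actual data.

```python
from typing import Callable

REFUSAL_MESSAGE = "I don't know: the knowledge base does not cover this question."


def answer_or_refuse(
    scored_passages: list[tuple[str, float]],
    generate: Callable[[list[str]], str],
    min_score: float = 0.35,  # assumed threshold, tune on real data
) -> str:
    """Generate an answer only when at least one passage clears the threshold."""
    supported = [passage for passage, score in scored_passages if score >= min_score]
    if not supported:
        return REFUSAL_MESSAGE
    return generate(supported)
```

A retrieval threshold alone does not catch every unanswerable question, so it is usually paired with a prompt instruction that tells the model to decline when the context is insufficient.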

In practice, we recommend setting an overall accuracy of more than 95% as the threshold for taking a RAG system live. This is necessary for a system that must run stably in production over the long term; otherwise it is likely to lose user trust and cause friction with the business.

05 Why does it work well in testing but “fall off a cliff” after launch?

A common pattern is that the system performs excellently in testing but makes frequent errors in the real business environment. We see the following reasons:

User phrasing is far more complex than the test set: real input contains typos, abbreviations, colloquialisms, and abrupt changes of context, and its variety far exceeds what the test set covers;
Questions are widely distributed and unpredictable: users often ask questions outside the anticipated scope;
Under high concurrency, performance and accuracy constrain each other: without sound architecture design and optimization, accuracy can suffer as system load increases;
Production has very low tolerance for error: even 90% accuracy means that roughly 1 in 10 answers is wrong, which can be fatal in a real business.
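
The arithmetic behind the last point is worth making explicit. The sketch below uses hypothetical figures (10,000 questions per day, 5 questions per session) and assumes errors are independent across questions, which is a simplification.

```python
def expected_errors(queries: int, accuracy: float) -> float:
    """Expected number of wrong answers among `queries` questions."""
    return queries * (1.0 - accuracy)


def session_error_probability(accuracy: float, questions_per_session: int) -> float:
    """Probability that a session contains at least one wrong answer,
    assuming each answer is wrong independently of the others."""
    return 1.0 - accuracy ** questions_per_session


# Hypothetical load of 10,000 questions per day at 90% accuracy:
daily_errors = expected_errors(10_000, 0.90)        # roughly 1,000 wrong answers per day
# A user who asks 5 questions in one session:
at_90 = session_error_probability(0.90, 5)          # about 0.41
at_95 = session_error_probability(0.95, 5)          # about 0.23
```

Under these assumptions, four in ten multi-question sessions hit at least one wrong answer at 90% accuracy, which helps explain why a per-answer figure that looks acceptable in testing feels unreliable to users.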

Building a highly robust system architecture and an end-to-end evaluation mechanism is therefore an essential complement for any enterprise deploying a RAG system.

06 The accuracy challenge for RAG systems

Four key elements that determine the effectiveness of a RAG system

1. Corpus quality and update mechanism

Corpus data is the foundation of a RAG system. Junk data, redundant content, and outdated documents directly degrade answer accuracy, so building and maintaining the corpus is key to the system’s success.

2. Accuracy of the retrieval algorithm

Vector retrieval quality, recall strategy, and paragraph segmentation strategy form the first gate of a RAG system, and the optimization of each of these steps directly affects overall accuracy.
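
To make “segmentation strategy” concrete, here is one common baseline: fixed-size word windows with overlap, so that a sentence cut at a boundary still appears whole in a neighboring chunk. The function name and the default sizes are assumptions for illustration; this is not the strategy used in the cases described earlier, which the article does not detail.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to `size` words, with `overlap` words
    shared between consecutive chunks."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Larger chunks keep more context per passage but dilute retrieval precision, while smaller chunks do the opposite, so the sizes are normally tuned against retrieval metrics on real documents.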

3. Comprehension and reasoning ability of the large model

Models differ noticeably in how well they understand specialized content, so choosing a model that suits the business and tuning its parameters appropriately is crucial to improving accuracy.

4. Prompt Engineering

Constructing effective prompts that guide the large model to understand and use the retrieved information correctly is one of the core skills in practice.
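
A minimal sketch of such a prompt follows. The wording of the instructions and the `build_prompt` name are assumptions chosen for illustration; the points it demonstrates are numbering the sources, restricting the model to them, and giving it an explicit way to decline.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt with numbered sources and a refusal instruction."""
    sources = "\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered sources below.\n"
        "Cite the source numbers you used, for example [1].\n"
        "If the sources do not contain the answer, reply exactly: I don't know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Numbered sources make it possible to check each answer against the passage it cites, which helps when evaluating generation accuracy separately from retrieval accuracy.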

Common accuracy pitfalls and misconceptions

The first is over-reliance on test set accuracy. As noted earlier, test and production environments differ fundamentally, and high accuracy on a test set alone does not guarantee performance in real-world use.

The second is neglecting refusal ability. Many teams focus too heavily on the questions the system can answer and overlook its ability to recognize what it does not know, which can seriously mislead users in production.

The third is ignoring performance metrics. While pursuing high accuracy, response speed and resource consumption must also stay within a reasonable range. Pursuing accuracy at all costs can produce a system that is slow or expensive to run.

07 Closing thoughts: RAG is a realistic solution, not a final answer

RAG is not perfect, but it is the most realistic and secure path for bringing large models into business scenarios. Its value lies not in technical showmanship but in the following:

It combines the stability of retrieval systems with the expressive power of large models;
It respects enterprises’ data security requirements while making business processes more intelligent;
It is not a single-point technical breakthrough but the optimization and coordination of the overall architecture.

If you are planning an enterprise-level AI Q&A system, we recommend approaching it as a complete RAG system and, taking your own business and corpus structure into account, planning a technical path that can be implemented, evaluated, and iterated.
