At the LangChain Interrupt Summit, Andrew Ng laid out his methodology for building AI agents and the evaluation mechanisms behind them, emphasizing that the core lies in task decomposition and evaluation. This article offers an in-depth reading of his views, exploring the key elements and future prospects of AI agent construction.
At the latest LangChain Interrupt Summit, AI Fund founder Andrew Ng engaged in a conversation with LangChain co-founder Harrison Chase.
As a leading promoter of AI education and startup incubation, Ng systematically laid out his agent-construction methodology, evaluation mechanisms, voice and protocol infrastructure, and the intuitions developers and entrepreneurs should cultivate.
He proposed that "agenticness" should be understood as a degree on a continuum, not a binary label, and that the core competitiveness of future AI builders lies not in prompt design but in process modeling and execution speed.
"Agenticness" is a degree, not a label
Ng recalled sharing the stage with Harrison more than a year ago, when they were trying to convince the industry that agents were a field worth paying attention to. "At the time, people weren't sure whether agents were an important thing," he said. Since then, as the concept has gained popularity, the term "agenticness" has been picked up by marketers and grown ambiguous. "The term was misused; people began applying it to all kinds of systems, and its actual meaning was no longer clear."
He pointed out that many people were debating whether a given system was "really" an agent and whether it was truly autonomous, but such debates carried little value. Rather than wasting time on semantic questions, he proposed thinking of agenticness as a spectrum: different systems exhibit different degrees of autonomy, from almost none to very high, and all are legitimate. As long as a system has some degree of autonomy, it can be classified as an agentic system.
"If you want to build an agentic system with a little bit or a lot of autonomy, that's reasonable. There's no need to struggle with whether it's really an agent," Andrew Ng said.
This inclusive definition helps the community free itself from semantic entanglement and move toward practical implementation more efficiently. He said the approach has indeed played a positive role, letting more developers step away from the question of "is it an agent?" and focus on whether the system solves the actual problem.
Experience in agent process modeling is in short supply
When asked where on the "agenticness spectrum" most builders sit, Andrew Ng said his team uses LangGraph to handle more complex problems, such as multi-step process automation. "But I also see that many real-world business processes are actually linear, or linear with a few failure branches," he said.
He gave an example: in some businesses, humans still repeat a series of predictable operations, such as filling out forms, searching for information on the web, and querying databases to check whether compliance is involved and whether an item can be sold. The process is essentially a cycle of copy, paste, search, and paste again, with a relatively fixed structure.
These processes are inherently ideal for agentization, but the biggest challenge is that many companies don't yet know how to translate them into agentic systems. "For example, at what granularity should tasks be split? If the prototype doesn't work well, where should improvement be prioritized? This kind of knowledge is actually very scarce in the industry."
Although more complex agentic workflows exist, such as multi-loop and multi-agent systems, Andrew Ng pointed out that the main problem builders face at this stage is still modeling and decomposing simple processes. "What we lack most now is actually the 'intermediate skill' of automating these structured processes."
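The linear-with-failure-branches shape Ng describes can be sketched in a few lines. This is a minimal illustration, not anything from the talk: the step functions and the compliance rule are hypothetical placeholders standing in for form extraction and a database lookup.

```python
# A minimal sketch of a "linear process with a failure branch":
# extract fields -> compliance check -> decide. All logic is hypothetical.

def extract_fields(form: dict) -> dict:
    """Step 1: pull only the fields the downstream checks need."""
    return {"item": form["item"], "region": form["region"]}

def check_compliance(fields: dict) -> bool:
    """Step 2: stand-in for a database/compliance lookup."""
    banned = {("fireworks", "EU")}  # hypothetical rule for illustration
    return (fields["item"], fields["region"]) not in banned

def run_pipeline(form: dict) -> str:
    """Linear flow with one failure branch."""
    fields = extract_fields(form)
    if not check_compliance(fields):
        return "rejected: compliance"  # the failure branch
    return "approved"

print(run_pipeline({"item": "fireworks", "region": "EU"}))  # rejected: compliance
print(run_pipeline({"item": "books", "region": "EU"}))      # approved
```

The point of the sketch is the granularity question Ng raises: each function is one candidate unit of decomposition, and an agentic version would swap individual steps for model calls while keeping the fixed structure.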
Agent systems need to be intuitive, fast, and practical
On the key skills needed to build an agent, Ng said the ability to build system pipelines comes first. Real business processes often involve multiple roles, such as compliance, legal, and human resources; each performs a specific task, and the agent system needs to model the logic of these roles to carry the process end to end.
So what should developers use? LangGraph? An MCP host? Should different subtasks be integrated modularly? It all depends on the task. And when the system errs, many teams don't know where the problem lies or which part to optimize next.
"I find that many teams actually spend too much time on manual assessment. After each system adjustment, they manually check whether the output is correct," Andrew Ng said. He believes the lack of an evaluation mechanism is the biggest "invisible problem" in agent construction today.
He advocates quickly building a first-pass evaluation system, "even a bad one": for example, a detection script covering just five input examples for a single failure-prone step, with a simple model judging whether the system has regressed. "It doesn't need to completely replace the human eye; it just takes on those repetitive judgment tasks."
Ideally, he believes, developers can make decisions within minutes to hours using tools like LangSmith. This kind of "tactile intuition," grounded in real data and real failure paths, is the most valuable experience in system building. "Without that tactile sense, you might spend months optimizing a component, while someone with experience can tell at a glance that the direction won't work."
Tools are building blocks, and cognitive coverage determines efficiency
Ng emphasized that the AI community now has a large number of powerful tools, but developers' awareness of them varies enormously. He likened the situation to "colorful Lego bricks": in the past, with only one type of brick, say purple ones, you could build very limited things. Now we have red, blue, and green bricks of all shapes and sizes, and you can build almost any structure.
These Lego bricks, such as LangGraph, retrievers, RAG, memory, email generators, and guardrail mechanisms, make up the technical library for building agentic systems. Developers who truly master them can quickly restructure a failing system, rather than getting stuck in protracted debugging.
"I also use a lot of tools when I write code. I don't need to be an expert on every tool, but I know what they can do and what problems they solve," Andrew Ng said.
He added that best practices for RAG (Retrieval-Augmented Generation) have also shifted over the past year or two. The larger context windows of big models mean that much of the hyperparameter tuning once required is now less urgent. Many old intuitions no longer apply, and developers must keep updating their "tool knowledge graph" or fall behind.
The voice stack and the MCP protocol are underrated
When discussing which key areas are still being overlooked, Andrew Ng was blunt: the voice technology stack and the MCP protocol are the directions most worth watching. He believes the value of voice applications remains far from fully tapped.
"Writing prompts is actually a high barrier for users. Long text requires organized language and repeated revision, which makes people reluctant to engage. Voice, by contrast, moves forward in time: users can keep talking, and even if they misspeak they can correct themselves, so the interaction feels more natural."
He said that in the virtual twin built in collaboration with Reald Avatar, the system's response time was initially 5 to 9 seconds, and the user experience was very poor. They then added a "pre-response mechanism": the large model first utters buffer phrases such as "let me think" or "that's an interesting question" to fill the gap, greatly improving the experience.
They also found that adding call-center background sound to the voice system eased the sense of waiting. Simple as it is, this trick reflects an important engineering methodology for building speech systems. "The running logic of a voice agent is completely different from a text agent."
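The pre-response trick can be sketched with plain `asyncio`: kick off the slow model call, speak a filler phrase immediately, then deliver the answer when it arrives. This is an illustrative sketch only; `slow_model`, the filler list, and the timing are hypothetical stand-ins for a real TTS/LLM pipeline.

```python
import asyncio
import random

# Sketch of the "pre-response" mechanism: emit a filler phrase right
# away, then the real answer once the slow model returns.

FILLERS = ["Let me think...", "That's an interesting question..."]

async def slow_model(question: str) -> str:
    """Stand-in for a multi-second LLM call (shortened for the demo)."""
    await asyncio.sleep(0.2)
    return f"Here is my answer to: {question}"

async def respond(question: str) -> list[str]:
    """Return utterances in the order the user would hear them."""
    answer_task = asyncio.create_task(slow_model(question))  # start early
    spoken = [random.choice(FILLERS)]  # heard immediately, masks latency
    spoken.append(await answer_task)   # the real answer when ready
    return spoken

print(asyncio.run(respond("Why is the sky blue?")))
```

The design point is that the filler and the model call run concurrently, so the perceived latency is the filler's duration rather than the model's.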
He also emphasized the value of the MCP protocol in future multi-model systems. Enterprises building agents today often need to connect multiple data sources, APIs, and service interfaces, and if every pairing requires a handwritten adapter, maintenance costs become prohibitive.
"MCP is a real attempt at interface standardization," Ng said. Current MCP server implementations are still unstable, he noted, with imperfect authentication mechanisms and inconsistent token management, but the overall direction is correct. In the future, MCP should develop a hierarchical resource-discovery mechanism that lets agents discover call paths in a structured way, rather than listing a flat pile of APIs.
He concluded that we are moving toward a world of "n agents connecting to m data sources," and MCP turns the maintenance cost from n×m pairwise adapters into n+m interface implementations, a qualitative drop in integration cost.
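The arithmetic behind that claim is simple enough to state in code. The counts below are illustrative numbers I chose, not figures from the talk: without a shared protocol each agent needs its own adapter per data source, while a standard interface requires each side to implement it only once.

```python
# Integration-count arithmetic behind the n*m -> n+m argument.

def adapters_without_standard(n_agents: int, m_sources: int) -> int:
    """Each agent/data-source pairing needs a bespoke adapter."""
    return n_agents * m_sources

def adapters_with_standard(n_agents: int, m_sources: int) -> int:
    """Each agent and each source implements the shared protocol once."""
    return n_agents + m_sources

n, m = 20, 50  # illustrative numbers
print(adapters_without_standard(n, m))  # 1000 pairwise adapters
print(adapters_with_standard(n, m))     # 70 protocol implementations
```

Even at modest scale the gap is an order of magnitude, and it widens as both n and m grow.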
The winning hand depends on technical understanding and execution speed
At the end of the conversation, Andrew Ng talked about the work of the AI Fund. He said that the AI Fund does not make external investments, but co-founds companies. They value two things most when selecting partners:
The first is technical understanding. Many people now talk about the market, positioning, and go-to-market strategy; those matter, he said, but can be made up in a short time. Understanding of technology and intuition for system building are scarce abilities accumulated over long periods.
The second is speed of execution. Andrew Ng said he has seen teams accomplish in two weeks what would take others three months, and that kind of speed is almost a watershed between success and failure. "Many teams have never seen how fast a good team can be," he said.
He concluded by saying that the most important future skill, for programmers and non-programmers alike, is "being able to express exactly what you want the computer to do." "Even if you're a CFO, a legal advisor, or front-office staff, being able to write a little Python, even at a basic level, can greatly improve your ability to work with AI," he said.