Ng pointed out that MCP is still in its early stages and its authentication story is immature, and that agent-to-agent communication is even earlier: most of the effort today still goes into getting a single agent up and running. He shared his judgment on how to build agents and combine tools, emphasized the importance of a systematic evaluation mechanism, and discussed topics such as voice interaction and AI-assisted programming.
Professor Andrew Ng has a deep understanding of agents, and his previous talks and courses have been very enlightening.
Recently, Andrew Ng and Harrison Chase, co-founder of LangChain, had another conversation about the current state of agent development.
Link: https://www.youtube.com/watch?v=4pYzYmSdSH4&t=68s
InfoQ has compiled the conversation; this piece is lightly edited around Andrew Ng’s assessment of where agents stand today: MCP is still in a “wild west” stage, getting a single agent to run end to end is already a “miracle”, and agent-to-agent (A2A) collaboration amounts to a “double miracle”.
In the past few years, AI tool companies have built a powerful, modular tool system. Components such as LangGraph and RAG are like Lego bricks that let developers flexibly assemble and quickly build systems.
But in real-world scenarios, projects often get stuck on one detail, such as context management or evaluation. Experienced builders can switch approaches within a few days, while inexperienced ones may spend months on a detour.
This is also the “brutal” side of AI development: no single tool dominates; what matters is whether you have mastered the whole toolchain and can combine it efficiently.
Tools also change fast. For example, as LLM context lengths keep increasing, many of the RAG best practices from a year and a half ago may no longer apply today. The emergence of MCP fills another obvious market gap, making it easier to integrate tools, APIs, and data sources.
However, as Ng said, MCP is still in a “wild west” stage: there are many server implementations on the internet, but “many of them can’t actually run”, and authentication and token management are not yet mature. As for agent-to-agent communication, Ng admitted that most people today (including himself) are still trying to get a single agent to work properly; getting two different agents to collaborate successfully is almost like pulling off two miracles.
We translated this conversation so you can follow Ng’s latest judgments and practical thinking on core issues such as how to build agents, the current state of MCP, and how to combine tools effectively.
01 The core of the architecture is task decomposition and process orchestration
Harrison Chase: You’ve suggested that we shouldn’t worry about whether an application is an agent, but rather how “agentic” it is. Can you explain this idea?
Andrew Ng: I made this point because I found people constantly arguing: “Is this an agent?” “This doesn’t count, right?” All kinds of definitional disputes: Is it autonomous enough? Does it meet some standard?
My feeling was that instead of spending so much time arguing about whether something is an agent, we as a community should think differently and treat “agenticness” as a spectrum: some systems are highly agentic, some only weakly so.
If you want to make an agentic system with a little bit of autonomy, or a very autonomous system, you can do that, so you don’t have to argue about whether this is an agent or not.
So I proposed, let’s call these systems “Agentic systems” and focus on how to build them. This way of thinking actually saves us a lot of argument time and allows us to enter the practical stage faster.
Harrison Chase: What do you think of this spectrum, from low autonomy to high autonomy? Where do most of the systems people are building now fall?
Andrew Ng: The actual work in many enterprises is having employees fill in forms on a web page, search for information, check a database for compliance risks, and judge whether certain products can be sold to certain customers; or copy and paste some data, run a search, and paste the result into another form. These business processes are usually very linear, and the occasional small loop or branch usually just means the process stops, for example a request being rejected because a condition is not met. So I see a lot of opportunity in these simple processes.
And I’ve noticed that companies still face real challenges when turning existing processes into agentic workflows: at what granularity should you split the process? Into what micro-steps should the task be divided? When you build a prototype and it doesn’t work well enough, which steps should you improve to lift the overall result? This ability to decompose a complex task into executable micro-steps and to design the workflow structure, the evaluation mechanism, and so on is still relatively scarce.
Of course, more complex Agentic workflows are also valuable, especially those that contain a lot of loops. But in terms of “quantity”, the current opportunities are still mainly concentrated in these simpler linear processes, and everyone is systematizing and automating them step by step.
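To make the idea of splitting a linear business process into micro-steps concrete, here is a minimal sketch assuming LangGraph’s Python StateGraph API; the state fields and step functions are invented placeholders, not anything from the talk.

```python
# Minimal sketch of a linear "agentic workflow" split into micro-steps,
# using LangGraph's StateGraph API. The step bodies are placeholders for
# LLM calls and database lookups.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class OrderState(TypedDict):
    raw_form: str        # text pasted from the web form
    extracted: dict      # structured fields pulled out of the form
    compliant: bool      # result of the compliance check

def extract_fields(state: OrderState) -> dict:
    # In practice this step would call an LLM to parse the form text.
    return {"extracted": {"customer": "...", "product": "..."}}

def check_compliance(state: OrderState) -> dict:
    # In practice this step would query a database or policy service.
    return {"compliant": True}

builder = StateGraph(OrderState)
builder.add_node("extract_fields", extract_fields)
builder.add_node("check_compliance", check_compliance)
builder.add_edge(START, "extract_fields")
builder.add_edge("extract_fields", "check_compliance")
builder.add_edge("check_compliance", END)

graph = builder.compile()
result = graph.invoke({"raw_form": "Customer X wants to buy product Y"})
print(result)
```

The point of the sketch is the decomposition itself: each micro-step is a named node you can inspect, improve, or evaluate on its own.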
Harrison Chase: You’ve taught deep learning for a long time, and you’ve also built many courses helping people understand and build agents. What skills do you think matter most for agent builders?
Andrew Ng: I think the biggest challenge is that many business processes involve the specific operations of compliance, legal, human resources, and other teams. So how do you build a pipeline that digitizes these processes? Do you use LangGraph for the integration? Can MCP help there too?
An important but often overlooked point is that building a proper eval system is not just about evaluating the end-to-end system; it also means tracing every step, so you can quickly pinpoint which step broke and which prompt isn’t working. Many teams progress slower than they should here: they keep relying on manual evaluation, and every time they change a prompt they read the outputs one by one and judge them by hand, which badly hurts efficiency.
I believe a systematic evaluation mechanism is the ideal answer. But the problem is that many teams don’t yet have the intuition for “what to do next”. Under-skilled teams often run into dead ends, such as spending months optimizing a component that will never perform well, while an experienced team will say: “Let’s drop this approach and change route.”
I hope I can distill a more efficient way to teach this “empirical judgment”, because much of the time you have to look at LangChain’s trace output, judge the current state, and make a decision within minutes or hours, and that is still very hard.
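As a rough illustration of step-level evaluation (rather than only judging the final output), here is a minimal, framework-free sketch; the step names, check functions, and stub LLM calls are all hypothetical.

```python
# Minimal sketch of per-step tracing so regressions can be localized to a
# single step rather than judged only at the end of the pipeline.
from dataclasses import dataclass, field

@dataclass
class StepTrace:
    name: str
    output: str
    passed: bool

@dataclass
class RunTrace:
    steps: list[StepTrace] = field(default_factory=list)

    def record(self, name: str, output: str, check) -> str:
        # Store the step output together with a cheap pass/fail check.
        self.steps.append(StepTrace(name, output, check(output)))
        return output

    def first_failure(self):
        # Point directly at the earliest broken step.
        return next((s.name for s in self.steps if not s.passed), None)

def run_summary_step(doc: str) -> str:
    # Placeholder for an LLM call that summarizes the document.
    return doc[:100]

def run_answer_step(summary: str) -> str:
    # Placeholder for an LLM call that drafts a reply from the summary.
    return f"Based on the summary, the refund policy is: {summary}"

doc = "Refunds are accepted within 30 days of purchase with a receipt."
trace = RunTrace()
summary = trace.record("summarize", run_summary_step(doc), lambda o: len(o) > 0)
answer = trace.record("answer", run_answer_step(summary), lambda o: "refund" in o.lower())
print("first failing step:", trace.first_failure())
```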
02 From tools to systems: AI system construction has entered the era of modularity
Harrison Chase: Do you think this “empirical judgment” is more about the limitations of the LLMs themselves, or more about building skills such as product structure and task decomposition?
Andrew Ng: I think it’s both. Over the past few years, AI tools companies have built a very powerful tooling system, including LangGraph. You can think about how to implement RAG, how to build a chatbot, how to make a memory system, build Eval, add Guardrails, and so on.
I often use an analogy: if you only have Lego bricks of one color, say only purple, you will have a hard time putting together complex structures. But now we have many more kinds of bricks: red, black, yellow, green, of all shapes and functions. The more bricks you have, the more easily you can assemble them into complex structures.
The AI tools we mentioned are really Lego bricks of different shapes. When building a system, you may need that oddly shaped bent piece, and someone with experience knows which piece to use to get the job done quickly. But if you’ve never done any kind of eval, you may waste an extra three months on detours. An experienced person will say directly, “Let’s use an LLM as a judge, change the evaluation method to this, and it can be done in three days.”
This is also one of the more “brutal” aspects of AI: no single tool solves everything. When writing code, I use a bunch of different tools myself. I can’t say I’m proficient in every one, but I’m familiar enough to put things together quickly. And tools change fast. For example, as LLM context lengths keep increasing, many of the RAG best practices from a year and a half ago may no longer apply today.
I remember Harrison started exploring this very early, with the early LangChain RAG framework, recursive summarization, and so on. Now, thanks to expanded context windows, we can put much more information directly into the context. RAG hasn’t disappeared, but tuning its parameters is much less fiddly; there is now a wide range of parameter settings that all work fine.
So, as LLMs continue to evolve, some of our intuition from two years ago may no longer apply.
Harrison Chase: What are some LEGO components that are underrated right now that you would recommend paying attention to?
Andrew Ng: Although everyone talks about evaluation now, many people don’t actually do it. I don’t quite understand why not; probably people generally assume that writing an evaluation system is a huge, rigorous undertaking.
What often happens is that I build a system and a particular problem keeps popping up. I think it’s fixed, then it breaks again, gets fixed, and breaks again. At that point I write a very simple evaluation, maybe with only five input examples, using a very basic LLM as a judge, and test only for that specific regression: did this spot break again?
I don’t completely replace manual evaluation with automated evaluation; I still look at the outputs myself. But this simple eval takes some of the burden off and runs automatically, so I don’t have to check by hand every time.
What happens then? Just like when we write papers, we start with a very rudimentary evaluation system that is obviously flawed. But when you have the first version, you think, “Actually, I can improve it,” and then you start iterating on it.
A lot of times I start with some terrible, barely helpful assessment. Then as you look at its output, you see “this evaluation system is broken, but I can fix it” and slowly make it better.
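In that spirit, here is a sketch of the kind of tiny regression eval described above: a handful of fixed inputs and a basic LLM-as-judge check on the one behavior that keeps breaking. It assumes the OpenAI Python client as the judge model, and the system under test is a stub you would replace with your own workflow.

```python
# A deliberately tiny regression eval: five fixed inputs, an LLM-as-judge
# yes/no check, testing only the one thing that keeps breaking.
from openai import OpenAI

client = OpenAI()

REGRESSION_CASES = [
    "I want a refund for order #1001",
    "My package arrived damaged",
    "Cancel my subscription",
    "The invoice amount looks wrong",
    "I was charged twice",
]

def run_system(user_message: str) -> str:
    # Replace this stub with a call into the system under test.
    return f"Thanks for reaching out about: {user_message}. We'll look into it."

def judge(user_message: str, reply: str) -> bool:
    # Very basic LLM-as-judge: does the reply acknowledge the user's issue?
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"User said: {user_message}\nAssistant replied: {reply}\n"
                       "Does the reply acknowledge the user's specific issue? Answer YES or NO.",
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

failures = [m for m in REGRESSION_CASES if not judge(m, run_system(m))]
print(f"{len(failures)} / {len(REGRESSION_CASES)} regression cases failed: {failures}")
```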
Another point I’d like to mention, although it’s already widely discussed, is the voice stack, which I think is still far underrated. This is an area of great interest to me, and many of my friends are also very bullish on voice applications. We also see many large enterprises that are extremely interested in voice technology, very large enterprises with very large use cases.
Although some developers in this community are working on voice, developer attention is still far lower than the interest from these companies. And we’re not just talking about real-time speech APIs, nor only native audio models like speech-to-text, because those models are often hard to control. I prefer a voice tech stack built on agentic workflows, which is easier to control. I’ve been working with a lot of teams on voice-stack projects lately, and some of them will hopefully be announced soon.
Another thing that may not be “underrated”, but I think more companies should do it is to let developers use AI-assisted programming.
Many of you have probably seen that developers who use AI assistance are much more productive than those who don’t. But I still see many companies, especially their CIOs and CTOs, with policies that forbid engineers from using AI programming tools. I know there are sometimes good reasons, but I feel we need to break through this limit as soon as possible. Frankly, my team and I can’t imagine writing code without AI help anymore. But many businesses still need to accept and adapt to this.
Another underrated view is that I feel that “everyone should learn a little bit of programming”.
An interesting fact about our AI Fund: everyone in our company writes code, including the front-desk receptionist, the CFO, and the general counsel. Everyone writes. It’s not that I want them to become software engineers, but in their own roles, learning a little code lets them tell computers much more clearly what they want done. This has produced significant productivity gains in various non-engineering positions, which I find quite exciting.
03 The key to voice interaction is the requirement of “delay”
Harrison Chase: If someone wants to get into voice now and they’re already familiar with building agents with LLMs, how transferable do you think their knowledge is? What carries over? What new things do they need to learn?
Andrew Ng: I think there are many scenes where voice is actually very critical, and it brings some new ways of interaction.
From an application perspective, text input is actually a somewhat daunting way to interact. You go to the user and say, “Tell me what you think, here is an input box, write some text”, and many people feel a lot of pressure. With text, users can also backspace and revise, and they respond more slowly.
Voice is different: time only moves forward. You can just say it, or change your mind midway, such as “I changed my mind, forget what I said before”, and the model actually handles these things well. So voice often lowers the threshold for users. When we say “tell us what you think”, the user will naturally start talking.
The biggest difference between a voice system and a text system is the latency requirement. When a user speaks, the system ideally needs to respond within one second (ideally within 500 milliseconds, and at most one second), but traditional agentic workflows can take several seconds or even longer. For example, we are working on an avatar project where you can talk to a digital clone of me on a web page. Our original version had a latency of 5 to 9 seconds: you finished speaking, there were nine seconds of silence, and then the clone answered, which was a very bad experience.
Later, we added some “pre-response” design. For example, if you ask me a question, I might start by saying, “Well, that’s an interesting question” or “Let me think about it.” We had the model produce a response like this to cover the latency, and it worked very well.
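A minimal sketch of this “pre-response” trick might look like the following; the speak() and run_agentic_workflow() functions and the delays are placeholders standing in for a real text-to-speech call and a real multi-second pipeline.

```python
# Speak a short filler line immediately while the slower agentic workflow
# computes the real answer, so the user never hears dead silence.
import asyncio
import random

FILLERS = ["Hmm, that's an interesting question.", "Let me think about that for a second."]

async def speak(text: str) -> None:
    print(f"[voice] {text}")          # stand-in for a text-to-speech call

async def run_agentic_workflow(question: str) -> str:
    await asyncio.sleep(4)            # stand-in for a multi-second pipeline
    return f"Here's my actual answer to: {question}"

async def answer(question: str) -> None:
    # Kick off the slow pipeline first, then fill the silence right away.
    pending = asyncio.create_task(run_agentic_workflow(question))
    await speak(random.choice(FILLERS))
    await speak(await pending)

asyncio.run(answer("How should I think about agentic workflows?"))
```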
There are other tricks too. For example, if your voice agent plays a bit of background sound while waiting instead of going completely silent, users are more tolerant of the system’s sluggishness.
And in many applications, voice makes it easier for users to get into a flow and lowers the threshold of self-censorship. When people speak, they don’t chase perfection the way they do when writing. They can talk casually, repeat themselves, and express themselves freely, which makes it easier for us to get useful information from them and help them move their task forward.
04 If MCP is still early, agent-to-agent communication is even earlier
Harrison Chase: What changes do you think MCP has brought to how applications are built and what kinds of applications get built? What do you think its impact on the whole ecosystem will be?
Andrew Ng: I find MCP very exciting.
I personally like MCP very much; it fills a clear market gap, and OpenAI’s quick follow-up shows how important the standard is. I think the MCP standard will keep evolving. It mainly makes it easier for agents to access all kinds of data, but it’s not only agents; a lot of other software can benefit too.
When we use LLMs, especially when building applications, we often spend a lot of time building plumbing, that is, all kinds of data-access work. Especially in large enterprise environments, the AI models are actually very smart: as long as you give them the right context, they can do reasonable things.
But we often spend a lot of time on that access work, figuring out how to feed data to the model so it can output what you want. MCP is a big standardizer here, making it easier to integrate tools, APIs, and data sources.
Of course, MCP is still a bit of a “wild west” right now. You can find many MCP servers online, but many of them don’t work. Authentication is also confusing, and even for some large companies, MCP services have issues such as tokens that turn out to be invalid or expired.
Also, I feel the MCP protocol itself is still in its early days. MCP currently returns a long, flat list of resources, and in the future we may need some kind of hierarchical discovery mechanism. For example, if you want to build a system, and I don’t know whether there will be an MCP interface for LangGraph in the future, but a system like LangGraph has hundreds or thousands of API calls, and you can’t cram them all into a flat list and let the agent filter through them by itself.
So we may need a hierarchical resource discovery mechanism. I think MCP is a great first step. I highly encourage you to learn about it, it may really make your development easier, especially if you can find a stable and easy-to-use MCP server implementation to help you with data integration.
I also think this matters in the long run: if you have n models or agents that need to connect to m data sources, you shouldn’t have to write access logic separately for every combination. The work should be n + m, not n × m. I think MCP is a great first step in that direction. It still needs to keep evolving, but it’s a good place to start.
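A small sketch of the n + m idea: each data source is wrapped once as an MCP server, so any MCP-capable model or agent can reach it without per-pair glue code. This assumes the FastMCP helper from the official Python MCP SDK; the tool body is a placeholder, and the exact API may differ across SDK versions.

```python
# One MCP server per data source (m servers total); any of the n agents
# that speak MCP can connect to it, instead of writing n x m adapters.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-db")

@mcp.tool()
def lookup_customer(customer_id: str) -> str:
    """Return basic information about a customer by id."""
    # Placeholder for a real database query.
    return f"Customer {customer_id}: status=active, region=EU"

if __name__ == "__main__":
    mcp.run()   # each of the m data sources runs one server like this
```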
Harrison Chase: There is another protocol that’s not as hot as MCP but also worth paying attention to: agent-to-agent communication. What do you think of how agent-to-agent communication is evolving?
Andrew Ng: Agent-to-agent communication is still very early today. Most of us, myself included, still struggle to get our own code to work. So getting my agent and someone else’s agent to work together is like two miracles happening at once.
What I’ve seen so far is that multi-agent systems built by a single team can be made to work: because everyone is on the same team, they know the protocols and interfaces and how the pieces should cooperate, so it runs. But it’s still too early for an agent built by one team to work with an agent from a completely different team. I believe we’ll get there eventually, but from what I’ve observed so far, I haven’t seen many cases running truly successfully at scale. I wonder if you have similar observations?
Harrison Chase: That’s right, I agree with you. If MCP is still early, inter-agent communication is even earlier.
05 Programming with “vibe coding” exhausts me
Harrison Chase: What do you think of vibe coding? Is it a new skill compared to traditional programming? What role does it play in today’s world?
Andrew Ng: I think many of us now barely look at the code when programming, which is actually great progress. However, I think the name “vibe coding” is quite unfortunate, because it misleads many people into thinking the work is done purely by feel, for example accepting this suggestion and rejecting that one on intuition alone.
But honestly, when I spend a day coding this “vibe coding” way, that is, with the help of an AI coding assistant, I usually end up exhausted. It’s actually an activity that demands a lot of intellectual effort. So although the name is bad, the phenomenon is real, it’s growing, and it’s a good thing.
Over the past year, some people have been advising others not to learn to code, on the grounds that AI will write the code for you. Looking back in the future, I think this will be some of the worst career advice in history. If you look at the history of programming over the past few decades, every time the barrier to programming has been lowered, more people have started to code: moving from punch cards to keyboards and terminals, or from assembly to COBOL. I even found some very old articles where someone claimed, “With COBOL, we won’t need programmers anymore.”
But the truth is that every time programming becomes easier, more people learn to code.
So I think AI coding assistants will also push more people to learn to code. And, one of the most important skills for the future, for developers and non-developers alike, is to “tell the computer clearly and accurately what you want to do and let it do it for you.”
To do this, it’s helpful to know some of the basic working principles of computers. I know many of you here have understood this. But that’s why I always recommend learning at least one programming language, like Python.
Some of you may know that my Python is much stronger than my JavaScript. But since using AI programming assistants, I’ve written more JavaScript and TypeScript code than ever before. Even when debugging JavaScript that the AI generated for me rather than code I wrote myself, understanding what the errors are and what they mean is still essential for fixing them.