2025 has been called the "first year of the AI Agent": products such as Manus and Fellou have launched, setting off a new wave of excitement. But when these agents are put to work on real tasks, they frequently fall flat: they fail to understand complex context, collaborate chaotically, and make logical leaps…
What is missing for the AI Agent to move from "concept" to "capability"? This article argues that a truly usable agent system needs three underlying capabilities: a stable and consistent context-understanding mechanism (MCP), a standardized agent-collaboration mechanism (A2A), and strong reasoning-model capability.
These correspond, respectively, to memory, collaboration, and decision-making, and together they form the technical foundation for future multi-agent collaboration. Current products are still in early exploration, but the trend is already visible: AI products are being reshaped.
That 2025 is the "first year of the AI Agent" has almost become an industry consensus.
In the first half of the year alone, companies have released AI Agent products in one form after another: web products from Manus to Coze Space, AI browsers from Dia to Fellou, and so on. Each claims it can take on tasks end to end, and each, in concept, comes close to "what the future should look like": understanding instructions, perceiving the environment, deciding autonomously, planning and executing, and ultimately achieving the goal.
Over the past few months, I have tried to use these agents for practical tasks, such as generating a high-quality study report on prompt engineering. A task like this looks simple, but it involves a great deal of content collection and summarization, which places high demands on the agent's stability, context understanding, and reasoning ability.
However, the reality is far from ideal. For example:
- Coze Space (Beta) and Qwen 3 (analysis and research mode) do have mechanisms for splitting and integrating sub-tasks, but either the execution chain breaks or they fall into the illusion of "fake task completion", and the output quality is uneven.
- Fellou (v1.3.3 (400)) frequently interrupts tasks, gets stuck mid-execution, and loses context after multiple rounds of conversation; it has yet to complete a single task for me.
Although I have not personally used Manus or the AI browser Dia, I have a preliminary sense of their strengths and weaknesses from reviews by Phineas and Super Huang (see the links at the end of this article).
So why do AI Agent products get "stuck", fail tasks, or produce output of such uneven quality?
On the surface, these problems are just failed tool calls or weak planning and summarization. Dig deeper, though, and they reflect defects in the underlying design mechanisms; they are not single-dimensional problems but the result of multiple factors.
Working backwards from the technical implementation made me realize a fact: at this stage, even with solid engineering and excellent models behind a product, it is still hard to complete complex tasks effectively.
At present, building an efficient and usable agent requires three prerequisites:
- A stable and consistent context-understanding mechanism (MCP, Model Context Protocol)
- A standardized agent-collaboration mechanism (A2A, Agent-to-Agent Protocol)
- More powerful model capability, closer to human reasoning logic (a Reasoning Model such as Claude 3.7 Sonnet or GPT-o3)
If today's Agent ecosystem is Zhang Wuji suffering from the cold poison of the Xuanming Divine Palm, then MCP is the Nine Yang Divine Skill (the internal-energy foundation), A2A is the Heaven and Earth Great Shift (the external moves), and the Reasoning Model is the tenacious will and strong body.
In this article, based on my observations and experience over this period, I want to explore two questions:
- Why are MCP, A2A, and the Reasoning Model an inseparable triangle in the evolution of Agent products?
- Why can agents evolve from "products you occasionally play with" into "productivity tools that solve complex tasks" only when these three converge?
1. Why MCP, A2A, and the Reasoning Model are prerequisites
MCP unifies and structures contextual information and shares tools and resources; A2A establishes the communication protocol and behavioral conventions for multi-agent collaboration; the Reasoning Model provides task planning and decision reasoning. The three complement one another, and none can be omitted.
1.1 MCP: Model Context Protocol
Wikipedia: MCP (Model Context Protocol) is a connection standard (communication protocol) first proposed publicly in an article published by Anthropic on November 25, 2024. The protocol aims to unify the connection between front-end AI assistants and back-end data systems, helping models do higher-quality work: it provides large language models (LLMs) with a standardized interface for interacting with external data, overcoming the limitation of models relying solely on their training data.
Most of the agent products we see today are still at the stage of "single-point intelligence": each agent is a closed black box that understands "context" and "reasons" on its own, with no unified view and no mutual understanding or collaboration.
It is much like how the mobile Internet built one data silo after another, scattering users' data across platforms. With personal data, the user at least remains the controller of the information and keeps a global view; these agents have no such view.
Multiple agents online at the same time ≠ multiple agents collaborating. They may share the user's input, but they do not know each other's state, goals, or context. Without unified semantics and a behavioral coordination mechanism, each ends up doing its own thing.
This is one of the problems of today's Agent ecosystem: contextual information fragmentation.
MCP (Model Context Protocol) is a standardized, structured context-sharing mechanism. It lets all models and agents understand goals, formulate actions, and evaluate feedback against the same world state when executing tasks or calling tools.
Specifically, MCP addresses problems at three layers (a sketch in code follows the list):
- Input alignment: all models receive information that is consistent and traceable, so they no longer have to blindly guess at each other.
- State synchronization: during task execution, state changes are recorded and broadcast in a standard format, avoiding information drift.
- Reasoning consistency: intermediate inference results from different models can be read and reused by one another, forming a coherent chain of reasoning.
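To make these three layers concrete, here is a minimal Python sketch of the idea. This is my own illustration, not the MCP specification or wire format; every name in it is hypothetical:

```python
# Illustrative only: a single structured context that every agent
# appends to and reads from. NOT the actual MCP protocol.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SharedContext:
    goal: str  # the task, stated once for everyone
    events: list[dict[str, Any]] = field(default_factory=list)

    def record(self, agent: str, kind: str, payload: Any) -> None:
        """State synchronization: every change is appended in one standard shape."""
        self.events.append({"agent": agent, "kind": kind, "payload": payload})

    def view(self) -> list[dict[str, Any]]:
        """Input alignment: every agent reads the same traceable log, so
        intermediate results can be reused (reasoning consistency)."""
        return list(self.events)


ctx = SharedContext(goal="Produce a prompt-engineering study guide")
ctx.record("researcher", "sources_collected", {"count": 12})
ctx.record("writer", "draft_started", {"based_on": "sources_collected"})
print(ctx.view())  # any agent, including a newly joined one, sees identical state
```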
An example taken from a screenshot on Mr. Dou's official WeChat account (see the image below) may make this more intuitive.
1. The user types: "I want to drive from Guangzhou to Wuhan to see the cherry blossoms. Please plan the route and save the itinerary to my notes."
2. The Claude client receives the instruction and:
1) Calls MCP Server: Amap Maps
- maps_direction_driving, to plan the driving route
- maps_weather, to check the weather in Wuhan
- maps_text_search, to find cherry-blossom viewing spots in Wuhan
- maps_search_detail, to get details on the Wuhan University Cherry Blossom Garden
2) Calls MCP Server: mcp-server-flomo (flomo notes)
- write_note, to save the itinerary to the user's notes
3. Claude 3.7 Sonnet integrates the information and returns an itinerary covering the driving route, Wuhan's weather, and recommended viewing spots, and tells the user the full plan can be viewed in flomo.
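Replayed as code, the sequence looks roughly like this. It is a hedged illustration rather than real MCP SDK code: `call_tool` is a made-up helper and the argument shapes are guesses; only the server and tool names come from the example above.

```python
from typing import Any


def call_tool(server: str, tool: str, args: dict[str, Any]) -> dict[str, Any]:
    """Stub: a real MCP client would route this request to the named server."""
    print(f"[{server}] {tool}({args})")
    return {"ok": True}


route = call_tool("amap-maps", "maps_direction_driving",
                  {"origin": "Guangzhou", "destination": "Wuhan"})
weather = call_tool("amap-maps", "maps_weather", {"city": "Wuhan"})
spots = call_tool("amap-maps", "maps_text_search",
                  {"keywords": "cherry blossoms", "city": "Wuhan"})
detail = call_tool("amap-maps", "maps_search_detail",
                   {"keywords": "Wuhan University Cherry Blossom Garden"})

# Step 3 above: the model itself integrates the results into an itinerary,
# then persists it through a second MCP server.
call_tool("mcp-server-flomo", "write_note",
          {"content": "Guangzhou -> Wuhan cherry-blossom itinerary ..."})
```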
You can also think of MCP as an "extended version of multi-turn conversation": context management grows from conversations within a single model into context sharing and information synchronization across collaborating models.
Before: a single model remembers "what the user said in recent turns" -> multi-turn conversation;
Now: multiple models know not only "what the user said in recent turns" but also "where the task currently stands and what the other models have done" -> MCP.
In other words, a model used to track only its own conversation with you; now it must also synchronize with other models on "how far the task has progressed" and "what tools and resources are available to call".
1.2 A2A: Agent-to-Agent Protocol
Wikipedia: The A2A protocol is an open standard initiated by Google. It is built on widely adopted technical standards such as HTTP, SSE, and JSON-RPC, making it easy to integrate with an organization's existing IT stack. Google officially announced the protocol at Cloud Next 2025 in April 2025, with support from more than 50 industry leaders.
When I first built a workflow in Coze, I was genuinely a little excited. In my imagination, AI would finally go from a personal assistant to a team: in theory, each member performs its own role and they advance in parallel; you state one task, and the rest happens automatically.
The reality was different. When multiple agents collaborate, it is much like standalone machines with no network: whatever you tell A, you must also copy to B; B does not know what A is doing; and by the time the task reaches C, it knows nothing of what you, A, and B have discussed…
More to the point, multi-agent collaboration is not simply several AIs that can talk to and call one another. The real question is not "whether they can speak" but whether what they say is standardized and structured, and whether it can drive action and coordination afterwards. That is exactly what A2A (Agent-to-Agent Protocol) sets out to solve.
So, what is A2A?
If MCP is the protocol that lets a model or agent keep understanding its context, then A2A is the mechanism that gives multiple agents a "common language". It solves not only "what to say and how to say it" but, more importantly:
- Which agent should act, and when?
- What did the previous agent do, and is its output compliant?
- When an agent hits a problem, can it request help from other agents, or roll back?
A2A standardizes all of this through concepts such as the Agent Card, Server, Client, and Task.
1. A2A Agent Card (a sketch follows this list)
- A JSON document that describes what an agent is, what it can do, and how to interact with it.
- Lets others discover the agent best suited for a job through its card.
2. A2A Server
- A bot running on the network, responsible for listening for incoming requests and carrying out the work.
- Sends results back and asks follow-up questions.
3. A2A Client
- Any program, or another agent.
- Packages your need into a Task, sends it out, and receives the answer.
- The bridge between user, system, and agent; no separate code needs to be written for each agent.
4. A2A Task
- A Task is a single unit of to-do work.
- It carries its full status: submitted, in progress, completed.
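To make the Agent Card tangible, here is a sketch written as a Python dict. The field names follow my reading of Google's published A2A spec and may not match it exactly; treat this as an approximation, not the normative format.

```python
# A sketch of an A2A Agent Card, written as a Python dict for readability.
# Field names are approximate; check the official A2A spec before relying on them.
agent_card = {
    "name": "report-writer",
    "description": "Drafts structured study reports from collected sources.",
    "url": "https://example.com/a2a",     # where this agent's A2A Server listens
    "version": "1.0.0",
    "capabilities": {"streaming": True},  # can stream partial results back
    "skills": [
        {
            "id": "write-report",
            "name": "Write study report",
            "description": "Turns source notes into a structured HTML report.",
            "tags": ["writing", "summarization"],
        }
    ],
}
# An A2A Client discovers this card (conventionally at /.well-known/agent.json),
# decides the agent fits the job, then submits a Task and tracks its status:
# submitted -> in progress -> completed.
```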
The scenario of a basketball team playing a game illustrates the different roles of MCP and A2A quite intuitively.
MCP is the playbook, the tactics board, plus each player's memory system: it is like a continuously updated "playbook" and "game context" in the head of every player on the team.
- Players know the current score (task status)
- Who just passed and who just shot (dialogue history)
- They remember the tactics the coach assigned (system instructions + mid-term goals)
- They know whether it's a fast break, a defensive counter, or a half-court set play (mode switching)
In other words, MCP guarantees that "every player knows what's going on and what to do (shoot or pass)", instead of coming out of a play with amnesia, unaware of what just happened.
A2A is the mechanism of passing, defense, and tactical cooperation between players: a set of "collaboration agreements" on how to cooperate, how to pass, and how to execute tactical moves.
- When player A initiates the pick-and-roll, player B knows it's time to cut (task triggering)
- When there's a turnover, everyone knows how to recover (error recovery)
- When tactics switch, there are clear call signs (task-switching mechanism)
- Each role has clear responsibilities: the point guard organizes, the shooting guard runs off the ball, the big man screens inside (division of roles)
In other words, A2A ensures "smooth, information-sharing, organized cooperation between players", not individual hero ball.
What happens with MCP but no A2A? Every player knows the score and the tactics, but nobody passes or communicates, and nobody rotates to cover when the situation changes; in the end it is still a scattered effort.
What happens with A2A but no MCP? The players are happy to pass to each other, but nobody knows which quarter it is, what the score is, whether they are behind, whether to shoot threes or control the tempo; with the context lost, the cooperation falls apart.
So the ideal state is players who are clear on the full game context (MCP) and cooperate with tacit efficiency (A2A); only then can they play a smooth, coherent, strategic game. That is what the future Agent ecosystem should look like: not a pile of smart brains stitched together, but a system that truly works as a team.
1.3 Reasoning Model: Taking Claude 3.7 Sonnet and GPT-o3 as examples
Wikipedia: Reasoning language models are artificial intelligence systems that combine natural language processing with structured reasoning capabilities. These models are usually constructed by prompting, supervised fine-tuning (SFT), and reinforcement learning (RL) initialized with pretrained language models.
Reasoning models trace back to OpenAI's "12 Days of Christmas" releases in late December 2024, from GPT-o1 on the first day to GPT-o3-Preview on the last, followed this year by o3-mini, o3, and o4-mini, and later by the now-famous DeepSeek-R1.
The essence of a reasoning model is that the model constructs its own chain of thought (CoT) and can choose whether to expose the reasoning steps: the reasoning can remain implicit internal computation or be generated as explicit steps, as in GPT-o3 and DeepSeek-R1.
Before this, the field kept stumbling through trial and error, trying to graft CoT onto models by hand so that AI would "think like humans". Hyung Won Chung, an OpenAI research scientist and core contributor to o1, put it this way in his MIT talk: "Don't teach. Incentivize." In other words, don't "teach" the model; incentivize it to explore on its own.
Later it turned out that during training you do not even need process-level incentives: rewarding the outcome directly and letting the model explore freely is enough to obtain a reasoning model.
Looking through Anthropic's material (https://www.anthropic.com/claude/sonnet), you will find that "reasoning ability, tool calling, and multimodality" have always been the directions they explore.
Computer use
By integrating Claude via API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet was the first frontier AI model to be able to use computers in this way. Claude 3.7 is our most accurate model to reliably use computers in this way—albeit experimentally in public beta—and we expect the capability to improve over time.
Robotic process automation
Automate repetitive tasks or processes with Claude 3.7 Sonnet. It offers industry-leading instruction following and is capable of handling complex processes and operations.
No matter the agent product, it cannot do without model reasoning capability, tool calling, and multimodality.
A couple of days ago I watched a video comparing an Agent with a Reasoning Agent; there too, the ability to reason about tool use proved essential.
In fact, my first glimpse of this came from Cursor, though the essence was of course the Claude model behind it. I noticed that when it solves a problem for me, it constantly "reflects": if plan A does not work, it moves to plan B, then plan C and plan D, continually reasoning out new approaches until the problem is solved.
At first I assumed all reasoning models could do this, but then I realized: no. Most reasoning models can only reason over text, whereas Claude can reason not just over text but even over tool usage.
This is much like how humans use tools: if the small wrench can't solve the problem, switch to the big wrench, or to a different tool entirely.
Why is the Reasoning Model infrastructure?
Consider asking an AI to perform a task: "Summarize the conclusions from this conversation and format them as structured JSON." This is in fact a multi-step process with conditional judgments and abstract transformation. It is not mere information extraction; it is closer to "human-like thinking" (see the sketch after this list):
- What information is important?
- Are there any inconsistencies in the content?
- Should the information be grouped? If so, how should the groups be organized?
- The user didn't say it, but content from the search results appears relevant: should it reasonably be filled in?
- When producing the output, should downstream processing be taken into account?
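Here is a minimal sketch of what delegating those judgment calls to a reasoning model could look like. Everything in it is hypothetical: `chat()` stands in for a real model API, and the model name is a placeholder.

```python
import json

SYSTEM = """Summarize the conversation into JSON of this exact shape:
{"key_points": [...], "groups": {...}, "inconsistencies": [...]}
Decide yourself which points matter, how to group them, and what to flag
as inconsistent. Output JSON only."""


def chat(model: str, system: str, user: str) -> str:
    """Stub standing in for a real model API call."""
    return '{"key_points": [], "groups": {}, "inconsistencies": []}'


def summarize(conversation: str) -> dict:
    # The multi-step judgment (select, group, flag) is delegated to the
    # reasoning model; downstream code only validates the structure.
    raw = chat(model="some-reasoning-model", system=SYSTEM, user=conversation)
    return json.loads(raw)


print(summarize("user: ... assistant: ..."))
```

The point of the design is that the grouping, filtering, and consistency checks live in the model's reasoning, not in ever-deeper nested prompt logic.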
In the past we attacked this layer by layer with Prompt Engineering, or with nested functions/plugins. But those approaches only solved the "interface calls"; they did not improve the quality of the reasoning chain itself, and they made the prompts hugely complex. It is like wiring up a pile of robotic arms with no brain to command them.
That’s when you need a model with real reasoning capabilities to orchestrate complex logic, handle fuzzy inputs, and produce executable structured results.
What role does the Reasoning Model play?
If MCP makes every player remember the tactics, and A2A gives them a coordination mechanism, then the Reasoning Model is the coach: it knows the situation (contextual information), can deploy the players (agents), improvises on the spot, and at critical moments even steps onto the court itself (direct execution).
Many products appear to have multiple interacting agents when in fact several assistants are merely queuing to handle tasks. Without a Reasoning Model, these assistants will never form tactical linkage, let alone strategic jumps or intent guidance.
The emergence of reasoning models has made "commanding AI" possible. The biggest change Claude 3.7 and GPT-o3 brought me is this: I now believe the AI of the future will no longer be a "hand that does concrete things" but a "brain that can organize complex tasks".
You no longer tell it how to do the job; you tell it what you want, and it decides what should be done.
From MCP to A2A to Reasoning Model, the relationship between the three is like:
- MCP: who am I, where am I now, and what did I just do.
- A2A: how we communicate and collaborate with each other.
- Reasoning Model: where do we go from here, and what should we do?
This is the key to truly enabling multi-agent collaboration to move from “being able to run through processes” to “completing high-level tasks”.
2. Current Agent products are still in their infancy
Today's Agent products can receive commands, but they cannot yet truly and efficiently complete users' tasks.
According to the product form, the current mainstream agent products can be roughly divided into the following types:
- Web products, such as Manus, Coze Space, and Wenxiaobai Research Report.
- Browser products like Dia, Fellou, etc.
- Client products like ChatGPT, Claude, etc.
- Browser plug-in products such as Monica, Doubao, etc.
- IDE plugin products, such as Trae Plugin (formerly MarsCode), GitHub Copilot, etc.
- AI IDE products like Cursor, Trae, Windsurf, etc.
2.1 Web products, taking Coze Space as an example
On the evening of April 20, I camped out in the Coze group chat and was lucky enough to grab a Coze Space invitation code.
Then, I chose “Planning Mode” (Coze Space has Explore Mode and Planning Mode) to test Coze Space’s capabilities. The prompt is as follows:
Now I want to learn all the key techniques of prompt engineering, including but not limited to zero-shot prompting, few-shot prompting, in-context learning, chain of thought, etc.
Additional context:
1. My current knowledge of prompt engineering: Understanding basic concepts but lacking practice.
2. What areas do I plan to apply prompt engineering techniques in: code generation and assisted programming, personal assistants, and productivity improvement
3. Learn the focus of the documentation: tips and best practices, detailed examples and templates
4. Difficulty level of the document: Intermediate – balance theory and practice
Please generate a high-quality learning document for me based on the above content. It is best to generate an HTML file that supports opening and previewing in a local browser.
At the time, too many sign-ups may have overwhelmed the servers; after repeated failures, I gave up.
Task playback address: https://space.coze.cn/s/DXXe317vmqI/
Of course, the task did succeed later. The result did not follow my requirements exactly, but it completed the task all the same.
1) Task replay links (Intermediate Prompt Engineering Learning / Prompt Engineering Study Guide)
- Coze Space: https://space.coze.cn/s/j3n6qHgvzPw/
- Manus: https://gboammej.manus.space/
2) Someone else's task (a memory-concept PPT for undergraduates): https://space.coze.cn/share/7495661743659302964
2.2 AI browsers, taking Fellou as an example
I got my Fellou invitation code in a group chat run by Wei Shijie, host of the podcast "Wei Shijie | Business Talk". I had listened to her interview with Fellou founder Xie Yang (link at the end of this article), and they prepared some invitation codes for listeners as a perk. The episode is high quality and worth recommending; from it I also learned a lot about how Xie Yang thinks about product design:
- The source of product demand: look inward; demand starts from oneself. (Resonance +1: I started the SafeMark project at the beginning of this year from my own needs.)
- Tesla's FSD is a big inspiration. He analogizes the browser to a Tesla inside the computer: users' continued use amounts to data annotation.
- Manus and others use cloud virtual machines and can only browse public websites. Platforms impose different restrictions (such as banning automated operations), and user-account privacy is also an issue.
- Why Deep Search rather than Deep Research? Because he is not satisfied with just the cited fragments; he wants more context and reads every piece of content.
- Why build their own GUI model, and more.
Back to the product: in my overall experience, Fellou's ability to deliver was not great.
After I give it a requirement and confirm it, I can see that during the run it:
- opens pages according to the task;
- screenshots the current page and sends it to the model;
- marks all the page elements (as shown below), sends that to the model too, and then drives the cursor to execute.
My understanding is that two screenshots are sent to the model for processing: the first helps the model "understand" the content (via OCR or some other approach), and the second marks the page elements so that Computer Use can be invoked to move the cursor and perform the required operations.
Watching the cursor move around the page as the task progresses is a wonderful feeling, even though the process is very slow, especially on a heavy first task, when it often loops continuously through "screenshot to the model, second screenshot to the model, perform the operation" (sketched below). Perhaps it is precisely because this loop is so slow that Xie Yang is thinking about building his own GUI model.
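Written as stub code, the loop I observed looks roughly like this. It is my reconstruction of the visible behavior, not Fellou's actual implementation; every function is a hypothetical stand-in.

```python
from typing import Any


def screenshot() -> bytes:
    """Stub: capture the current page as an image."""
    return b""


def mark_elements(page: bytes) -> bytes:
    """Stub: overlay numbered boxes on the page's interactive elements."""
    return page


def decide(task: str, page: bytes, marked: bytes) -> dict[str, Any]:
    """Stub: the model picks the next action from the two screenshots."""
    return {"done": True}


def execute(action: dict[str, Any]) -> None:
    """Stub: drive the cursor and keyboard to perform the chosen action."""


def run_gui_task(task: str, max_steps: int = 50) -> None:
    # The slow loop observed in Fellou: two screenshots per step, one action.
    for _ in range(max_steps):
        page = screenshot()                # 1st shot: let the model "read" the page
        marked = mark_elements(page)       # 2nd shot: number the clickable elements
        action = decide(task, page, marked)
        if action.get("done"):
            return
        execute(action)                    # move the cursor, click, type
```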
So why are the major players in such a hurry to launch such "immature" products? (Setting aside business considerations, I will try to answer this purely from a product perspective.)
- Set aside factual errors in the output for now; those can be addressed with engineering optimization or other fixes.
- The structure of the messages, or of the generated replies, is fixed, so the final result reads as if the agent had its hands tied and was just filling in a template (some sections literally read "none").
- User data matters: when the AI is unclear about user preferences, it falls back to filling in information according to the "task framework" the product manager defined, which very likely produces results that miss expectations.
First, "immature" is my subjective judgment as a user, because the product did not give me results that met my expectations (including incorrect information, misreading my questions, and so on). Second, roles differ: as a user I naturally want the product as mature as possible, solving my problems as efficiently as possible. But to achieve the latter, you have to "collaborate" with it so that it comes to "understand" your habits and preferences.
To this day I still prefer ChatGPT for certain problems and work scenarios, because it knows me better: not only more than two years of my conversations with it, but even the content of my flomo notes.
Even though we know AI cannot genuinely "understand the user", what really gives it the appearance of that ability is the product designer behind it. After all, the one thing AI is good at is predicting the next token (or the next several) from the current ones. But for users, that result is enough.
Users only need to feel that this AI “understands me” and can efficiently solve my current problems.
The more you want to achieve that result, however, the earlier you need users involved: let real interaction keep running through the product, drive the data flywheel, then optimize the experience, build the moat, establish mindshare, and steadily raise the cost of switching away.
3. Summary and outlook
Having written all this at length: what I originally wanted to write was a head-to-head comparison of Manus and Coze Space. I even had the title picked out: "Manus's Cake, Delivered by Coze Space".
But after reading a pile of such comparisons, I gave up. Partly the content was homogeneous; partly I saw something deeper, the blunt thesis this article wants to state: MCP + A2A + Reasoning Model = the future.
I believe single-point intelligence cannot form swarm intelligence.
I used to think that in agent product design, multiple agents online at the same time meant collaboration. In fact they merely "exist at the same time": they do not understand each other, they lack semantic information sharing and behavioral coordination, and each does its own thing.
I also thought for a while that plugging in a very good reasoning model would solve most problems. It does handle some of the work, but even the strongest model, if it cannot remember context and has no tools to work with, is just a very expensive token consumer.
This is why MCP and A2A are prerequisites: remember the task context, synchronize resources, and establish communication protocols. The Reasoning Model is then the finale: make strategic decisions in a dynamic context, bridge fragmented tasks, and dispatch the right agents to execute.
I think the way AI products are built is changing.
In the past, building an AI product meant thinking about the System Prompt (including few-shot, in-context learning, CoT), how to call RAG efficiently, which tools to invoke, how to optimize the flow, and so on. In the future, the core of designing AI products may become:
- Is a clear Model Context Protocol (MCP) in place?
- Do the agents have a concrete communication mechanism (A2A)?
- Is there a powerful reasoning-capable model to make decisions, schedule agents, and even revise the entire process?
From this perspective, we are moving from the era of Prompt Engineering into a stage of composite systems engineering: Protocol + Planning + Reasoning Model.
Of course, the restructuring of AI products also means the restructuring of the role of AI product managers.
A subtle change behind this: product managers' thinking will likewise move through the stages of interface design → process design → service design → multi-agent collaboration mechanism design.
Whether it is the web products Manus and Coze Space, the AI browsers Dia and Fellou, or the AI coding tools Cursor and Trae, their interfaces and System Prompts are just the surface skin. What really makes agents work is not a beautiful UI but the skeleton behind it: MCP, A2A, and the reasoning model. This is not about "writing a smarter bot" but about "building an intelligent system that can orient itself, coordinate a team, and complete the task".
Over the past decade the mobile Internet built "functional software systems"; the AI era is now building "intent-driven intelligent systems". Personally, I think MCP + A2A + Reasoning Model is not a wild take but the system foundation and new computing logic gradually taking shape in the AI era.
Just as around 2000 "HTTP + search engines + personal computers" was seen as the future of the Internet, yet it was the combination of "mobile Internet + cloud computing + big data" that ultimately rewrote the world, the "future" of the AI era likewise requires innovation in the underlying protocol (MCP-like), the interaction paradigm (A2A-like), and the terminal form (reasoning-model-like). The final form may deviate from expectations due to technological leaps (such as an AGI breakthrough), but that does not dent my anticipation of its arrival.
Of course, we are still early in the AI era. Today's agents are not smart, stable, or reliable enough. But that is exactly why the builders rebuilding the foundations and exploring engineering optimizations matter all the more.
There is still a great deal of uncertainty ahead. One thing is certain, though: the endgame Agent system will not be whoever wins the UI fight, but whoever builds on MCP, A2A, and the Reasoning Model.
Finally, may there be not only “users” but also “builders” on the road to AGI.
Related links
- "Model Context Protocol (MCP)": https://modelcontextprotocol.io/specification/2025-03-26
- "Agent2Agent (A2A) Protocol Release": https://developers.googleblog.com/zh-hans/a2a-a-new-era-of-agent-interoperability/
- "Claude 3.7 Sonnet": https://www.anthropic.com/claude/sonnet
- "This Missed OpenAI Update Just Changed AI Agents Forever…": https://www.youtube.com/watch?v=emiZkLvBzhQ
- "Agent2Agent Protocol (A2A), clearly explained (why it matters)": https://www.youtube.com/watch?v=mDcULe4GMyA
- "MIT EI seminar, Hyung Won Chung from OpenAI: 'Don't teach. Incentivize.'": https://www.youtube.com/watch?v=kYWUEV_e2ss
- "The Art of Dragon Slaying | General Agent + MCP = the 2025 Consensus of Domestic AI Products": https://mp.weixin.qq.com/s/8gHrbplZ_JxKKriERw5dsA
- "Ai Teasing Pen | A Detailed Explanation of MCP's Core Architecture": https://mp.weixin.qq.com/s/uTsr06MnJ9t3sGDzLD99_g
- "Kong's Low-Dimensional Cognition | ChatGPT Is an Agent": https://mp.weixin.qq.com/s/AfpQmLLiEn85-S93fNv4nQ
- "Zhang Wuchang | The Most In-Depth on the Internet: a 50,000-Word Reading of Coding Agents & OpenAI o3": https://mp.weixin.qq.com/s/GrYUEBPOvFNC0Wwd7T35cA
- "Miss M's Study Record | A Different 10,000-Word Appreciation of DeepSeek-R1: a Wonderful Exploration, and the Beauty of Creation Behind the Technology": https://mp.weixin.qq.com/s/GUyTrnxw1WkwBc-TCC7kNA
- "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning": https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
- "Liu Yanfeiyu | 9 Hands-On Manus Cases: Eyes Light Up, but Problems Abound": https://mp.weixin.qq.com/s/7Irb7TXkQmCSDe7GBo5vnQ
- "How Effective Is Byte's AI Agent? 9 Hands-On Cases": https://mp.weixin.qq.com/s/rXr83otDjPnbKZETleJ2lA
- "Wei Shijie | Business Talk | Ep. 34: a 3-Hour Interview with Fellou Founder Xie Yang: Loneliness, Post-95s, Tables, and Productivity": https://www.xiaoyuzhoufm.com/episode/680b04ea7a449ae85895ba00
- "My Friend Xie Yang, His Fellou, and the Entrepreneurs of This Era": https://mp.weixin.qq.com/s/rF0kMikeTjwfZ22l-x2KSA
- "Dia, the Wildly Popular AI Browser: I See the Browser's Future iPhone Moment": https://mp.weixin.qq.com/s/8nQAXDSTvnFd4GULxnxRzg
- "Geek Park | The Best AI Application Developers Are Building AI Browsers": https://mp.weixin.qq.com/s/qfMFYjQzANmgdYt9YqWm_g