Agents, a new form of artificial intelligence, are changing the paradigm of human-computer interaction and have become a strong contender for the next gateway to interaction. This article explores the “magic” of agents, why major companies are betting on them, and their application potential and likely development trends across different scenarios.
With DeepSeek and Manus going viral across the Internet, more than half of the hot topics in tech are now related to large models and agents.
At the Kukai Spring Conference on April 22, a super agent made up of six agents (audio-visual, health, lifestyle, devices, creation, and education) was officially unveiled.
Three days later, at the Baidu AI Developer Conference, Robin Li released a range of AI applications, including the general-purpose super agent app Xinxiang and the content operating system Cangzhou OS.
In mid-May, the Sequoia AI Summit notably made “agents” its core topic, stating that AI has a market potential “10 times that of cloud computing”.
Subsequent developer conferences, including Google I/O 2025 and Microsoft Build 2025, all featured agents, spanning programming, healthcare, finance, and other industries. Overseas giants such as Microsoft, Google, and OpenAI, domestic companies such as Alibaba, Tencent, Baidu, and Kukai, and capital institutions represented by Sequoia have all begun to push agents vigorously.
This raises the obvious questions: what exactly is an agent, why are the big tech companies racing to build them, and what changes will they bring?
01 The “magic” of agents: the next gateway to interaction
Before starting the discussion, take a moment to understand the concept of “agent”.
The English term is “AI Agent”, and “agent” here means one who acts on someone else's behalf. This marks a qualitative difference between agents and conversational AI: an agent is no longer limited to question and answer, but is an intelligent application that can reason deeply, plan independently, make decisions, and carry tasks through to completion.
The prospect is certainly attractive. But to understand why agents have become so popular, we need to look from another angle: why do enterprises and consumers need agents at all?
What matters most for the spread of any technology may not be how high its ceiling is, but how low its barrier to use is. If only engineers can call it, only experts can configure it, and only a few people can understand it, then however powerful it is, it will remain a “miracle in the laboratory”.
Compare how the large model stack maps onto the layers of cloud computing:
Training and inference for large models require enormous computing power and low-level architectural optimization. Like IaaS in cloud computing, this layer serves as the agent's “engine”, but it sits far from business and users;
The capabilities and API packaging of large model platforms, including MCP tools, plug-in systems, and development interfaces, correspond to PaaS, providing a unified “toolbox” for building and calling AI;
Agents, the layer closest to users and business scenarios, can be seen as a form of SaaS: by integrating capabilities, understanding intent, and executing tasks, they deliver “ready-to-use” intelligence.
Take the To B scenario as an example. Traditional enterprise systems have many functional modules and complex interface logic, and users usually need systematic training and a firm grasp of business rules before they can complete a process. Companies invest a great deal of time and money just to get people used to the system.
Once an agent can understand, reason, and execute, users no longer need to face complex interfaces or understand the system's internal logic. A single natural-language command is enough: the agent identifies the intent, calls system resources, completes the task chain, and outputs the results as charts, text, or notifications. Shifting from humans adapting to the system to AI adapting to human needs will greatly improve productivity.
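To make the To B flow above more concrete, here is a minimal Python sketch of the intent-to-task loop, built on loud assumptions. Keyword matching stands in for the large model's intent recognition. `sales_report` and `leave_request` are hypothetical placeholders for real enterprise system calls, and the figures they return are made up. In the cloud analogy, the `TOOLS` list plays the role of the PaaS-style “toolbox”, and `handle` is the agent layer the user actually talks to.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Tool:
    name: str
    keywords: List[str]          # words that signal this intent
    run: Callable[[str], Dict]   # receives the command, returns structured data

def sales_report(command: str) -> Dict:
    # Hypothetical data; a real system would query its sales database.
    orders = {"north": 120, "south": 95}
    region = next((r for r in orders if r in command), "north")
    return {"region": region, "orders": orders[region]}

def leave_request(command: str) -> Dict:
    # Pull a day count such as "3 days" out of the command, defaulting to 1.
    match = re.search(r"(\d+)\s*day", command)
    return {"status": "submitted", "days": int(match.group(1)) if match else 1}

# The "toolbox": capabilities the agent may call on the user's behalf.
TOOLS = [
    Tool("sales_report", ["sales", "orders"], sales_report),
    Tool("leave_request", ["leave", "vacation"], leave_request),
]

def recognize_intent(command: str) -> Optional[Tool]:
    # Stand-in for the model: count keyword hits and keep the best-scoring tool.
    scored = [(sum(k in command for k in tool.keywords), tool) for tool in TOOLS]
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score > 0 else None

def handle(command: str) -> str:
    text = command.lower()
    tool = recognize_intent(text)
    if tool is None:
        return "Sorry, I could not map that request to a task."
    result = tool.run(text)
    # Render as a one-line text notification; charts would be another renderer.
    return f"[{tool.name}] " + ", ".join(f"{k}={v}" for k, v in result.items())
```

For example, `handle("Please file 3 days of leave")` routes to the leave tool without the user ever opening a form. In a real agent, a model call would replace the keyword scorer, selecting the tool and filling in its parameters. The sketch is only meant to show the overall shape: a tool registry, intent recognition, execution, and formatted output.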
Now take a To C scenario. In the past, users who wanted to watch a movie had to type the title with the remote control to search for it. If they could not remember the title, they first had to search for keywords on their phone and check dozens of links to find it, which could almost kill the mood for a movie.
With a TV equipped with the Kukai super agent, you only need to say what kind of movie you want to watch. Even if you don't remember the title, you can simply describe the plot and characters. The super agent understands the request, breaks down the task, assigns it to the audio-visual agent to search major video platforms, and jumps straight to the playback screen. In an AIoT home setting, once the agent receives the request to watch a movie, it can also automatically adjust the lights and close the curtains.
There are many more examples like these.
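At its core, the TV scenario is routing: a coordinating agent decides which specialist agents should act on a request. The sketch below is a hypothetical illustration, not Kukai's implementation. The two-title `CATALOGUE`, the `ROUTES` table, and the agent functions are invented for the example, and plot matching is reduced to simple keyword overlap.

```python
from typing import Callable, Dict, List, Set

# Hypothetical catalogue: title -> plot keywords (stand-in for video-site search).
CATALOGUE: Dict[str, Set[str]] = {
    "Harbor Lights": {"lighthouse", "storm", "keeper"},
    "Red Desert Run": {"desert", "race", "car"},
}

def audiovisual_agent(request: str) -> List[str]:
    # Match a described plot against catalogue keywords; no title required.
    words = set(request.lower().replace(",", " ").split())
    best = max(CATALOGUE, key=lambda title: len(CATALOGUE[title] & words))
    if not CATALOGUE[best] & words:
        return ["search:no match"]
    return [f"play:{best}"]

def home_agent(request: str) -> List[str]:
    # AIoT scene that accompanies movie playback.
    return ["lights:dim", "curtains:close"]

# Routing table: trigger word -> specialist agents that should act on it.
ROUTES: Dict[str, List[Callable[[str], List[str]]]] = {
    "watch": [audiovisual_agent, home_agent],
}

def super_agent(request: str) -> List[str]:
    actions: List[str] = []
    for trigger, agents in ROUTES.items():
        if trigger in request.lower():
            for agent in agents:
                actions.extend(agent(request))
    # Nothing matched: ask the user to clarify instead of guessing.
    return actions or ["ask:what would you like to do?"]
```

Calling `super_agent("I want to watch the film where a keeper guards a lighthouse in a storm")` yields a playback action for the matched title followed by the two home-scene actions. One request fans out to several agents without the user naming any of them.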
Beyond transforming productivity, agents also change the paradigm of human-machine collaboration: instead of actively operating tools, users simply issue instructions and let the agent complete a series of complex tasks. Whoever becomes the first point of contact for user needs controls the system's scheduling power and the allocation of resources.
For AI companies, agents represent the next gateway opportunity, and building out agents is a bid to seize “control” of next-generation interaction.
02 On the eve of the agent boom, three “schools” have emerged
Admittedly, agents are still in their infancy.
However, driven by both technological iteration and market demand, more and more companies are joining in. Because they enter from different paths, understand the value of agents differently, and build on their own core strengths and resources, three distinct camps have gradually taken shape.
The first camp is AI companies in the standard sense, such as Baidu, ByteDance, Google, and OpenAI, which are trying to lead the construction of the technology ecosystem.
Their approach can be summed up as follows: build on large models, open up agent development toolchains and solutions, and attract developers to build all kinds of agent applications on the platform. The goal is an App Store for the agent era, where agents can be created, called, and distributed like apps.
Under this concept, the agent is no longer just a product but a new “operating system”, and these companies hope to act as infrastructure builders and ecosystem leaders across the “model-development-distribution” chain. After all, whoever has the most powerful development platform and the most active developer ecosystem will hold the “distribution rights” and “scheduling rights” of the AI era. That makes this both the most fascinating and the most difficult business.
The second camp is enterprise service providers focused on vertical scenarios, such as Microsoft, IBM, and Alibaba Cloud, which are building enterprise-grade agent solutions.
Most of this camp comes from cloud computing and enterprise services and has a deep understanding of industry know-how and enterprise architecture. Rather than rushing to build a “gateway for the public”, these companies start from the most practical vertical scenarios and focus on delivery capability and proven results.
In practice, they tend to integrate agent capabilities into enterprises' existing system workflows, bringing automation and intelligence to business modules such as finance, sales, human resources, and warehousing. Microsoft has made a bold prediction here: as more agents join, every employee will become an “agent supervisor”, responsible for building, delegating to, and managing agents to get the most out of them.
The third camp is hardware and software makers with a keen sense of user-experience pain points, such as Huawei, Lenovo, Kukai, and Samsung, which embed agents directly into user “touchpoints”.
With their huge user bases, these manufacturers have long been on the front line of user experience and enjoy natural advantages in meeting user needs, refining hardware and software, and accumulating data.
A direct example is Kukai, which launched a smart screen with AI functions as early as 2014. In 2025, it took the lead in proposing the “long memory, fast thinking, and action in seconds” standard for “super agents”. The agent builds an “experience library” as the user uses it, so the model understands user habits better and the cost of repeated interaction drops. At the same time, atomized components and a multi-agent collaboration framework bring response times below 1.5 seconds, meeting end users' demand for a “faster, more accurate, and more direct” experience.
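One plausible reading of the “experience library” idea is a per-user memory of past choices that lets the agent act without asking again. The following sketch assumes exactly that. `ExperienceLibrary`, its `min_evidence` threshold, and the intent labels are hypothetical and say nothing about how Kukai actually builds its library.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional

class ExperienceLibrary:
    """Hypothetical 'long memory': remembers what a user chose for each intent."""

    def __init__(self, min_evidence: int = 2) -> None:
        # intent -> Counter of choices the user made for that intent
        self.history: DefaultDict[str, Counter] = defaultdict(Counter)
        # How many repeats are needed before a choice counts as a habit.
        self.min_evidence = min_evidence

    def record(self, intent: str, choice: str) -> None:
        self.history[intent][choice] += 1

    def suggest(self, intent: str) -> Optional[str]:
        # .get avoids creating empty entries for intents never seen before.
        counts = self.history.get(intent)
        if not counts:
            return None
        choice, n = counts.most_common(1)[0]
        return choice if n >= self.min_evidence else None

memory = ExperienceLibrary()
memory.record("evening_tv", "news channel")
print(memory.suggest("evening_tv"))  # None: one observation is not yet a habit
memory.record("evening_tv", "news channel")
print(memory.suggest("evening_tv"))  # news channel
```

A minimum-evidence threshold is one way to avoid treating a one-off choice as a habit. A production system would also need expiry rules and user control over what is remembered.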
This classification may not be rigorous: Alibaba, for example, also has a presence in To C, and Kukai is expanding into the B-end market.
The three schools are worth distinguishing because together they form the triangular architecture of the agent ecosystem: platform, service, and experience. Starting from the technology ecosystem, industry adaptation, and end-user scenarios respectively, they both compete and collaborate, jointly moving agents from concept to implementation and on to large-scale application.
03 Fanaticism and rationality coexist: possible trends for agents
The convergence of multiple forces has made agents the most imaginative opportunity of the moment. But history tells us that a boom and a bubble often go hand in hand.
After Manus went viral, leading companies followed up quickly and “cooked up” similar products in less than a month. Beneath the hype lies a real concern: many “agents” are just thin wrappers around large model APIs, lacking core capabilities such as task orchestration and long-term memory.
But this does not invalidate agents.
At the start of every new technology cycle there is, to some degree, a “bubble first” phenomenon. The market chases the concept faster than the technology itself matures, so short-term value is overestimated while long-term value is badly underestimated, and the field ultimately spirals forward through the contest between fanaticism and rationality.
At this stage, when “the concept is clear but the path is not yet settled”, we try to reason soberly about where agents may go next.
1. Vertical agents will land earlier than general-purpose ones.
The common problem with general-purpose agents is that they are “capable but not specialized”. By contrast, vertical agents that sit close to the business, know the processes, and have clear task boundaries and industry knowledge graphs already meet the basic bar of “being able to do the job” in scenarios such as healthcare, education, hotels, and manufacturing.
One challenge is that a single agent can handle simple tasks, but once the task chain becomes even slightly more complex, multiple agents must work together.
In daily life, for example, a request may involve travel planning, food recommendations, and hotel bookings. After the user gives the instruction, the system must accurately understand the intent, break down the requirements, and assign them to different agents. At present, only Kukai's super agent has demonstrated intelligent integration of home services, and most other agents still require users to manually open conversations with individual agents.
When a user makes a complex request such as “help me plan a 3-day trip for a family of 5 in Shenzhen”, the agent can link weather, transportation, food, hotels, attractions, maps, and other services in one stop. It draws up a detailed travel plan, selects suitable flights and hotels directly, and lets the user scan a QR code to buy tickets.
Integrating personalized intent recognition, dynamic task orchestration, and multi-agent collaboration may become the first milestone in the agent marathon.
2. The opportunity in hardware may be greater than in software.
Current discussion of agents focuses mainly on how software is being reshaped: from tools to assistants, and from apps to agents. A more noteworthy phenomenon, however, is that agents' influence on hardware may far exceed their influence on software. Once agents begin to dominate the logic of interaction, the hardware itself becomes the “gateway to services”.
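The Shenzhen example can be read as three steps: decompose, dispatch, and aggregate. Below is a minimal sketch under stated assumptions. A regular expression stands in for intent understanding, and the four specialist agents are hypothetical stubs that return text. The hotel stub further assumes two guests per room and one fewer night than trip days. `ThreadPoolExecutor.map` runs the independent subtasks concurrently and returns results in submission order.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical specialist agents: each takes the parsed slots, returns one plan fragment.
def weather_agent(city: str, days: int, people: int) -> str:
    return f"weather: {days}-day forecast for {city}"

def transport_agent(city: str, days: int, people: int) -> str:
    return f"transport: {people} tickets to {city}"

def hotel_agent(city: str, days: int, people: int) -> str:
    rooms = -(-people // 2)  # ceiling division, assuming two guests per room
    return f"hotel: {rooms} rooms x {days - 1} nights in {city}"

def itinerary_agent(city: str, days: int, people: int) -> str:
    return f"itinerary: {days} days of food and attractions in {city}"

SUBTASKS: List[Callable[[str, int, int], str]] = [
    weather_agent, transport_agent, hotel_agent, itinerary_agent,
]

def parse_request(request: str) -> Tuple[str, int, int]:
    """Stand-in for intent understanding: pull city, trip length and party size."""
    days = re.search(r"(\d+)-day", request)
    people = re.search(r"family of (\d+)", request)
    city = re.search(r"\bin ([A-Z][a-z]+)", request)
    if not (days and people and city):
        raise ValueError("need trip length, party size and city")
    return city.group(1), int(days.group(1)), int(people.group(1))

def plan_trip(request: str) -> List[str]:
    city, days, people = parse_request(request)
    # Subtasks are independent, so dispatch them concurrently; map keeps input order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(city, days, people), SUBTASKS))
```

With the request from the text, `plan_trip` returns one fragment per agent: weather, transport, hotel, and itinerary. The orchestrator could then turn these into a single plan or a booking step. When a request is missing a slot, the sketch raises an error rather than guessing. A real agent would more likely ask a follow-up question.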
It is even foreseeable that natural-language interaction will reshape hardware's influence, and every screen may become a “service hub”.
A similar trend is already visible in smart speakers: users care only about the result, not which platform the content comes from. Empowered by agents, the power to deliver services will shift further from apps to hardware that can perceive and understand:
TVs, portable smart screens (“bestie machines”), and similar devices are no longer just playback tools but a household's AI control center. Learning tablets will no longer be limited to grading homework and playing video lessons. The educational agent's “long memory” accurately records a child's learning trajectory, its “fast thinking” analyzes weak points in real time, and its “action in seconds” generates personalized plans, truly realizing an AI education paradigm tailored to each individual.
To be clear, the above is only a tentative view based on studying the agent strategies of Microsoft, Lenovo, Kukai, IBM, and other companies.
What seems certain, however, is that agents will not be a single product but a comprehensive reshaping of technology, interaction, and service delivery. The “universal engine” of general-purpose large models, the “industry brain” of vertical agents, and the “intelligent gateway” of hardware terminals together show that the structural upgrade of the AI industry has quietly begun.
04 Final thoughts
Many difficulties remain for agents.
Can general-purpose agents break down silos and form a sustainable open ecosystem? Can vertical agents identify the right application scenarios and move from showrooms to large-scale deployment? How should the boundaries of human-machine collaboration be set, how can data security be balanced with personal privacy, and can collaboration among multiple agents be as efficient and orderly as in real organizations? These are all hurdles agents must clear before taking center stage in industry.
When these questions are answered one by one, AGI will no longer be far away.
To borrow the consensus from the Sequoia AI Summit: victory in the AI era belongs to those who cultivate vertical scenarios deeply and build moats, while also iterating with agility and embracing the technological wave.