2025 is widely called the first year of domestic AI applications, and a wave of agent platforms and agent products has hit the market. This article introduces the common agents, such as Deep Research, Manus, and Coze Space, and analyzes their characteristics and limitations. The short version: general-purpose agents are still developing, while vertical-domain agents are gradually maturing.
2025 is the first year of domestic AI applications. Aside from the products that individual companies are building for themselves, two categories are flourishing: agent platforms and agents.
The agent platform is essentially a low-code platform that can efficiently generate a variety of simple agents (more accurately called personal assistants).
These personal assistants are very different from the agents that have recently been all over the Internet (such as Manus and Deep Research), so the current definition of "agent" is rather broad and will probably need a few iterations before it describes things accurately.
Recently active agents include Deep Research, Auto-GPT, Manus, Coze Space, Lovart, etc.
At present these AI products are "lively", and their purpose is still, quite literally, Attention Is All You Need: on one hand they are competing for attention, on the other they are staking out the next round of traffic entrances.
We introduced agent platforms earlier; today let's look at the common agents themselves.
Agent Overview
The best material for a general description of current agents is the development roadmap proposed by OpenAI founder Sam Altman (OpenAI's five levels of AGI):
There is only one underlying logic to this design: the model eats the application. The model is meant to swallow all the algorithms (workflows), data (knowledge bases), and tool plug-ins; the model is all you need. This also yields the classic agent architecture:
In terms of module classification:
- The core reason Manus could explode in popularity is that model capabilities have been greatly enhanced;
- RAG addresses the hallucination problem. At the current pace of model development, a context window of one million tokens is only a matter of time, and applications built on making the model talk like a specific person, such as AI clones, will appear within the next two years;
- The toolchain addresses multimodality. The recently popular MCP and Computer Use are really extensions of AI's multimodal capabilities, meant to fix the various things AI is "not good at", including hearing, vision, touch, and so on. A minimal sketch of how these modules fit together follows below.
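To make the architecture concrete, here is a minimal sketch of that model + RAG + toolchain loop. Every name in it (call_llm, search_knowledge_base, the web_search tool) is a hypothetical stand-in, not any specific vendor's API:

```python
# Hypothetical stand-ins only; swap in a real LLM call, retriever, and tools.

def search_knowledge_base(query: str) -> str:
    """RAG step: retrieve grounding passages to curb hallucination."""
    return f"(retrieved passages about: {query})"

def call_llm(prompt: str) -> dict:
    """Toy policy standing in for a chat-completion call: search once,
    then answer. Returns either a tool request or a final answer."""
    if "Observation:" not in prompt:
        return {"tool": "web_search", "args": {"query": prompt.splitlines()[0]}}
    return {"answer": "Summary based on the observations above."}

TOOLS = {
    "web_search": lambda args: f"(search results for: {args['query']})",
}

def run_agent(task: str, max_steps: int = 8) -> str:
    # Context = task + retrieved knowledge (RAG) + accumulated observations.
    context = [f"Task: {task}", f"Reference: {search_knowledge_base(task)}"]
    for _ in range(max_steps):
        decision = call_llm("\n".join(context))
        if "answer" in decision:              # the model decides it is done
            return decision["answer"]
        tool = TOOLS[decision["tool"]]        # the model picks a tool/plug-in
        context.append(f"Observation: {tool(decision['args'])}")
    return "Step budget exhausted."

print(run_agent("Score the current agent products."))
```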
Based on this, the common agents today can be divided into two categories: general-purpose agents and vertical-industry agents.
Because base-model capability is still limited and roughly the same for everyone:
For general-purpose agents, the core is the tool ecosystem: the more prosperous the ecosystem, the easier it is to stand out;
For vertical-industry agents, the more private corpora and vertical-domain plug-ins, the friendlier the product.
Take Manus as an example: it actually has no technical moat, and there are many similar products in China; a comparable product takes about a week to build, although polishing it properly would of course take much longer.
Next, let’s introduce the following products.
Deep Research
OpenAI launched the Deep Research feature at the end of February, with a limited quota of uses per month.
In terms of performance, Deep Research can fairly be called professional: like a human expert, it decomposes a complex task step by step and then runs multiple rounds of information search and verification on the Internet.
It gradually adjusts its research direction and strategy based on what it has already found, and keeps digging toward the essence of the problem until it arrives at the most suitable answer.
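That behavior is easier to see as code. Below is a rough sketch of the plan → search → verify → re-plan cycle just described; it is emphatically not OpenAI's implementation, and ask_model and web_search are hypothetical stand-ins:

```python
# Hypothetical stand-ins; a real system would call an LLM and a search API.

def ask_model(instruction: str) -> str:
    return f"(model output for: {instruction[:48]})"   # stand-in LLM call

def web_search(query: str) -> list[str]:
    return [f"(source found for: {query})"]            # stand-in search API

def deep_research(question: str, rounds: int = 3) -> str:
    findings: list[str] = []
    # Decompose the task the way a human expert would.
    sub_questions = ask_model(f"Decompose into sub-questions: {question}").split(";")
    for _ in range(rounds):
        for sq in sub_questions:
            for src in web_search(sq):
                # Verification pass: keep only sources the model judges relevant.
                verdict = ask_model(f"Is {src} relevant to {sq}?")
                if "irrelevant" not in verdict:
                    findings.append(src)
        # Re-plan: adjust direction based on what has been found so far.
        sub_questions = ask_model(
            f"Given findings {findings}, what gaps remain for: {question}?"
        ).split(";")
    return ask_model(f"Write a report on {question} using {findings}")

print(deep_research("Map all channels that publish medical information."))
```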
But it shares one trait with agents like Manus: after using it for real, you probably won't want a second time. It is not mature.
My task was to sort out all medical information, obviously a complex project, so to lower the difficulty I reframed the problem: sort out all the channels that publish medical information.
With this question, the Deep Research journey began. First, some input:
I want to complete the “Panoramic Grading System of Medical Information Output Channels”.
The purpose is to cover every institution that might produce medical information and to classify them; the classification must follow the MECE principle.
PS: The real prompt was much more rigorous, but it is dense enough that I won't publish it here.
After multiple prompts and repetitions, GPT gave me the following feedback:
No need to look closely: there is a big problem; it doesn't even include the most basic medical textbooks...
Knowing something was wrong, I repeatedly hinted that there were omissions, but unfortunately the system never gave me a satisfactory answer.
In summary: each question takes 5-30 minutes, and for overly complex problems Deep Research struggles to complete the task on its own.
Based on this, let’s take a look at the domestic Manus:
Manus
Manus is actually quite successful, and the financing figures don't lie: in April Manus closed $75 million in financing, and recently it reportedly raised another $100 million at a $2 billion valuation!
As we said earlier, its technical threshold is not high; what capital is really optimistic about is a company that understands both AI and marketing.
As a product, it completes more of the job than Deep Research does. For example, you can ask Manus to score the current agent models and then produce a decent-looking report:
But real use surfaces plenty of problems. Here are three picked at random:
1. Less structure, more intelligence
Manus-like agents take the "the model is all" route, which roughly means: don't interfere with me, I'll play by myself. That is a fine aspiration, but at the current level of maturity it is very troublesome, because such an agent neither accepts input nor delivers output gracefully.
At present Manus runs on Computer Use as a standalone web page; it cannot be embedded in production environments such as DingTalk or Feishu, so users have to switch interfaces repeatedly, which makes it quite troublesome to use.
PS: Though it hardly matters, because its output quality isn't good anyway...
2. Frequent interruptions
This is not just Manus; Deep Research has it too. Each task takes a long time (30 minutes is common), yet when you do step away and come back, it is maddening to find the task interrupted because the context was lost.
The community reports that its decision tree is prone to infinite loops, repeated execution, or long stretches of unresponsiveness; users complain the success rate is below 30%.
Let's not even get into server stability, which is presumably rather poor...
3. Hallucination problems
Although the content Manus generates often claims to "cite its sources", on real inspection the citation links are missing or inaccurate, and reliability is insufficient.
Moreover, where the product entity and its compute actually reside is not transparent, which raises potential concerns about cross-border data storage and unclear legal jurisdiction...
Brief summary
There are other problems I won't expand on, but the flaws do not outweigh the merits: Manus is a bit of a stitched-together monster, yet maybe the stitched-together monster is the right way to do AI.
Its significance is still the step from L2 to L3 in OpenAI's levels: from chatbot to task completer.
Next, let's look at Coze Space:
Coze Space
In AI applications ByteDance can be called rich and handsome, and they have assembled a complete AI system: to build an agent for POC verification, you first go to Coze; when Coze can't solve multi-agent collaboration, you add a set of Feishu multi-dimensional tables; need a knowledge base, go straight to Feishu Knowledge Q&A, activated right from Feishu Docs; and for the base model there is Doubao; ...
ByteDance, in short, plays the whole AI application ecosystem deliberately, and the Douyin ecosystem provides plenty of traffic support; many streamers have flocked to the Coze system, which makes the ecosystem very sound.
On this basis, Coze Space can genuinely handle an agent's whole pipeline, including task orchestration, MCP calls, and result delivery; and backed by ByteDance's strong engineering it is quite cheap, to say nothing of its high stability...
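Since MCP calling keeps coming up, here is roughly what invoking a tool over MCP looks like from the client side, sketched with the open-source MCP Python SDK (the `mcp` package); the server script and the tool name below are hypothetical placeholders, so check the SDK docs before relying on any of it:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch a local MCP tool server over stdio; "my_server.py" stands in
    # for whatever server an agent platform actually wires up.
    params = StdioServerParameters(command="python", args=["my_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()              # MCP handshake
            tools = await session.list_tools()      # discover available tools
            print([tool.name for tool in tools.tools])
            # Call one tool by name; "search" and its arguments are made up.
            result = await session.call_tool("search", {"query": "agents"})
            print(result.content)

if __name__ == "__main__":
    asyncio.run(main())
```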
The Coze AI ecosystem is the poster child for accumulation, and the average foreign agent really can't keep up...
But foreign vendors are formidable too, witness the video AI suite shown at Google I/O. Google released three creator-oriented AI tools:
- Flow chains scripts → storyboards → soundtrack → dubbing;
- Veo 3 breaks AI video out of its "silent era" with native audio tracks and physical detail;
- Imagen 4 generates images at 2K resolution while keeping logos and text crisp.
Together the three are like handing the creator a director, a cinematographer, and an art director:
Put simply: with this, I could go and shoot a short drama directly...
Sequoia has further pointed out that in the enterprise market, the first real entry point to emerge may not be the general-purpose large model but vertical-domain intelligent OSes such as Harvey (law) and OpenEvidence (medicine), because they speak the industry's language and understand its real needs.
So while Manus is more eye-catching at the moment, the genuinely good products are the vertical-domain ones that go deep into the application, such as Cursor and Lovart; Lovart can even be subdivided into the advertising and architecture fields.
From here, we too shift our attention from general-purpose agents to vertical-domain agents.
Lovart
AI on the image and video side has broken out in all directions this year, and Lovart, an agent product in the design field, put on a very good show a few days ago:
Like Cursor, it is a productivity tool for designers, and it genuinely delivers results.
Logically, if Cursor and Lovart keep developing, they can break down the barriers of professional know-how: you only need to tell Lovart what to draw and in what style, and it will run the whole process by itself, as in these comic-style effects:
Whether it is Cursor or Lovart, each marks the gradual maturation of vertical-domain agents, and each also confirms the judgment of the Sequoia summit: AI applications will land first in vertical fields.
Epilogue
There are other agents worth studying too, for example the many PPT-writing agents, which are very mature. From this survey we can actually draw a conclusion: general-purpose agents are not yet mature, while industry agents are reaching a usable level.
Rich Sutton, the father of RL, wrote in his 2019 essay "The Bitter Lesson":
The 70-year history of AI research teaches us one most important lesson: general methods that rely on pure computing power always win, and by an overwhelming margin.
Coupled with the rapid improvement of model capabilities and the popularity of Manus, many people will conclude that the general capabilities of models are replacing today's complex workflows.
But I think this is wrong, at least for the next few years, because GPT-style models are built on statistical logic and do not truly think.
First of all, AI product implementation sits between two poles: model and engineering. The stronger the base model, the simpler the corresponding engineering can be, but there is a dynamic critical point here:
- whether the model can actually make plans, and whether it can really extract keywords accurately, remains an open question;
- engineering can effectively compensate for the inherent defects of large models, such as hallucination and memory problems.
As far as I can see, all twenty or so companies I have looked at build their AI products on workflows, and they show an indifferent attitude toward whether the model will completely subvert their prompt engineering. The reasons: for companies that only dabble, a prompt-engineering project is cheap, 100,000-200,000 RMB gets it done, so if the model replaces it they don't care; companies that work deep in an industry are already very senior players in their field, their prompt engineering rests on a great deal of know-how that outsiders can barely parse, so they are not worried that models will soon exceed their industry knowledge. A small sketch of this workflow pattern follows.
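To illustrate the pattern, here is a minimal, hypothetical sketch of "engineering wraps the model": the model handles only the fuzzy step (keyword extraction), while deterministic code supplies validation and memory, which is exactly where engineering patches over hallucination and memory defects. All names are made up for illustration:

```python
# Vertical-domain vocabulary standing in for hard-won industry know-how.
KNOWN_TERMS = {"hypertension", "diabetes", "statin"}

def extract_keywords(text: str) -> list[str]:
    # Stand-in for the one model step, e.g. an LLM prompted with
    # "Extract medical terms from: ...". Here: naive tokenization.
    return [word.strip(".,").lower() for word in text.split()]

def workflow(user_text: str, memory: list[str]) -> dict:
    # 1. Model step: fuzzy extraction that plain code cannot do reliably.
    candidates = extract_keywords(user_text)
    # 2. Engineering step: validate against domain terms to curb hallucination.
    keywords = [k for k in candidates if k in KNOWN_TERMS]
    # 3. Engineering step: explicit memory, instead of trusting model context.
    memory.extend(keywords)
    return {"keywords": keywords, "memory_size": len(memory)}

if __name__ == "__main__":
    mem: list[str] = []
    print(workflow("Patient with hypertension started on a statin.", mem))
    # -> {'keywords': ['hypertension', 'statin'], 'memory_size': 2}
```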
Note that "industry depth" here does not mean only the programming and image industries; it also refers to fields such as medicine, finance, and law.
Returning to the comparison above, general-purpose agents Deep Research and Manus versus vertical agents Cursor and Lovart, you can perhaps also read off the current best practice path for AI projects.
To sum up, I still agree with the Sequoia AI Summit: the opportunity for AI application lies in vertical fields.