90% Will Be Eaten by Large Models: The Dilemma of AI Agents

In an era when large models are sweeping over everything like a tide, AI agents were supposed to carry the evolution from model to capability, but in practice they face multiple dilemmas, such as blurred functional boundaries and poorly defined use cases. This article tries to unpack why AI agents cannot escape the fate of being “eaten”, and re-examines their real value and breakthrough opportunities in the technology chain from a product-thinking perspective.

“90% of agents will be eaten by large models.”

On July 15, Zhu Xiaohu, managing partner of GSR Venture Capital, made another characteristically startling remark, this time taking aim at agents, the hottest topic in AI circles over the past year.

Halfway through the “Year of the Agent”, most recent commentary and news has been pessimistic. Just last week, a series of developments at Manus (moving its headquarters to Singapore, laying off 80 employees in China, and abandoning its domestic version) got the public asking: what happened to Manus?

Manus’ investors are led by Benchmark, a US dollar fund, and its underlying models include overseas models such as Gemini and Claude. Coupled with rumors that it has run short of computing resources, Manus’ departure has been confirmed as a relocation and adjustment forced by circumstances, rather than a retreat caused by business failure.

However, the dark clouds over the general agents led by Manus have not yet dissipated: on the one hand, revenue at Manus and Genspark is declining; on the other, so is user activity.

This reveals the core problem of the general agent track today: after the technology boom and the capital carnival, these products have yet to find a killer application scenario that keeps the majority of consumer (C-end) users “loyal” and paying. For now they are used only occasionally, to produce a half-finished PPT or dig up a few reports.

The general agent market is being eroded by the spillover of model capabilities, and is also losing share to vertical agents.

Moving overseas: what happened to Manus?

General agents are in an awkward position.

Within a few months, the amazement that greeted the birth of general agents has faded: in the enterprise, they cannot match the accuracy of vertical agents; for individuals, they have not found a scenario that meets user needs any better.

Improving model capabilities dealt agents the first blow.

As large models develop rapidly, the models themselves are becoming more and more “agentic”, and as model performance spills over, users can call the model directly to complete a task.

Take AI coding, currently the fastest-moving area, as an example. The coding capabilities of Anthropic’s Claude and Google’s Gemini series models improve with every update, and the vendors’ own coding tools (such as Claude Code) can program independently and keep refining the product experience. On top of that, Anthropic’s Max membership lets users call its models freely: even though Opus 4 charges $75 per million output tokens, the $200-per-month plan still supports unlimited use.
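To make the price comparison concrete, the two figures quoted above ($75 per million output tokens and $200 per month) imply a break-even point beyond which the flat plan is cheaper than paying per token. The following is a minimal sketch, not a billing model: it assumes only output tokens are charged (input-token charges are ignored), and the function and constant names are hypothetical.

```python
# Figures quoted in the article. Ignoring input-token charges is a
# simplifying assumption made only for this illustration.
OPUS_OUTPUT_PRICE_PER_MILLION_USD = 75.0
MAX_PLAN_MONTHLY_USD = 200.0


def break_even_output_tokens(monthly_fee_usd: float, price_per_million_usd: float) -> float:
    """Output tokens per month at which pay-per-token spend equals the flat fee."""
    return monthly_fee_usd / price_per_million_usd * 1_000_000


tokens = break_even_output_tokens(MAX_PLAN_MONTHLY_USD, OPUS_OUTPUT_PRICE_PER_MILLION_USD)
print(f"Break-even: about {tokens / 1e6:.2f} million output tokens per month")
```

Under these assumptions, a user who generates more than roughly 2.67 million output tokens a month already gets more from the flat plan than from per-token billing.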

Manus’ most expensive tier, Pro, costs a similar $199 per month, yet even these top-tier members are metered by a points system: Pro members receive daily points plus 19,900 points per month and 19,900 limited-time points, and every task consumes points. At an estimated 100 points per task, that comes to about 10 uses a day.
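The usage estimate can be roughly reproduced from the quoted figures. This is a sketch under stated assumptions: a 30-day month, the article's 100 points per task, and the daily points left out because their amount is not given. The names are hypothetical.

```python
MONTHLY_POINTS = 19_900        # quoted monthly grant for Pro members
LIMITED_TIME_POINTS = 19_900   # quoted limited-time bonus
POINTS_PER_TASK = 100          # the article's per-task estimate
DAYS_PER_MONTH = 30            # assumption for this illustration


def tasks_per_day(points: int, points_per_task: int = POINTS_PER_TASK,
                  days: int = DAYS_PER_MONTH) -> float:
    """Average number of tasks per day that a monthly point budget covers."""
    return points / points_per_task / days


without_bonus = tasks_per_day(MONTHLY_POINTS)
with_bonus = tasks_per_day(MONTHLY_POINTS + LIMITED_TIME_POINTS)
print(f"Monthly points only: {without_bonus:.1f} tasks/day")
print(f"With limited-time points: {with_bonus:.1f} tasks/day")
```

Depending on whether the limited-time points are counted, the budget covers somewhere between about 7 and 13 tasks a day, which brackets the estimate of about 10.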

The cost problem that constrains Manus is passed on to users in the form of a high subscription price that it cannot bring down.

When the model itself can provide an experience close to that of an agent, users will naturally prefer cheaper and more convenient model APIs or conversational interfaces to paying extra for a general agent product with overlapping functions. As a result, part of the market has been “eaten” directly by increasingly powerful foundation models.

Compared with vertical agents, general agents also perform poorly on the enterprise side, falling short of the “digital employee” bar when measured on efficiency and results.

Zhu Xiaohu said that “90% of the Agent market will be eaten”, yet his GSR Venture Capital has itself invested in AI agent projects. Compared with general agents, he is more optimistic about products that deliver real efficiency gains and can actually be deployed.

Head AI (formerly Aha Lab), backed by GSR, started out using AI agents for automated marketing and has since been upgraded into an AI marketing product. In the founder’s words, just tell Head your budget and website, and it will automatically handle influencer marketing, affiliate marketing, and cold email: one person does the work of a marketing department.

For enterprise users, accuracy and cost are the core demands, and general-purpose agents are currently no match for vertical agents optimized for specific scenarios.

If the same task is handed to a general agent and a vertical agent inside an enterprise, the former can only rely on a search engine to produce results for the request, while the latter is connected to the knowledge base the enterprise has built and produces output grounded in internal information. The vertical agent effectively has a richer database “strapped” to it, and the result is self-evident.

When introducing new technologies, enterprises have extremely high requirements for cost and risk control. General agents are usually built on large, complex “black box” models whose decision-making is opaque and whose output has a degree of randomness (the “hallucination” problem). Enterprises with high accuracy requirements obviously cannot accept such unstable output quality.

An agent developer told Lightcone Intelligence that enterprises usually need to integrate agents deeply with internal knowledge bases and business process systems, and that some simple tasks are run as workflows to ensure accurate execution.

Sandwiched between large models and vertical agents, general agents are seeing a large slice of their cake carved off by both.

No killer scenario yet, still waiting to evolve: agents have only made a start

Finding little they can actually use them for, users are no longer enthusiastic about general agents.

As a result, C-end general agents represented by Manus are facing slowing growth or even decline.

From a commercial point of view, general agents have certainly shown enough to attract money. The monetization results of general agents such as Manus and Genspark in recent months proved the track’s potential: according to data from Extraordinary Industry Research, Manus reached $9.36 million in ARR (annual recurring revenue) in May this year, and Genspark reached $36 million in ARR within 45 days of release.

However, after a short-term increase in traffic, general agent products have more or less experienced a decline in traffic and revenue.

In June, Manus had 17.81 million visits, down 25% from the 23.76 million it recorded at its March release. Genspark’s traffic is also fluctuating, with 8.42 million visits in June, down 8%, while Kunlun Wanwei’s Tiangong super agent was down 3.7%.
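The Manus figure can be checked directly from the two visit counts quoted above. A minimal sketch follows; the helper name is hypothetical, and only the Manus numbers are used because the baselines for the other two products are not given.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a decline)."""
    return (new - old) / old * 100


# Visits in millions, as quoted: March release month vs. June.
manus_change = percent_change(old=23.76, new=17.81)
print(f"Manus visits, March to June: {manus_change:.1f}%")
```

The computed decline is about 25%, consistent with the figure reported.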

Also in June, the two products with the strongest commercial performance, Manus and Genspark, both saw revenue decline. According to data from Extraordinary Industry Research, Manus’ MRR (monthly recurring revenue) for the month was $2.54 million, down more than 50% month-on-month; Genspark’s was $2.95 million, down 13.58% month-on-month.

These figures show that, after a moment of popularity, general agent products do not deliver an experience that keeps users paying. At the same time, users are coming back less often.

The reason is that Manus has not found enough killer scenarios that users will keep paying for.

At present, most general-purpose agents on the market work in a few fixed directions: PPT, multi-modal capabilities, and report writing (Deep Research), mostly in office-related scenarios. But these are not enough to keep users paying.

Even before finding a clear application direction, a number of companies in the general agent track are already launching products for testing, aiming to seize the market first.

With monetization and traffic both unstable, the large companies invest only limited energy in their own agents and generally work on two fronts: besides developing agent products themselves, they are promoting their own agent development platforms. Alibaba, ByteDance and Baidu, for example, promote their platforms while handing out incentives and organizing agent development competitions, focusing on building developer ecosystems.

The market seems to have tacitly accepted that general agents are a business small companies cannot afford.

In the domestic market, apart from a few startups such as Manus and Genspark, most companies developing general agents have their own self-developed large models.

Among them, the large companies not only have models but also their own clouds behind them. For them a general agent is more than a product: it is a showcase, using a C-end product to demonstrate what they can offer as a B-end platform and so attract more developers.

Large model startups follow the idea that “the model is the agent”: they do more work at the model layer on what agents need, such as RL (reinforcement learning) and long context, and arrive at general agent products that way.

Pricing among domestic general agent players is also more cutthroat than among those that have gone overseas. Large companies represented by Baidu and ByteDance can afford open betas and free service, while the likes of MiniMax and Tiangong Agent are offered with limited usage or paid points. Up against the free and unlimited offerings of the large companies, the domestic general agent track is destined to become ever more cutthroat, and commercial monetization is a road with little payoff.

For document agents, the complexity of calling tools is relatively low, and the cost of text generation is lower, which is a more cost-effective direction.

Having built deep research features, agents have begun pushing into multi-modal capabilities and application scenarios: on the one hand inserting multi-modal content such as images and video into generated documents, and on the other building scenarios that suit agents today into the general agent. PPT generation, for example, has become almost standard for office agents.

But whether an agent produces a report and dresses it up with graphics, or is used to make a PPT, this does not solve the problem of mediocre output. In deep research reports, for instance, the most common failures are errors and omissions in retrieving factual information, such as failing to get the concept of an agent straight and recommending large model products instead.

A further problem is that the output carries little valuable information. A report cites only three or four scattered sources, and most of the content is filtered from the Internet, often amounting to vague “filler”. Asked to describe the survival challenges of large model companies, for example, it lists every problem that any company might face, which is neither targeted nor adds valuable new information.

As a result, companies have begun exploring more scenarios that agents can fit, trying to draw in more users. The agent will inevitably become the “unified entrance” to a company’s own products, with companies integrating their product capabilities in various ways, such as MiniMax integrating Hailuo’s video generation and Baidu Xinxiang bringing existing agent dialogues into its scenarios.

Beyond the lack of suitable scenarios, current agent capabilities are limited and results are uneven, which makes it hard for users to pay.

A general agent typically executes a task by breaking it down and then working through the steps. The more complex the task, the more steps the agent runs, and a problem in any one step degrades the overall output. For complex tasks, therefore, agent execution is not yet stable enough.

For example, to produce an analysis of a company, the agent has to retrieve financial report data, pull in the company’s website, and gather analysis and commentary from major sources; if any of these results is wrong, the quality of the whole report drops sharply.
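The way errors compound across steps can be illustrated with a simple probability sketch. It assumes, purely for illustration, that every step succeeds with the same probability and that steps fail independently; the 95% per-step figure is hypothetical, not a measured value for any product.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_success ** steps


PER_STEP_SUCCESS = 0.95  # hypothetical per-step reliability

for steps in (1, 5, 10, 20):
    p = chain_success_probability(PER_STEP_SUCCESS, steps)
    print(f"{steps:>2} steps at {PER_STEP_SUCCESS:.0%} per step -> {p:.1%} end-to-end")
```

Under these assumptions, a 20-step task completes without any faulty step only about 36% of the time, even though each individual step looks reliable.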

At present, some agent developers are trying to break through these bottlenecks through technological innovation.

For example, MiniMax uses the new linear attention mechanism it released at the beginning of the year in its new M1 model, which serves as the base model for its agent products. This greatly expands the amount of text the agent can handle, supporting context inputs of up to 1 million tokens, and suits scenarios that require analyzing large volumes of text, such as legal documents.

Moonshot AI emphasizes that “the model is the agent”: its base model is a new-generation agent model that Moonshot AI trained with end-to-end autonomous reinforcement learning. RL (reinforcement learning) is the highlight of this deep research agent.

Most industry insiders who spoke with Lightcone Intelligence affirmed the importance of RL to agents. Traditional supervised learning and pre-trained models perform well on specific tasks, but their generalization ability is often limited by the distribution of the training data. When the task scenarios an agent must handle are diverse and the environment changes dynamically, agents that rely only on preset rules or one-shot inference struggle to adapt.

For example, on tasks that take multiple steps to complete, a traditional model may guess wrong at any link and so affect the final result, whereas RL relies on extensive trial and error and reward mechanisms to improve generalization, and performs better on complex multi-step tasks.

Kimi-Researcher proactively deals with contradictory information

It can be said that RL can greatly increase the upper limit of an agent’s ability.

Kimi-Researcher researcher Feng Yichen shared that on Humanity’s Last Exam (HLE, a test that measures AI on difficult problems across disciplines), the agent model’s score jumped from an initial 8.6% to 26.9%. By comparison, the OpenAI Deep Research team improved results from about 20 points (o3) to 26.6 points in related work, further proving the great value of reinforcement learning in agent training.

While the technical ceiling is still high, latecomers keep raising the bar for agent capability. Today (July 18), OpenAI’s general agent product, ChatGPT Agent, posted an impressive result, setting a new SOTA of 41.6% on HLE.

Through reinforcement learning, agents are expected to evolve from simple “tool callers” into systems with genuine “autonomous learning” and “environmental adaptation” capabilities. At that point, general agents may truly find killer scenarios that users are willing to pay for.

Agents still have a long way to go; only through technological breakthroughs and deep cultivation of scenarios can they become AI assistants that genuinely help.
