Alibaba, ByteDance, DeepSeek, Step Star and Zhipu have emerged as China's new “top five” foundation model players, pushing on open source, reasoning and multimodality and exploring varied paths to commercializing large models. This article analyzes the competitive landscape, technological breakthroughs and commercialization trends among these five companies, examines how they have secured a place in a fiercely competitive market, and looks ahead to where the field is going.
Large model companies are going through a new round of qualifying.
AI applications are in full swing, but beneath the surface, out of view, the war over foundation models continues. Instead of slowing down as expected, it has become even more cutthroat:
According to incomplete statistics from Light Cone Intelligence, since the beginning of 2025 the major Chinese companies competing on models, including Baidu, ByteDance, Alibaba, DeepSeek and the “Six Little Tigers”, have released more than 45 foundation models (excluding industry-specific vertical models), which works out to a new model every 3.3 days.
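The release cadence quoted above is simple arithmetic and can be sanity-checked. The sketch below is only an illustration, not the methodology behind the statistics: the function names are hypothetical, and the length of the counting window is inferred from the two published figures rather than stated in the source.

```python
def implied_window_days(models_released: int, days_per_release: float) -> float:
    """Length of the counting window implied by a release count and a cadence."""
    return models_released * days_per_release


def cadence_days(models_released: int, window_days: float) -> float:
    """Average number of days between releases over a window."""
    return window_days / models_released


# Figures quoted in the article: 45 models, one every 3.3 days.
window = implied_window_days(45, 3.3)
print(f"Implied counting window: about {window:.1f} days")
print(f"Cadence over that window: one release every {cadence_days(45, window):.1f} days")
```

Under these assumptions the 45 releases span a little under five months from the start of 2025. Because the source says "more than 45", the true cadence may be slightly faster than the quoted figure.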
In the first half of the year, the top tier of Chinese large model companies came to closely mirror the structure of the top tier overseas.
Just as OpenAI, Google, Anthropic, xAI and Meta make up an overseas “top five” of three tech giants plus two startups, the Chinese field is also sorting itself into a “3+2” combination of Alibaba, ByteDance, DeepSeek, Step Star and Zhipu, the new “top five” foundation model players in China.
Clash of the titans: the “top five” foundation model players release 30 large models in half a year
In the first half of the year, the foundation model field remained a clash of titans.
According to a tally of the models released by China's “top five” foundation model companies in the first half of the year, the five released a total of 32 large models. By number of releases, Alibaba and Step Star are the two standout “volume kings”, the most prolific releasers: Step Star released 11 models and Alibaba 9, together accounting for more than half of the five companies' total.
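The "more than half" claim follows directly from the counts in the paragraph above. A minimal check, using only the figures quoted there (the variable and function names are illustrative):

```python
# Release counts quoted in the article for the first half of the year.
named_releases = {"Step Star": 11, "Alibaba": 9}
total_releases = 32  # all five companies combined, as quoted


def combined_share(counts: dict[str, int], total: int) -> float:
    """Fraction of the total accounted for by the listed companies."""
    return sum(counts.values()) / total


share = combined_share(named_releases, total_releases)
print(f"Step Star + Alibaba: {sum(named_releases.values())} of {total_releases} ({share:.1%})")
```

The two companies account for 20 of the 32 releases, leaving 12 for the other three combined.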
Viewed against the industry's development trends, open source, reasoning and multimodality have become the three key words:
On the timeline, Alibaba, Zhipu and DeepSeek are the three of the five that have long been firmly committed to the open source route.
Alibaba is the undisputed “open source volume king”: since 2023 it has released many of its models to the open source community, and it is the most comprehensive of the five in both the number and the variety of models it has open-sourced.
This has also made Alibaba the provider of the open source SOTA in many fields. Qwen3, a hybrid reasoning model launched on April 29, became the world's most powerful open source model, at only 35% of the cost of DeepSeek-R1.
Of course, before Qwen3, the last Chinese company to set off a craze for reasoning models was the “catfish” DeepSeek, the newcomer that stirred up the whole pond.
Compared with the other four companies, DeepSeek is the only “specialist” that does not build out a matrix of large models. Its reasoning model DeepSeek-R1, released during the Spring Festival, delivered full performance at extremely low cost, with an input cost as low as 2% of GPT-4o's. It broke into the mainstream in one stroke and became the benchmark for subsequent reasoning models.
Multimodal large models are the key research and development direction for large model companies this year.
Step Star, known as the “multimodal volume king”, has released a total of 22 foundation models, 16 of them multimodal.
Among them, Step-Video-T2V, the text-to-video model open-sourced jointly by Step Star and Geely, was at the time of its release the open source video generation model with the most parameters and the best performance in the world, and Step-Audio was the industry's first product-grade open source voice interaction model.
ByteDance, with its comprehensive portfolio, has also started to join the first echelon in multimodality this year. Seedream 3.0, for example, improves not only image quality and generation efficiency but also the commercial applicability of AI image generation. As a result, the text-to-image results of its corresponding product, Dream AI, went viral for a time.
However, compared with large language models, multimodal large models are still far from mature. In Jiang Daxin's words, “In the field of multimodal models, there is no GPT-4 moment yet.”
In Jiang Daxin's view, the sticking point is that the industry as a whole still lacks an architecture that unifies understanding and generation in the multimodal field. Large language models have achieved this, but in multimodal large models understanding and generation are still handled by different models. For computer vision, this is a stubborn problem that has gone unsolved for decades.
Whereas large language models had ChatGPT and reasoning models had DeepSeek-R1, the multimodal field is still waiting for its breakout model, and the opportunity remains open.
What is the only way to AGI?
Now that AI applications are in full swing and the marginal returns of the Scaling Law are diminishing, is it still worth betting on foundation models?
100 days ago, the release of DeepSeek gave the industry an unambiguous answer.
“Everything changes too fast. Every morning, the release of new models and new products can overturn what we thought we knew,” an entrepreneur told Light Cone Intelligence with a sigh.
As noted above, over the past six months the pace of model releases has increased rather than slowed. At the same time, the technical dividends from improved model capabilities have pushed the likes of DeepSeek to the forefront, while companies without a technical edge lost their chance to stay in the first echelon as investor enthusiasm receded.
To stay at the table longer, big tech companies and startups alike are making the scramble for money and talent the main theme of 2025.
Today, computing power, talent and capital are still the three hard indicators of whether a large model company can keep its seat at the table. For big tech companies, capital is naturally not a problem, but startups have to raise enough investment to pay for their early research and development.
In the startup camp, only the “Beijing team” Zhipu and the “Shanghai team” Step Star, two companies favored by state-backed capital, have continued to attract investment during the funding winter for large model companies.
Zhipu secured investments from state-backed funds in three cities, Hangzhou, Zhuhai and Chengdu, in March, totaling 1.8 billion yuan. Step Star received hundreds of millions of dollars in December last year, completing its Series B financing.
On the talent side, the current “top five” are exerting a siphon effect. ByteDance, for instance, poached a number of key R&D figures at home and abroad between 2023 and 2025. One of them is Wu Yonghui, formerly vice president of research at Google DeepMind, who joined ByteDance this year as head of basic research for its large model team, Seed.
Building on this accumulation of capital and talent, each of the top five has gradually established its own advantage. “Open source volume king” Alibaba uses its ecosystem to win over enterprise (B-end) users. ByteDance is filling in its foundation model lineup while relying on applications such as Doubao and Coze to feed back into model upgrades. DeepSeek has become the king of low prices and cost performance. Zhipu's large models have a clear advantage with government and enterprise customers. Step Star has become the “multimodal volume king”, releasing a range of SOTA models.
All of these companies share one goal: to keep raising the “intelligence ceiling” of large models, so that surplus model capability can support breakthroughs in AI applications.
Take agents: their key capabilities are multimodality, slow thinking and memory.
With multimodal understanding, agents built on large models can “read” and understand what is on phone and computer screens, so that AI can operate smart terminals in place of humans. Reasoning ability lets the AI break a task down according to the user's needs, follow each planned step, and finally complete the task.
Google DeepMind CEO Demis Hassabis believes the path to AGI is beginning to come into focus, but that actually reaching it will still require breaking through multiple technical bottlenecks and integrating multiple key capabilities.
With time limited, whoever has the more complete set of hard indicators and the stronger base model capability will have the chance to actually secure a ticket to AGI.
Large model commercialization trends: open source and deployment in vertical scenarios
Commercialization is a question foundation model companies cannot skip, and their commercialization strategies tend to follow from their technology strategies.
In 2025, open source and vertical scenario applications are becoming the two key directions for model commercialization.
Start with open source, where Chinese open source large models have claimed a major share of the global field. On the open source community Hugging Face, 12 of the 30 most popular models currently come from Chinese companies, including the latest music model ACE-Step, DeepSeek's R1 and Prover-v2, Alibaba's Qwen3 series, ByteDance Seed's small-parameter code model and Tencent Hunyuan's AI video model.
Once a model is open source, the ways a large model company can commercialize it become more diverse. In China, DeepSeek and Alibaba are the representative cases, both adopting relatively permissive licenses. This approach generally supports three business models: the most direct is charging for API calls; cloud vendors collect “utility bills” by providing GPU services; and customized tuning and technical services built around the open source models are a third.
However, only a few enterprises and individuals can actually use an open source model directly; most people need a complete, out-of-the-box product. As a result, AI applications for vertical scenarios are becoming increasingly popular.
The hottest are undoubtedly agents for every industry: from government and enterprise to finance and healthcare, agents are everywhere. At present, though, it is the combination of agents and smart terminals that is drawing the most attention.
Why has agent + smart terminal become the key direction for deployment?
“Cars not only have high-value software and hardware systems, but also have a close connection with users, which makes them ideal AI carriers,” said Wu Huixiao, CTO of Great Wall Motors. The same logic applies to products such as mobile phones and embodied intelligence.
For the model makers, building agents, like multimodality and reinforcement learning capabilities, is one of the cornerstones on the road to AGI. In the five stages of AGI set out by OpenAI, agents correspond to the L3 stage, in which AI can act autonomously; building on L3, AI can go on to pursue the ability to learn on its own.
For large model companies, therefore, a commercialization strategy built on agents is a natural extension of their technology.
Step Star and Zhipu, the two large model companies from south and north respectively, are both targeting the smart terminal track.
Last year, Zhipu launched AutoGLM, an agent that runs on mobile phones and can take over and operate various apps to meet user needs.
At its open day in February, the company released agent applications in four fields: automobiles, mobile phones, embodied intelligence and IoT.
Today, large model companies are competing for orders from smart terminal customers. Step Star, for example, has secured partnerships this year with OPPO, Qianli Technology, Geely Automobile Group, Zhiyuan Robot and other manufacturers.
Stacked agent capabilities are also becoming a selling point for smart terminal products. For example, OPPO's Find N5 and Find X8 phones, equipped with “one-click all-round search” and “one-click screen question”, have sold well. The Find X8 has reportedly posted the highest sales over the same period of any product in the history of the Find series.
Compared with other lines of business, agent + smart terminal partnerships have also brought companies considerable income. According to a report by “Intelligent Emergence”, with the signing of large orders from customers such as Samsung, Zhipu's revenue exceeded 100 million yuan less than a month after the Spring Festival.
In commercializing, this generation of AI large model companies is steering away from the private deployment and customization typical of the traditional To B track, hoping to use the dividend of the technology to standardize products as far as possible and so achieve higher gross margins.
Whether it is a “self-service hot pot restaurant” offering open source tools or a “private kitchen” offering vertical agents, the commercial prospects for large models are looking increasingly bright.