China’s “Top Five” Large Models: The Decisive Battle for AGI!

China’s foundation model market has been completely reshuffled! Today, the players left at the table are the “Top 5”: Byte, Alibaba, Step Star, Zhipu and DeepSeek. In the coming showdown, what will decide the winner?

The emergence of DeepSeek has completely changed the global AI landscape.

Since then, not only has the competitive landscape between Chinese and American large models shifted, but the industry map of domestic large models has also been redrawn in one stroke!

A look at China’s foundation model market shows that the field has consolidated into a new top five:

Byte, Alibaba, Step Star, Zhipu, and DeepSeek.

For the top five that have broken through, where is the next winning point?

Why did these five stand out and become the last players left at the table?

The answer is simple: either they have money or they have people.

The former is self-explanatory. Training a large model is an openly expensive bet: either you have deep reserves of your own, or you have a powerful backer to lean on.

Byte, Alibaba, and DeepSeek all belong to the self-funded type, while Zhipu and Step Star undoubtedly belong to the latter.

Among them, Step Star, the Shanghai team, closed its latest round between the end of 2024 and the beginning of 2025, raising hundreds of millions of US dollars in its Series B; Zhipu, the Beijing team, secured 1.8 billion yuan in financing in March 2025.

As for people, what matters is of course a high density of talent, especially technical leaders recognized across the industry.

A closer look shows that each of the top five has its own backbone in this respect.

Byte’s Wu Yonghui; Alibaba’s Wu Yongming and Zhou Jingren; Step Star’s Jiang Daxin, Zhang Xiangyu, and Zhu Yibo; Zhipu’s Tang Jie and Zhang Peng; and DeepSeek’s Liang Wenfeng are all figures who can shift the balance of the industry.

With both money and people in place, everyone starts from a similar line; what they compete on next is substance.

The top five foundation models, each leading in its own way

In fact, a careful analysis reveals what these top five have in common.

Each is either an all-rounder, whose models cover every capability and perform in the first echelon, or a specialist, whose models are far ahead in one particular area.

Alibaba: The king of open source, top three in the world

With its unique positioning as the “King of Open Source”, Alibaba not only occupies an important position in the domestic market, but also ranks among the top 3 model contributors in the global AI open source ecosystem.

Stanford Artificial Intelligence Index Report 2025

Alibaba is arguably the most open of China’s internet giants when it comes to LLMs, and it is also the only cloud computing vendor in the world to open source models at “all sizes and in all modalities”.

As the earliest and most thorough open source player, Alibaba is the most resolute in investing in AI, has the most complete layout, and is the first Chinese internet company to make money from it.

Since 2023, the Tongyi team has open-sourced more than 200 models across two base series: the Qwen large language models and the Wan visual generation models.

These models span all modalities, including text generation, visual and speech understanding and generation, text-to-image, and video generation, with parameter counts ranging from 0.5B to 235B and support for 119 languages and dialects.

Two years ago, when the domestic LLM market was still in its “battle of a hundred models”, Alibaba took the lead in open sourcing Qwen-7B, attracting the attention of developers around the world.

To date, Qwen has been downloaded more than 300 million times worldwide and has more than 100,000 derivative models, surpassing Llama to become the world’s largest open source model family.

In the Hugging Face community, the Qwen series accounted for more than 30% of global model downloads in 2024, ranking first.

In February 2025, the top ten open source models on Hugging Face’s global open source large model leaderboard were all derivatives built on Qwen

In this “money-burning” game, Alibaba will also invest 380 billion yuan over the next three years in AI research and development and in cloud and AI hardware infrastructure, more than its total over the past ten years combined.

This scale of investment is second to none among domestic internet companies and shows Alibaba’s strategic determination on the AI track.

Compared with other large model players, Alibaba has been the first to close the loop from investment to return, thanks to its mature commercialization path and broad customer base.

As of the end of January 2025, more than 290,000 enterprises had called the Tongyi large model API through the Alibaba Cloud Bailian platform.

Byte: A giant aircraft carrier, back in startup mode

Byte’s large model is characterized by “strong comprehensive capabilities” and covers multi-modal fields such as text generation, image understanding, video generation, and speech processing.

In this pinnacle showdown of technology and resources, Byte has shown “ferocious combat effectiveness” in both self-developed large models and AI applications.

At present, Byte has more than 20 AI applications. Its popular core product “Doubao” has quickly won mindshare among users with its powerful text generation and multimodal capabilities, and has more than 100 million monthly active users.

The video generation tool “Jimeng” (literally “Dream”) has also been given higher strategic priority and has been commercialized in areas such as virtual idols and e-commerce livestreaming.

Likewise, Byte is unwilling to fall behind in AI programming. Its AI coding tool Trae is positioned directly against AI integrated development environments such as Cursor.

In enterprise services, Volcano Engine’s “Feilian”, built on the Doubao model, is also deploying AI applications across multiple scenarios.

Byte’s comprehensive layout is also reflected in its ability to integrate its ecosystem. Through platforms such as Douyin, Toutiao, and Feishu, Byte embeds large models into content recommendation and collaborative office work, forming a closed loop from technology to application.

Now, in the posture of “a giant aircraft carrier returning to startup mode”, Byte has become one of the leaders of China’s AI track with its ample funds, extremely high talent density, and broad, multi-directional layout.

Step Star: The low-key “national team” of large models

Compared with the other companies, Step Star can be regarded as the most low-key “national team” player among the top five.

Step Star is a national team born and raised in Shanghai. By the end of 2024, Step Star had completed financing totaling hundreds of millions of dollars, with core investors including Shanghai State-owned Capital Investment Co., Ltd. and its funds, and strategic and financial investors including Tencent Investment, Wuyuan Capital, and Qiming Venture Capital.

Today, only two years after its founding, Step Star has released 22 self-developed base models covering text, speech, images, video, music, and reasoning. Sixteen of them are multimodal models with industry-leading performance, earning the company recognition as the industry’s most prolific multimodal developer.

Among them, in early 2025 Step-1o Vision took first place among Chinese large models in the vision category and first place on the multimodal model leaderboard, on the well-known large model arena Chatbot Arena and the authoritative domestic evaluation platform “Sinan” (OpenCompass).

What is even rarer is that Step Star’s multimodal model matrix is both comprehensive and industry-leading. The core of multimodality is breadth of capability: a model must not only handle speech, images, and video, but also understand, generate, and reason. Across this layout, every one of Step Star’s product lines has reached the first echelon.

Another major difficulty in multimodality is that fusion must not degrade the performance of any single modality, and above all must not cost the model its intelligence. Step Star takes a native multimodal approach, which sets it apart in this respect.

In Step Star’s view, multimodality is the only road to AGI. As the integration of multimodal interaction and reasoning matures, more agents will emerge on smart terminals.

Step Star is now working to solve the fundamental problem in vision, namely representation and alignment, which it frames as “predict next frame”.

Looking ahead, under Step Star’s multimodal R&D roadmap, AI will be able to model interactions in the physical world, simulate the whole world, and build a world model. At that point, AGI will be realized.

When the company was founded, Jiang Daxin’s team drew up a roadmap that divides the evolution of intelligence into three stages: simulating the world, exploring the world, and summarizing the world

Step Star’s core technical staff have also spent ten years on the front line of AI development. The team’s talent density is extremely high, combining technical insight with practical experience, and it can be called a “dream team” for foundation models.

Founder and CEO Jiang Daxin previously served as a global vice president of Microsoft and as vice president and chief scientist of the Microsoft Asia Internet Engineering Institute. He was selected as a 2025 IEEE Fellow, the only inductee from a Chinese large model startup.

Chief scientist Zhang Xiangyu is a co-author of “Deep Residual Learning for Image Recognition” (ResNet), the most cited paper in the world in the 21st century, with more than 250,000 citations.

In terms of commercialization, a number of leading companies and a large number of AI application developers have recognized and integrated Step Star’s multimodal models. Step Star also regards agents on smart terminals as the core breakthrough point for deploying large models, and has entered into in-depth cooperation with Geely Automobile, Qianli Technology, OPPO, Zhiyuan Robot, Force Lingji, TCL, and others.

Zhipu: Full-stack innovation, pushing hard on agents

As the first large model startup in China to begin the IPO process, Zhipu stands out for its distinctive “academic” temperament, backed by Tsinghua’s technical heritage, and has built a comprehensive layout across base models, multimodal technology, and agents.

At present, Zhipu has established a new-generation cognitive large model technology system and developed the fully self-developed, full-stack GLM model family, whose performance is on par with the world’s top LLMs.

In August last year, GLM-4-Plus came out and performed well on multiple tasks, on par with the GPT-4 series.

In April this year, Zhipu again open-sourced GLM models in its 32B/9B series, including base, reasoning, and rumination models. With 32 billion parameters, their performance is comparable to that of mainstream models with 100 billion parameters.

Among them, the GLM-Z1-Rumination model is the company’s latest exploration of next-generation AGI technology.

On agents, Zhipu proposed the concept of Phone Use and launched agent products before OpenAI, and it released AutoGLM Rumination, which it describes as the world’s first L3-level agent to combine deep research with hands-on operation.

Today, they are using AutoGLM and GLM-PC to carry out in-depth cooperation with global car companies, PC and mobile phone manufacturers to promote large models from Chat to Act.

Zhipu’s commercialization path centers on to-government (2G) and to-business (2B) services and is closely tied to the needs of government and enterprise customers.

It has built service models such as MaaS, private deployment, and an agent platform, forming a new model service ecosystem that includes millions of developers.

According to statistics, the MaaS platform supports more than 800,000 enterprises and application developers.

This academic entrepreneurial model gives Zhipu a leading edge in terms of technical depth and strategic stability.

DeepSeek: Research-driven, years of accumulation paying off

Among the top five, DeepSeek currently attracts the most attention from abroad. In the Sino-US AI competition, it is also the name mentioned most often and the one with the strongest presence.

It is a maverick technical prodigy that overturned the large model table through its own strength.

DeepSeek’s technical hallmark is its focus on language models, especially mathematical capability, together with a firm commitment to open source.

During this year’s Chinese New Year, DeepSeek-R1 stunned the world, achieving performance comparable to top AI models such as GPT-4 with extremely low computing resources.

Compared with the hundreds of millions of dollars and tens of thousands of high-end GPUs that OpenAI and Anthropic invest in training models, DeepSeek’s core secret lies in extreme engineering optimization.

For example, the MoE (mixture-of-experts) architecture lets the model reach 671B total parameters while activating only 37B at runtime, which greatly reduces compute requirements. Multi-token prediction (MTP) improves training efficiency by predicting several future tokens at once rather than one token at a time. Multi-head latent attention (MLA) compresses the attention key-value cache into a compact latent representation, reducing memory use.
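
To make the mixture-of-experts idea concrete, here is a minimal toy sketch in Python. It is not DeepSeek’s actual router: the function names (`top_k_route`, `moe_forward`, `active_fraction`), the eight scalar “experts”, and the gate scores are all hypothetical, chosen only to show how a gate picks a few experts per input so that most parameters stay idle. Real MoE layers use learned gating networks over vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the gate probabilities renormalized over the chosen experts.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, gate_scores, k):
    """Run only the k routed experts and mix their outputs by gate weight."""
    routes = top_k_route(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in routes)
    return output, [i for i, _ in routes]


def active_fraction(total_params_b, active_params_b):
    """Share of parameters that take part in one forward pass."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Eight hypothetical experts; expert i simply multiplies its input by (i + 1).
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    # Hypothetical gate scores for one input (a real gate would compute these).
    gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    y, used = moe_forward(2.0, experts, gate_scores, k=2)
    print("experts used:", used)
    print("output:", round(y, 4))
    # Figures from the article: 671B total parameters, 37B activated.
    print("active share: {:.1%}".format(active_fraction(671, 37)))
```

With the article’s figures, the active share works out to roughly 5.5% of the total parameters per forward pass, which is where the compute savings come from.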

In short, the key to DeepSeek’s success lies in being research-driven rather than chasing short-term profit: engineers are encouraged to improve efficiency from a research perspective without facing pressure to monetize.

The DeepSeek team also brings together a number of top talents, and founder Liang Wenfeng has his own principles when it comes to hiring.

For example, the company mainly recruits fresh graduates and people one or two years out of school, and instead of pursuing scale it builds a small, elite team.

As the model broke through to a mass audience, cloud vendors and industry partners moved quickly to integrate it, keeping its momentum strong.

Amid the DeepSeek boom, hundreds of millions of ordinary users have begun to embrace AI.

The next stage of the decisive battle: the “intelligence ceiling” and “multimodal capability”

Now that the “Top 5 Foundation Models” pattern has taken initial shape, the focus of competition has shifted to more fundamental, cutting-edge technology.

Where will the technical battle be decided?

Clearly, pursuing a higher “intelligence ceiling” and breaking through on “multimodal capability” have become the two technological high grounds that must be taken on the road to AGI.

On the one hand, the pursuit of the upper limit of intelligence is still the most important thing in the field of large models.

Although the current leading models perform amazingly in many tasks, there is still a lot of room for improvement in logical reasoning, common sense understanding, and long text processing.

The ultimate goal of raising the upper limit of intelligence is to move towards AGI that can perform any human intellectual task, which requires the model to have deeper understanding, learning, reasoning, and creative capabilities.

On the other hand, unifying multimodal understanding and generation is the only road to AGI.

Humans interact with the world and obtain information through multiple senses. For AI to truly understand and integrate into a complex world, it must be able to process and fuse multimodal information such as text, images, audio, and video.

Once understanding and generation are unified, this can not only promote the spread of agents on smart terminals, but also let AI interact with the physical world through embodied intelligence, collect environmental data, and build world models.

On top of the world model, adding complex task planning, the ability to abstract and summarize concepts, reinforcement learning algorithms, and superalignment makes AGI possible.

All in all, the endless exploration of higher intelligence ceilings and the deep integration of multimodal capabilities in the future will be the key to determining the outcome of this competition.

Now, at the threshold of AGI, the competition among the “Top 5 Foundation Models” is not only a contest of technology but also an all-round contest of resources, talent, and ecosystem.

The five leaders, Byte, Alibaba, Step Star, Zhipu, and DeepSeek, are leading China’s AI to continue to approach the forefront of the world with their unique advantages and strategic vision.

And the end of this competition may be the dawn of AGI.
