
In 2025, where will AI large models go in enterprise scenarios?

The past year has seen a fundamental shift in AI’s position in enterprises. It is no longer an isolated experiment in the innovation laboratory, nor merely a “new toy” for the technical department; it has truly entered core business systems and become an integral part of IT and operating budgets.

This is a quiet but rapid evolution: AI models have become more diverse, procurement processes have become more rigorous, and companies have begun to select, deploy, and evaluate AI services in an orderly manner, much like traditional software purchases. Technology leaders are also maturing: they understand that different models suit different tasks, that fragmented use cases are the norm, and that high-quality AI-native applications are rapidly outpacing traditional software vendors.

Recently, A16z released a research report on AI technology in enterprise scenarios, based on in-depth interviews with more than 20 corporate buyers and a survey of 100 CIOs, comprehensively reviewing how enterprises deploy, procure, integrate, and plan for AI in 2025.

The report reflects a new reality: AI is no longer a question of “whether it is worth trying” but a practical challenge of “how to implement it at scale”.

How is AI being implemented, and how should it be implemented in enterprise scenarios? The report serves as both a survey and a mirror for enterprises worldwide working to put AI into practice.


Let’s dive into this report.

The following is the original text of the report (some sentences have been adjusted for readability):

A year ago, we summarized 16 changes that businesses were facing in building and procuring generative AI (Gen AI). Today, the situation has changed drastically. To understand the new landscape, we revisited more than 20 enterprise buyers and surveyed 100 CIOs across 15 industries to help entrepreneurs understand how their enterprise customers will use, source, and plan for AI in 2025 and beyond.

Despite the rapid changes in the AI world, the evolution of the market landscape over the past year has exceeded our expectations:

1. Enterprise AI budgets continue to exceed expectations, jumping from pilot projects to part of core IT and business budgets.

2. Multi-model strategies are maturing, and enterprises have begun to weigh performance against cost. OpenAI, Google, and Anthropic are the mainstays of the closed-source market, while Meta and Mistral are popular choices in the open-source camp.

3. The AI model procurement process is getting closer to traditional software procurement: stricter evaluation, more sophisticated hosting arrangements, and more attention to standardized testing. At the same time, more complex AI workflows are driving up model switching costs.

4. The AI application ecosystem is gradually taking shape: standardized applications have begun to replace customized development, and AI-native third-party applications have ushered in explosive growth.

This report will focus on the latest trends in the four dimensions of budget allocation, model selection, procurement process and application use, to help entrepreneurs understand the real focus of enterprise customers in more detail.

1. Budget: AI spending exceeds expectations and continues to grow

1. AI budget growth has far exceeded expectations and shows no signs of slowing down

Enterprises have significantly exceeded last year’s already high budget expectations, and budgets are expected to continue growing in the coming year, with an average increase of about 75%. As one CIO put it: “My spending for the entire year of 2023 is now used up in a week.”

There are two reasons for the increase: on the one hand, companies continue to explore more internal use cases to drive widespread adoption by employees; on the other hand, more and more enterprises are deploying customer-facing AI applications, especially technology innovators, and investment in these scenarios is expanding exponentially. A large technology company said: “Last year we mainly focused on internal efficiency improvements; this year’s focus will shift to customer-facing Gen AI, which will greatly increase investment.”

2. AI is officially included in the core budget, ending the “trial period”

A year ago, about 25% of enterprise spending on LLMs still came from innovation budgets. Today, that share has dropped to 7%. Enterprises generally include the cost of AI models and applications in their regular IT and business budgets, reflecting that AI is no longer an exploratory project but “infrastructure” for business operations.

“Our products are integrating AI capabilities, and spending is naturally rising,” one CTO noted. This means that the trend of AI into mainstream budgets will accelerate further.

2. Model: The multi-model strategy has become the mainstream, and the three major manufacturers have initially established a leading position

3. The multi-model era has become the norm, and “differentiation” rather than “homogeneity” has become the driving force

Several high-performance LLMs are now on the market, and enterprises are starting to deploy multiple models in production. While avoiding vendor lock-in is one important reason, the more fundamental driver is that different models increasingly perform differently across use cases.

In this year’s survey, 37% of companies are using five or more models, a significant increase from 29% last year.

While models score similarly on some common benchmarks, enterprise users find the differences in actual performance far from negligible. For example, Anthropic’s Claude is better at fine-grained code completion, while Gemini is better suited to system design and architecture. In text-based tasks, users report that Anthropic is stronger in language fluency and content generation, while OpenAI’s models are better suited to complex question-answering tasks.

This difference has prompted enterprises to adopt “multi-model best practices” to ensure performance optimization while reducing dependence on a single vendor. We predict that this strategy will continue to dominate the model deployment path of enterprises in the future.
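The multi-model pattern described above is, at its simplest, a routing decision keyed on task type. A minimal sketch follows; the model names and task categories are illustrative assumptions, not recommendations from the report:

```python
# A minimal sketch of task-based model routing. Model names and task
# categories here are illustrative assumptions, not report guidance.

ROUTING_TABLE = {
    "code_completion": "claude-sonnet",     # fine-grained code completion
    "system_design": "gemini-pro",          # design and architecture work
    "content_generation": "claude-sonnet",  # fluent long-form text
    "complex_qa": "gpt-4o",                 # complex question answering
}

DEFAULT_MODEL = "gpt-4o"  # fallback when a task has no dedicated entry


def route(task_type: str) -> str:
    """Pick a model for a task, falling back to a default."""
    return ROUTING_TABLE.get(task_type, DEFAULT_MODEL)
```

In practice, such a table is one of the simplest ways to get per-use-case performance while keeping any single vendor swappable behind one function.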

4. Competition among models remains fierce, but three major vendors are beginning to pull ahead

While companies continue to experiment and test multiple models in production, three leaders have emerged in the market: OpenAI maintains market share leadership, and Google and Anthropic have quickly caught up over the past year.

Specifically:

(1) OpenAI: Its model portfolio is widely used, GPT-4o is the most commonly deployed model in production environments, and the inference model o3 has also attracted great attention. 67% of OpenAI users have deployed non-cutting-edge models in production, which is much higher than Google (41%) and Anthropic (27%).

(2) Google: Performs more strongly among large enterprises, thanks to GCP’s customer base and brand trust. Gemini 2.5 not only has a top-tier context window but also a clear cost-performance advantage: Gemini 2.5 Flash costs $0.26 per million tokens, far below GPT-4.1 mini’s $0.70.

(3) Anthropic: Highly favored by cutting-edge technology companies (such as software companies and start-ups). It stands out for code-related tasks and is the core engine behind the fastest-growing AI coding applications.

Additionally, open-source models like Llama and Mistral are preferred by large enterprises primarily for data security, compliance, and customizability. Grok, from newcomer xAI, has also begun to attract widespread attention, and the market remains full of uncertainty.

5. For small and medium-sized models, the cost-effective advantages of closed-source models are becoming more and more obvious

As mentioned earlier, model costs are decreasing at a rate of an order of magnitude per year. In this trend, the performance/cost ratio of closed-source models, especially small and medium-sized models, is becoming more and more attractive.

The current leaders in this area are xAI’s Grok 3 mini and Google’s Gemini 2.5 Flash. For example, some customers say they prefer a closed-source model due to cost considerations and ease of ecosystem integration.

As one customer admitted: “The pricing is already very attractive, and we are deeply embedded in the Google ecosystem, from G Suite to databases, and their enterprise service experience is valuable to us.” Another customer summed it up more bluntly: “Gemini is cheap.”

This reflects that the closed-source model is gradually winning the market in low- and medium-cost scenarios.
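The per-million-token prices quoted earlier (Gemini 2.5 Flash at $0.26 versus GPT-4.1 mini at $0.70) compound quickly at production volumes. A back-of-the-envelope calculation illustrates the gap; treat the prices as a snapshot from the report, not current pricing:

```python
# Back-of-the-envelope cost comparison using the per-million-token
# prices quoted in the report (pricing changes frequently; these are
# a snapshot, not current rates).

PRICE_PER_M_TOKENS = {
    "gemini-2.5-flash": 0.26,
    "gpt-4.1-mini": 0.70,
}


def monthly_cost(model: str, tokens_per_month: int) -> float:
    """Dollar cost for a given monthly token volume."""
    return PRICE_PER_M_TOKENS[model] * tokens_per_month / 1_000_000


# At 500M tokens/month the absolute gap is already material:
flash = monthly_cost("gemini-2.5-flash", 500_000_000)  # about $130
mini = monthly_cost("gpt-4.1-mini", 500_000_000)       # about $350
```

At this hypothetical volume the cheaper model costs roughly 37% as much, which is why “cost-oriented” decisions dominate low-risk internal workloads.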

6. The importance of fine-tuning is declining as model capabilities increase

With significant improvements in model intelligence and context windows, enterprises are finding that excellent performance no longer depends on fine-tuning so much as on efficient prompt engineering.

One company observed: “We no longer need to extract training data to fine-tune the model; as long as we put it into a long enough context window, the results are almost as good.”

This shift has two important implications:

(1) Lower cost: prompt engineering is far cheaper than fine-tuning;

(2) Lower vendor lock-in risk: prompts can be migrated to other models with relative ease, while fine-tuned models are often hard to migrate and require high upfront investment.

However, fine-tuning is still essential in some hyper-specific use cases. For example, a streaming company fine-tuned its open-source model to fit domain language for query enhancement in video search.

In addition, if new methods such as Reinforcement Fine-tuning are widely used outside the laboratory, fine-tuning may usher in a new round of growth in the future.

Overall, most organizations have reduced ROI expectations for fine-tuning in conventional scenarios and prefer open source models in cost-sensitive scenarios.
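The “long context instead of fine-tuning” pattern described above amounts to packing domain material into the prompt up to a token budget. A hypothetical sketch follows; the 4-characters-per-token estimate and the budget figure are rough assumptions for illustration:

```python
# Minimal sketch of long-context prompt assembly in place of
# fine-tuning: whole documents are packed into the prompt until a
# token budget is hit. The 4-chars-per-token heuristic and the
# default budget are rough assumptions, not report figures.

def approx_tokens(text: str) -> int:
    """Crude token estimate (roughly 4 characters per token)."""
    return len(text) // 4


def build_prompt(question: str, documents: list[str], budget: int = 100_000) -> str:
    """Pack as many whole documents as fit, then append the question."""
    parts, used = [], 0
    for doc in documents:
        cost = approx_tokens(doc)
        if used + cost > budget:
            break  # stop before exceeding the context budget
        parts.append(doc)
        used += cost
    return "\n\n".join(parts + [f"Question: {question}"])
```

Because the packed documents live in the prompt rather than in model weights, the same assembly logic can be pointed at a different vendor’s model with no retraining, which is the lock-in advantage noted above.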

7. Enterprises are optimistic about reasoning models and are actively preparing for large-scale deployment

Reasoning models enable large language models to complete more complex tasks more accurately, significantly expanding the scenarios where LLMs can be used. Although most enterprises are still in the testing stage and have not yet deployed them officially, optimism about their potential is widespread.

“Reasoning models can help us solve more new, complex task scenarios, and I expect to see a significant increase in their use soon,” said one executive. “It’s just that we’re still in the early stages of testing.”

Among early users, OpenAI’s reasoning models are the most prominent. Although DeepSeek has attracted a great deal of industry attention, OpenAI’s advantage in production deployments is clear: the survey shows that 23% of enterprises have used OpenAI’s o3 model in production, versus only 3% for DeepSeek. DeepSeek’s adoption is relatively higher among startups, but its enterprise market penetration remains low.

As reasoning capabilities are gradually integrated into the main process of enterprise applications, their influence is expected to expand rapidly.

3. Procurement: The enterprise AI procurement process is becoming mature and is comprehensively learning from traditional software procurement mechanisms

8. The model procurement process is becoming more and more standardized, and the cost sensitivity is increasing

At present, enterprises have generally adopted systematic evaluation frameworks when selecting models. In our interviews, security and cost joined accuracy and reliability as the core considerations in model procurement. As one business leader said: “Most models now have sufficient baseline capability, so price has become the more important factor.”

In addition, enterprises are becoming increasingly specialized in “use case-model” matching:

(1) For key scenarios or tasks with high performance requirements, enterprises are more inclined to choose top-level models with strong brand endorsements;

(2) For internal or low-risk tasks, enterprises are more “cost-oriented” in decision-making.

9. Enterprises’ trust in model manufacturers has been significantly improved, and hosting strategies have become more diversified

In the past year, trust between enterprises and model vendors has increased significantly. While some organizations still prefer to access models through existing cloud relationships, such as using OpenAI through Azure, more and more are choosing to work directly with model providers or to host through platforms like Databricks, especially when a model is not offered by their primary cloud vendor.

As one interviewee said: “We want to use the latest and most powerful model as soon as possible, and preview versions are also critical.” Compared with last year, when enterprises preferred to route access through their primary cloud vendor wherever possible, this direct-hosting trend is a significant change.

10. As task complexity increases, the cost of model switching is also rising rapidly

Last year, many companies deliberately minimized switching costs when designing AI applications, hoping models could be swapped in and out freely. But with the rise of agent-based workflows, this strategy is breaking down.

Agent workflows often involve multi-step collaboration, and swapping one model for another puts the whole chain at risk. After investing heavily in building prompts, designing guardrails, and verifying quality, enterprises are reluctant to replace models lightly.

One CIO summed it up bluntly: “All of our prompts are optimized for OpenAI, and each prompt has a specific structure and details. Switching to another model would not only mean re-tuning every prompt but could also affect the stability of the entire workflow.”
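One common way to contain this switching cost is to keep prompts in a registry keyed by (task, model) rather than hard-coded in application logic, so that changing vendors means re-tuning registry entries instead of hunting through code. This is an illustrative pattern, not something the report prescribes:

```python
# Illustrative sketch (not prescribed by the report): prompts kept in a
# per-(task, model) registry, so switching providers means re-tuning
# registry entries rather than editing scattered application code.

PROMPTS = {
    ("summarize", "gpt-4o"):
        "Summarize the text below in 3 bullets:\n{text}",
    ("summarize", "claude-sonnet"):
        "Provide a three-bullet summary.\n<text>{text}</text>",
}


def get_prompt(task: str, model: str, **kwargs) -> str:
    """Look up and fill the prompt tuned for this (task, model) pair."""
    template = PROMPTS[(task, model)]
    return template.format(**kwargs)
```

The registry does not eliminate the re-tuning work the CIO describes, but it localizes it: each model gets its own tuned variant, and the calling code never changes.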

11. External evaluation benchmarks are increasingly becoming the “first screening of model procurement”

As the number of models proliferates, enterprise buyers increasingly rely on external evaluation leaderboards such as LM Arena, which play a role akin to the Gartner Magic Quadrant. Such evaluations provide a first-pass screening reference for model procurement.

While companies still place a high value on internal benchmarks, gold label datasets, and developer feedback, external metrics are becoming the “first threshold.” However, it is generally emphasized that external benchmarks are only part of the evaluation, and the real decisive factors still come from actual trials and employee feedback.

4. Application: AI applications are accelerating, and enterprises are shifting from “self-construction” to “procurement”

12. Enterprises have changed significantly from “self-development” to “purchasing finished products”

The AI application ecosystem is maturing rapidly. In the past year, the transformation of enterprises from “self-construction” to “procurement of professional third-party applications” is very obvious.

There are two main reasons:

(1) Performance and cost shift constantly, requiring continuous evaluation and tuning that specialized vendors are usually better placed to do than in-house teams;

(2) The AI field evolves so quickly that internally built tools are hard to maintain over time and rarely constitute a competitive advantage, undermining the cost-effectiveness of building in-house.

For example, in a customer support scenario, more than 90% of CIOs surveyed said they were testing third-party apps. A listed fintech company tried to develop its own customer service system, but finally decided to move to a mature procurement solution. This trend has not yet fully unfolded in high-risk industries such as healthcare, where data privacy and compliance remain top concerns.

13. “Pay for results” is still not widely accepted by CIOs

Despite widespread discussion of “pay-for-performance”, many practical concerns remain, such as vague definitions of results, difficult attribution, and uncontrollable costs. Most CIOs say they prefer pay-as-you-go because it is more intuitive, predictable, and controllable.

14. Software development has become the first “killer” AI application scenario

While AI has landed in many areas, such as internal search, data analysis, and customer service, the explosion in software development is the most significant, driven by three compounding factors:

(1) The model capability has been significantly improved;

(2) The quality of ready-made tools is extremely high;

(3) The return on investment is directly visible and applicable to a wide range of industries.

The CTO of a high-growth SaaS company says that nearly 90% of their code is now generated by Cursor and Claude Code, compared with only 10–15% a year ago when using GitHub Copilot. This pace of adoption is still at the frontier, but it may be a bellwether for the broader corporate world.

15. The prosumer market drives early application growth

The phenomenon of strong consumer brands driving corporate purchasing decisions is playing out once again.

ChatGPT is a prime example: many CIOs say they buy enterprise ChatGPT because “employees are used to it, they like it, and they trust it.” This natural extension from the consumer market to the enterprise side is accelerating the growth of next-generation AI applications.

16. The speed and quality of AI-native applications are surpassing traditional giants

Although traditional vendors have channel advantages and brand trust, AI-native companies have begun to surpass them in product quality and iteration speed. In coding tools, for example, products like Cursor, built natively for AI, have left users no longer satisfied with the traditional GitHub Copilot.

A CIO in the public safety industry pointed out: “The first- and second-generation AI coding tools are worlds apart. The new generation of native products is smarter and more practical.”

Looking ahead: The “experimental era” of enterprise-grade AI is over

Enterprise deployment of AI is no longer a pilot project but a strategic move. Budgets have been normalized, model selection has diversified, procurement processes have been standardized, and AI applications are being implemented systematically. Despite the fragmentation of use cases, this is exactly the direction that businesses are embracing. A few key vendors are standing out, and companies are increasingly choosing finished applications to accelerate implementation.

The market shape is getting closer to traditional software, but the pace of change and complexity is completely different—this is the unique rhythm of AI.

End of text