As industry leaders, Microsoft and Google laid out the focus of their AI strategies at recent developer conferences. Microsoft is concentrating on building an open agentic web, attracting enterprises and developers to explore the future of AI by providing powerful development environments, improving agent development efficiency, building carriers for agents, and establishing network connectivity. Google, for its part, demonstrated innovation in large-model capabilities, AI-powered search, hardware applications, and upgraded programming tools, and is committed to integrating AI into more consumer products.
Microsoft’s Build 2025 conference and Google’s I/O developer conference were both held this week, and both events put AI at their core.
The difference is that Microsoft’s focus was showing the industry how to build agents better. At Build 2025, Microsoft presented a more mature agent infrastructure, hoping to attract more developers into the process of building an Open Agentic Web – a system in which AI agents can work together across individuals, teams, organizations, and even entire end-to-end business processes.
Google, meanwhile, is committed to showing a prototype of an AI operating system built around Gemini. Google CEO Sundar Pichai used the term “Gemini Era” to describe the future in his keynote. On the one hand, Google demonstrated stronger model development capabilities; on the other, it is integrating Gemini’s capabilities into a wide range of consumer products.
Although Microsoft and Google have different focuses, their AI strategies now show a certain coherence: instead of scattering isolated bets, both are drawing a line that connects those points into a system. The mission of that system is to make research genuinely useful, translating it into practical applications as quickly as possible.
This is a change we have not yet observed among the domestic players. Alibaba, Tencent, and ByteDance are all actively investing in foundation models, AI for their businesses, and product innovation, but it is hard to identify a single banner guiding any of them forward as an entire enterprise, the way Microsoft and Google each articulated one this time. The same goes for Apple.
This may be due to caution or indecision. But whatever the considerations, raising such a flag will be one of the signs that a company’s AI strategy is advancing to the next stage.
Microsoft: It’s all about the open agentic web
An early stage that already shows striking results but is not yet mature – this was Microsoft’s judgment at Build 2025 on the current phase of AI development. Microsoft CEO Satya Nadella compared this moment to Win32 in 1991, the web stack in 1996, and smartphones in 2008.
None of those inflection points marked an explosion among consumer users; each marked the start of laying groundwork on the enterprise side.
Microsoft has therefore focused on attracting enterprises and developers, and has prepared for them an operating room stocked with tools and equipment.
The dizzying array of tools and capabilities can be divided into the following categories:
The first category provides foundational development environments: Windows AI Foundry and Azure AI Foundry. The former is Microsoft’s environment for local AI development, which uses Foundry Local to simplify running AI models, tools, and agents directly on the device. The latter is Microsoft’s cloud development platform, and a major update this time is the introduction of xAI’s Grok 3 and Grok 3 mini.
At the same time, Microsoft launched Azure AI Foundry Agent Service, which lets professional developers orchestrate multiple specialized agents to handle complex tasks. Ray Smith, Microsoft’s vice president of AI agents, argues that cramming a complex, high-reliability process into a single agent often runs into trouble, while systematically breaking tasks down across multiple agents can significantly improve reliability.
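The decomposition idea Smith describes can be sketched in a few lines. This is a minimal illustration, not Microsoft’s API: each “agent” is just a function with a narrow responsibility, and an orchestrator chains them. All names here are invented for the example.

```python
# Minimal sketch of multi-agent task decomposition: instead of one
# monolithic agent handling a complex request end to end, each narrow
# "agent" (here just a function) owns one sub-task, and the
# orchestrator composes their results. All names are illustrative.

def extract_agent(request: str) -> dict:
    """Narrow agent: parse the raw request into structured fields."""
    return {"topic": request.strip().lower()}

def research_agent(task: dict) -> dict:
    """Narrow agent: gather supporting facts for the topic (stubbed)."""
    return {**task, "facts": [f"fact about {task['topic']}"]}

def summarize_agent(task: dict) -> str:
    """Narrow agent: turn the gathered facts into a final answer."""
    return f"{task['topic']}: " + "; ".join(task["facts"])

def orchestrate(request: str) -> str:
    """Run the pipeline of specialized agents in a fixed order."""
    parsed = extract_agent(request)
    enriched = research_agent(parsed)
    return summarize_agent(enriched)

print(orchestrate("Quarterly Report"))
```

Because each step is isolated, a failure in one sub-task can be retried or swapped out without rebuilding the whole pipeline – the reliability argument in miniature.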
The second category improves agent development efficiency. According to Microsoft, 15 million developers already use GitHub Copilot to speed up coding – one-tenth of all GitHub users. In the latest update, GitHub Copilot can take on user-assigned tasks such as bug fixes and code maintenance, and can be used in VS Code.
Microsoft also launched Microsoft 365 Copilot Tuning, which lets developers build more proprietary agents using models fine-tuned on company data, workflows, and style. Developers can fine-tune models through a low-code interface, a job that previously took an entire data science team weeks to complete.
The third category provides carriers for agents. Microsoft did not dwell on consumer-facing agent carriers, focusing instead on upgrades to its collaboration product Teams. Nadella argues that the new Teams truly integrates chat, search, notes, generation, and agents into one intuitive framework – a complete AI user interface that supports multi-person collaboration and carries the circulation of agents.
The fourth category provides network connectivity. On one hand, Microsoft began fully supporting MCP (Model Context Protocol), which gives the agent network it wants to build both openness and the ability to execute complex tasks. In a live demo, a developer used GitHub Copilot in VS Code together with MCP support in Windows to build a web page in a specific style with just three instructions.
With the first instruction, GitHub Copilot connected to the MCP server of WSL (Windows Subsystem for Linux) and installed the latest version of Fedora. With the second, it created a website project; with the third, it used MCP to pull the corresponding design details from the user’s Figma client and adjusted the web page accordingly.
Building on MCP, Microsoft also introduced the concept of NLWeb. Microsoft CTO Kevin Scott argues that MCP is the HTTP of the AI era and NLWeb its HTML, making it easy for anyone with a website or API to turn it into an agent. “Every NLWeb endpoint defaults to an MCP server, which means that what people provide through NLWeb will be accessible by any MCP-enabled agent.”
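To make the HTTP analogy concrete: MCP messages are JSON-RPC 2.0 requests, and a `tools/call` request is roughly how one agent invokes a capability another endpoint exposes. The sketch below builds such a message; the tool name and arguments are invented for illustration, and real servers define their own tools.

```python
# Sketch of the message shape MCP uses: a JSON-RPC 2.0 request with
# method "tools/call", the mechanism by which an MCP-capable agent
# invokes a tool exposed by a server. The tool name and arguments
# below are hypothetical, chosen to echo the Figma demo in the text.
import json

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request in the tools/call shape."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = make_tool_call(1, "fetch_design_tokens", {"file": "landing-page"})
print(msg)
```

The point of standardizing on one envelope like this is exactly Scott’s claim: any agent that speaks the protocol can call any tool any endpoint publishes, without bespoke integration.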
Around the open agentic web, Microsoft has completed a basic reorganization of its own business. After this reorganization, Microsoft’s product system is positioned to serve as the HTTP and HTML of the AI era, extending the glory of the internet age.
Google: Transform and connect everything with Gemini
Unlike Microsoft’s enterprise- and developer-focused showing at Build 2025, Google used I/O to showcase innovations aimed at dazzling consumers. Upgraded model capabilities, AI experiments in search, and a hands-on demonstration of Android XR all reinforced Google’s conviction that it wants consumers to get more out of AI.
At the model level, Google’s capabilities are progressing rapidly and growing richer. First, the launch of Gemini 2.5 Pro reversed the market’s perception of Google’s large-model business, making it a strong contender for the crown. Second, the video model Veo 3 and the image generation model Imagen 4 released this time have drawn fairly positive reviews for their output quality.
On top of its models, Google has developed distinct foundational agent capabilities. Project Astra focuses on low-latency multimodal interaction: it can observe and understand its surroundings through cameras and microphones, and has long-term memory. Project Mariner emphasizes multitasking, handling up to 12 tasks at once, and can browse and operate websites on a user’s behalf to complete purchases.
In the I/O live demonstrations, these foundational agent capabilities were shown powering the AI transformation of consumer products such as AI search, the Chrome browser, the Gemini app, and smart glasses.
In its core search business, Google has finally begun leaning on AI in earnest. Starting May 20, Google is rolling out AI Mode to all US users. Users can ask Gemini questions hundreds of words long, and will later gain multimodal Q&A and deep-research capabilities. Google will also add an AI assistant to Chrome to help users summarize and distill page content.
The Gemini app now has over 400 million monthly active users. Gemini Live will support camera and screen sharing, letting AI help users understand and remember their surroundings. Gemini’s Agent mode will also be able to help users find apartments, make reservations, and book tickets. In the live demo, Gemini Live guided a user through repairing their own bike and helped them call to order the needed parts.
Beyond software, these foundational agents are also being applied to hardware. Google regards Android XR as the first Android platform to reach the masses in the Gemini era, and the live demonstration showed Gemini Live’s capabilities embodied in hardware. Google also plans to bring Gemini to watches, car dashboards, and even TVs in the coming months.
Google has likewise upgraded its AI-powered programming tools. A coding agent called Jules can work asynchronously, automatically refactoring code and writing tests, much like GitHub Copilot. Meanwhile, the upgraded Gemini Code Assist supports code review, long-file parsing, and multi-person collaboration, and integrates personalized suggestions with team coding-standard tools.
From foundation models to software, hardware, and programming tools, Gemini is unmistakably at the heart of it all. Google’s strategy is simple: continually productize Gemini’s model capabilities and integrate them into existing and future scenarios, building a prototype system for the AI era.
Domestic giants have not yet found a real breakthrough
Although the specific directions are different, Microsoft and Google have jointly verified the general trend of “AI towards application”.
Microsoft is creating conditions that make it easier for others to explore AI applications; Google wants to build a system that makes itself the foundational AI application. The difference follows from the two companies’ resources and endowments – Microsoft mainly serves enterprises, while Google targets the consumer market.
Using these two companies as a reference for observing the domestic giants, one finds that although Alibaba, Tencent, and ByteDance have not yet produced a particularly clear main line, each has its own focus within the broader trend toward AI applications.
Alibaba’s advantages in the AI era lie in large models and cloud services, so the to-B path of supporting developers and building an open agent ecosystem is relatively smooth. But e-commerce, Alibaba’s consumer-side strength in the internet era, is difficult to convert into a consumer AI application fulcrum, which elevates the importance of Quark. Looking ahead, Alibaba may need to inject the development potential of agents into Quark.
Tencent is closer to Google: it has a relatively stable consumer product portfolio and traffic entry points, and will prioritize transforming its existing consumer products while innovating on future-facing ones, as in the recent overhaul of QQ Browser. Compared with Google, however, Tencent’s model capabilities are weaker, and it can only rely on the dual drive of DeepSeek and Yuanbao. Although Tencent has moved to strengthen its in-house large-model R&D, there is no sign yet of a Tencent version of Gemini 2.5.
ByteDance also has consumer traffic, but short video is likewise difficult to convert directly into a consumer entry point for the AI era. This is why ByteDance emphasizes the multimodal capabilities of its Doubao model and is the most active of the three in exploring AI hardware. In to-B business, ByteDance relatively lacks an ecosystem foundation and needs to find a more effective way in. And, like Alibaba, it needs an open agent ecosystem.
So far, these areas of focus have not coalesced into a strategic main line like Microsoft’s or Google’s, and the three giants’ seemingly comprehensive layouts have not yet found their real breakthroughs.