OpenAI’s Agent finally debuts: why agents in a virtual sandbox cannot compete on the “real battlefield” of Chinese enterprises

While global technology giants proclaim an “agent revolution”, OpenAI’s Agent has made its debut inside a virtual sandbox: more than enough to show off its skills, but not enough for real deployment. This article looks past the technological halo, breaks down the product path, deployment approach and barriers of OpenAI’s Agent, and compares them with what Chinese enterprises actually need on the “real battlefield”, revealing the mismatched roles and diverging paths in the agent race.

In July 2025, the spotlight of the AI industry once again fell on OpenAI.

When Sam Altman announced the official launch of “ChatGPT Agent” during a livestream, the global technology community reacted with mixed feelings: relief that “it is finally here”, combined with a re-examination of the competitive landscape of the agent track.

The core of this launch, which OpenAI describes as “the leap from Chat to Agent”, is to give ChatGPT a closed loop of “independent thinking, action and feedback”: given a single instruction, it can call a text browser, a visual browser and terminal tools inside a virtual sandbox to complete multi-step tasks ranging from information retrieval and slide-deck (PPT) production to online shopping.

But when we shift our focus from OpenAI’s virtual sandbox to the real office scenarios of Chinese companies, a more thought-provoking point emerges: while global technology giants are still building agents in “virtual environments”, Chinese teams have already worked out a deployment path for enterprise-level agents on “real computers”.

01 OpenAI’s “Agent Answer Sheet”: The “Intelligent Three Musketeers” in the Virtual Sandbox

To understand the technical logic of ChatGPT Agent, we must first trace back to OpenAI’s technology accumulation over the past two years.

The Agent feature released this time is in essence a fusion of “Operator”, “Deep Research” and ChatGPT’s language capabilities, which OpenAI calls the “AI Three Musketeers”.

1. Virtual Sandbox: Isolated “Digital Twins”

In the ChatGPT interface, users will see a separate window, which is the agent’s exclusive “virtual computer” – with its own operating system and browser with access to the internet, but completely isolated from the user’s real device.

This design is motivated by security: the agent does not directly operate the user’s computer, and all clicks, inputs, and code execution take place inside the sandbox.

For example, if a user asks it to “find reviews of mobile phones and generate a comparison table”, the agent will first call the text browser to collect review data from multiple platforms, then use the visual browser to simulate clicking through paginated results, and finally run a Python script in the terminal to clean the data and generate an Excel file.
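The final step of this example, cleaning the collected data and producing the comparison table, can be sketched in a few lines of Python. This is only an illustrative sketch, not OpenAI’s actual script: the sample rows, the field names and the helper functions (`clean`, `comparison_table`, `to_csv`) are all assumptions, and it writes CSV with the standard library rather than a real Excel file.

```python
import csv
import io
from statistics import mean

# Hypothetical raw rows, as the browser tools might have collected them.
RAW_REVIEWS = [
    {"phone": "Phone A", "source": "site-1", "score": "8.5"},
    {"phone": "Phone A", "source": "site-2", "score": " 9.0 "},
    {"phone": "Phone B", "source": "site-1", "score": "7.0"},
    {"phone": "Phone B", "source": "site-2", "score": "n/a"},  # unparseable, dropped
]


def clean(rows):
    """Keep only rows whose score parses as a number."""
    cleaned = []
    for row in rows:
        try:
            score = float(row["score"].strip())
        except ValueError:
            continue
        cleaned.append({"phone": row["phone"], "source": row["source"], "score": score})
    return cleaned


def comparison_table(rows):
    """Review count and average score per phone, sorted by phone name."""
    by_phone = {}
    for row in rows:
        by_phone.setdefault(row["phone"], []).append(row["score"])
    return [
        {"phone": phone, "reviews": len(scores), "avg_score": round(mean(scores), 2)}
        for phone, scores in sorted(by_phone.items())
    ]


def to_csv(table):
    """Serialise the comparison table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["phone", "reviews", "avg_score"])
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


table = comparison_table(clean(RAW_REVIEWS))
print(to_csv(table))
```

In a real run the rows would come from the two browser tools rather than a hard-coded list, and the output step would use a spreadsheet library instead of CSV.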

2. Tool matrix: the leap from “mouth” to “hands”

Supporting this process are three tool modules carefully designed by OpenAI:

  1. Text Browser: For web pages with high information density (such as papers and technical documents), it extracts key data through natural language processing, 5-8 times more efficiently than manual reading.
  2. Visual Browser: It simulates human operations and completes graphical interface interactions such as clicking buttons, filling out forms, and taking screenshots, solving the pain point that traditional crawlers cannot handle dynamically loaded pages.
  3. Terminal Tools: They connect to cloud service APIs (such as AWS and Google Cloud), call image generation models (such as DALL·E 3), and run code scripts to automate the whole chain from data processing to content generation.

3. Ability Boundaries: The double-edged sword of “virtual”

According to OpenAI’s demonstration, the Agent takes about 10 minutes on average to complete a complex task (such as travel planning + hotel booking + itinerary PPT production), with a completion rate of more than 90%.

On the HLE (Humanity’s Last Exam) benchmark, it scored 41.6%, nearly twice the scores of o3 and o4-mini.

But the other side of the coin is that all operations are restricted to the virtual environment: it cannot open the user’s local Excel files, operate an internal OA system, or call third-party software that does not expose APIs (such as some customized ERP tools).

This “virtual isolation” design is both OpenAI’s security moat and the biggest bottleneck for its enterprise deployment.

As the technology outlet The Verge commented, ChatGPT Agent is an excellent “digital assistant”, but it is still a “real-world interface” away from becoming a “digital employee” for enterprises.

02 The Agent race stalls: three fatal flaws of the virtual environment

While OpenAI was refining its virtual sandbox, the global agent race was already heating up.

From Manus, which became popular at the beginning of the year, to MiniMax’s “Agent Matrix” and Kimi’s “Multimodal Actor” in China, players appear to be racing on the same track, but they have in fact split into two technical paths: the “virtual school” and the “real school”.

Three major pain points of the virtual school are making cut-throat competition on this track pointless.

1. API Dependency: The “Glass Cage” of Enterprise Data

The core logic of the virtual agent is to “call external APIs to complete tasks”.

For example, generating a PPT requires calling the API of Google Slides or Canva; sending an email requires connecting to the Gmail or Outlook interface. This leads to two problems:

  1. Data fragmentation: Core enterprise data (such as local CRM systems and offline production databases) cannot be accessed directly by agents and must be exported manually or connected through an API, increasing the risk of data leakage.
  2. Limited functionality: Software that does not expose APIs (such as some legacy financial systems and customized production management tools) is completely inoperable, and such systems account for more than 60% of the systems in traditional enterprises (according to Gartner’s 2025 Enterprise IT Survey).

2. Sandbox Wall: A “dimensionality reduction strike” in real scenarios

The isolated design of the virtual environment makes the agent a “spectator of the digital world”.

Take corporate finance as an example: accountants need to log in to online banking every day to download statements, import them into local financial software, and generate vouchers. This series of operations involves real interactions such as “cross-system switching, verification code input, and pop-up confirmation”.

Because virtual agents cannot operate a real computer, they can only complete this through a direct API connection to the banking system, but that requires banks to open their interfaces, and 90% of small and medium-sized banks in China do not provide such services.

3. Cost ceiling: the “computing power gap” for enterprise-level demand

OpenAI’s pricing strategy exposes the cost pressure of virtual agents: 400 calls per month for Pro users and only 40 for Plus and Team users.

Behind this is the high computing power consumption of virtual sandboxes – each agent task needs to independently allocate virtual machine resources to run browsers, terminals and other tools, and the cost of a single task is 10-20 times that of ordinary conversations.

For enterprises that require high-frequency automation (such as e-commerce customer service, supply chain management), such costs are almost unacceptable.
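To put these quotas in perspective, a rough back-of-the-envelope calculation helps. The sketch below uses only the monthly call limits quoted above; the 30-day month and the workload of 500 tasks per day are assumptions chosen purely for illustration.

```python
# Monthly agent-call limits quoted in the article.
MONTHLY_QUOTA = {"Pro": 400, "Plus/Team": 40}
DAYS_PER_MONTH = 30  # assumption: a rough average month


def tasks_per_day(quota, days=DAYS_PER_MONTH):
    """Average number of agent tasks per day that a monthly quota allows."""
    return quota / days


def quotas_needed(daily_tasks, quota, days=DAYS_PER_MONTH):
    """How many monthly quotas a steady daily workload would use up."""
    return daily_tasks * days / quota


for plan, quota in MONTHLY_QUOTA.items():
    print(f"{plan}: about {tasks_per_day(quota):.1f} tasks per day")

# Hypothetical workload: a team that wants to automate 500 tasks a day.
needed = quotas_needed(500, MONTHLY_QUOTA["Pro"])
print(f"500 tasks/day would consume {needed:.1f} Pro quotas per month")
```

Under these assumptions a Pro quota averages roughly 13 tasks a day and a Plus or Team quota just over one, which is far below what a high-frequency automation workload would need.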

03 The “real breakthrough” of the Chinese team: “real computer operation” that does not rely on API

While virtual agents are stuck in the “sandbox dilemma”, Chinese AI teams have already found another way.

Take the “Real Agent” launched by Real Intelligence as a representative: its technical path targets the core pain points of enterprises directly. It requires no API integration and directly simulates manual operation of a real computer to automate every scenario from local software to web systems.

1. Technical Background: From “API Calls” to “Anthropomorphic Operations”

The core breakthrough of Real Agent is the deep integration of “computer vision + automation control”.

Traditional RPA (Robotic Process Automation) tools simulate clicks through “code scripts” but are prone to failure when encountering dynamic pages (such as verification codes, pop-ups) or complex operations (such as drag-and-drop, multi-window switching).

Real Agent uses OCR (Optical Character Recognition), NLP (Natural Language Processing) and ISSUT (intelligent screen semantic understanding technology) to “read” the screen content and “understand” the operation logic, much as a human would:

  • Cross-system operation: From web pages (such as the Taobao seller back end) to local software (such as Kingdee ERP), it automatically switches windows and enters accounts and passwords;
  • Exception handling: It recognizes verification codes (text/slider/click) and pop-up prompts (such as submission confirmations), and selects the operation according to the context;
  • Data extraction: It accurately extracts structured data from PDFs, images, and tables, supporting complex formats (such as merged cells and diagonal headers).
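Real Agent’s internals are not public, so the following is purely a conceptual sketch of the “observe the screen, decide, act” loop that such a description implies. Everything in it is hypothetical: the screen types, the action names and the simulated transitions stand in for real OCR, screen understanding and mouse or keyboard control.

```python
from dataclasses import dataclass


@dataclass
class Screen:
    """Stand-in for what OCR and screen understanding would report."""
    kind: str  # "login", "captcha", "popup", or "done"


# Hypothetical policy: the action each recognised screen type calls for.
ACTIONS = {
    "login": "enter_credentials",
    "captcha": "solve_captcha",
    "popup": "confirm",
}

# Simulated application: performing an action leads to the next screen.
TRANSITIONS = {
    ("login", "enter_credentials"): "captcha",
    ("captcha", "solve_captcha"): "popup",
    ("popup", "confirm"): "done",
}


def run(start="login", max_steps=10):
    """Loop until the task is done or the step budget is exhausted."""
    screen, log = Screen(start), []
    for _ in range(max_steps):
        if screen.kind == "done":
            break
        action = ACTIONS[screen.kind]  # decide from what is on screen
        log.append((screen.kind, action))
        # Act, then observe the resulting screen.
        screen = Screen(TRANSITIONS[(screen.kind, action)])
    return screen.kind, log


final_state, steps = run()
for seen, action in steps:
    print(f"saw {seen!r} -> {action}")
```

The point of the loop is that each decision is made from the current screen rather than from a fixed script, which is what distinguishes this approach from the script-driven RPA described above.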

2. Scenario landing: the “all-rounder” of enterprise-level tasks

In the pilot of a leading manufacturing company, Real Agent has taken over the entire process of “procurement-warehousing-reconciliation”:

  1. Procurement stage: Automatically log in to the supplier platform, generate purchase orders according to the production plan, and verify prices and inventory;
  2. Warehousing stage: Synchronize with the WMS (warehouse management system), identify logistics tracking numbers, and enter warehousing information into the ERP;
  3. Reconciliation stage: Download bank statements, match them against purchase orders, flag abnormal transactions, and generate reconciliation reports.
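The matching logic of the reconciliation stage can be illustrated with a short Python sketch. It is not Real Agent’s implementation: the order numbers, amounts, field names and the `reconcile` function are assumptions, and a real system would first have to read these records from the bank’s website and the ERP.

```python
from decimal import Decimal

# Hypothetical purchase orders: order number -> expected payment amount.
purchase_orders = {
    "PO-1001": Decimal("1200.00"),
    "PO-1002": Decimal("560.50"),
    "PO-1003": Decimal("89.90"),
}

# Hypothetical bank statement lines referencing those orders.
bank_statement = [
    {"ref": "PO-1001", "amount": Decimal("1200.00")},
    {"ref": "PO-1002", "amount": Decimal("650.50")},  # amount differs
    {"ref": "PO-9999", "amount": Decimal("42.00")},   # no matching order
]


def reconcile(orders, statement):
    """Match statement lines to orders and flag anything abnormal."""
    matched, flagged, seen = [], [], set()
    for line in statement:
        ref, amount = line["ref"], line["amount"]
        expected = orders.get(ref)
        if expected is None:
            flagged.append((ref, "no matching purchase order"))
        elif expected != amount:
            flagged.append((ref, f"amount mismatch: expected {expected}, got {amount}"))
        else:
            matched.append(ref)
        seen.add(ref)
    # Orders that never appeared on the statement are reported separately.
    unpaid = sorted(set(orders) - seen)
    return {"matched": matched, "flagged": flagged, "unpaid": unpaid}


report = reconcile(purchase_orders, bank_statement)
for key, value in report.items():
    print(key, value)
```

Using `Decimal` rather than floating-point numbers keeps the amount comparison exact, which matters when a single cent of difference should be flagged.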

The entire process requires no manual intervention. Processing is 8 times as efficient as manual work, and the error rate falls from 3% to 0.1%.

More importantly, it can operate old systems that do not open APIs (such as customized production management software running on Windows 7), which is a “black box scenario” that virtual agents cannot reach at all.

3. Cost advantage: from “pay-per-use” to “on-demand deployment”

Unlike OpenAI’s pay-per-call model, Real Agent adopts a “local deployment + subscription” model.

Enterprises only need to install the Agent management platform on a local server to create multiple “digital employees”. Each “employee” can handle 5-10 tasks at the same time, and the annual cost is only one third of that of a human team of the same size.

For customer service, finance and other positions that need to operate around the clock (24/7), the cost advantage of this model is particularly prominent.

04 The “New Paradigm” of the Agent Track: From “Virtual Assistant” to “Real Employee”

OpenAI’s ChatGPT Agent marks the transition of the “general-purpose agent” from concept to product, but it is more like a “technical rehearsal” that tells the world “agents can do this”.

The exploration of Chinese teams such as Real Intelligence answers a more critical question: “How should agents create value for enterprises?”

Behind this divergence is a fundamental change in the logic of AI deployment:

  • From “ability display” to “scene adaptation”: Virtual agents pursue “what can be done”, while real agents focus on “what problems to solve”;
  • From “cloud dependency” to “on-premises intelligence”: Virtual agents rely on cloud computing power and APIs, while real agents ensure data security through local deployment;
  • From “individual tools” to “organizational collaboration”: Virtual agents are “personal assistants”, while real agents are “enterprise digital employees” who can collaborate seamlessly with human teams (such as receiving instructions from supervisors and sharing documents with colleagues).

05 The agent’s endgame is in the real world

When Sam Altman said, “Seeing ChatGPT think, plan, and execute is the moment to feel AGI”, we must admit that OpenAI is still leading in the “thinking layer” of “agents”.

However, the ultimate value of AI is never in the “virtual sandbox” of the laboratory, but in the production line of the factory, the office of the enterprise, and the clinic of the hospital – these scenarios that require “real operation” are the “final battlefield” of the agent.

The “real breakthrough” of the Chinese team is essentially a redefinition of the logic of AI deployment: the core of an agent is not “how smart it is” but “how useful it is”, not “how many APIs it can call” but “how many problems it can solve”.

While OpenAI is still competing on technology in a virtual environment, Chinese companies have already planted the seeds of large-scale commercial use of agents in the soil of the real world with “real agents”.

This may be the most noteworthy “generation gap” in the AI industry in 2025: not a lead in technology, but a first-mover advantage in “understanding demand” and “deploying in real scenarios”.
