ChatGPT agent is here! Altman led a late-night livestream: the first unified agent seamlessly merges three major AI capabilities, thinks and makes decisions on its own, and can go online and directly produce PowerPoint and Excel files. In 2025, ChatGPT is becoming a new form of AI leverage, unlocking the “super-individual” model.
Last night, the “Three Musketeers” (ChatGPT, Deep Research, and Operator) were combined for the first time!
Altman personally led the team in unveiling ChatGPT agent in a 25-minute, high-energy livestream, opening a new era of collaboration between humans and agents.
At its core, ChatGPT agent is a unified agent system.
In short, it combines the strengths of three earlier breakthroughs: Operator’s ability to interact with websites, Deep Research’s skill at synthesizing information, and ChatGPT’s conversational intelligence.
ChatGPT can now work for you autonomously, directly using a computer.
It can browse the web intelligently, filter results, prompt you to log in securely when needed, run code, and perform analysis, and it can directly output PowerPoint and Excel files that summarize its findings.
Most importantly, everything is under control.
Humans can interrupt the task, take over the browser, or stop it altogether at any time.
On the HLE benchmark, ChatGPT agent scored 41.6%; on FrontierMath, the frontier math benchmark, it also set a new SOTA, well ahead of the o4-mini and o3 models.
That said, ChatGPT agent still trails Musk’s Grok 4 Heavy on HLE.
Who would have thought that the PPT above was made by ChatGPT agent itself? In benchmarks, its ability to operate office software leaves humans little room.
Netizens commented: the good days for office workers are over.
Altman remarked that watching ChatGPT agent use a computer to perform complex tasks was a moment when he truly “felt the AGI”.
Starting today, Pro, Plus, and Team users can try it by selecting “Agent mode” from the drop-down menu in the chat box.
Pro users get a quota of 400 uses per month; Plus and Team users get 40 per month.
TL;DR (excerpted from a post on X by OpenAI researcher Xikun Zhang):
Deep Research is good at research, Operator takes actions, and ChatGPT agent can do both at the same time!
The power of end-to-end reinforcement learning! Built on RL scaling, ChatGPT agent’s efficiency and data utilization are impressive.
Human-agent collaboration remains central! You can interrupt a task at any time and steer ChatGPT toward a new one. Before making payments or deleting files, it proactively asks the human for confirmation. It asks clarifying questions only when necessary.
Real-world performance > chasing benchmark rankings! ChatGPT agent has indeed topped many leaderboards. However, during model development, OpenAI neither tuned for benchmark scores nor cared much about its final position in the rankings.
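The confirm-before-acting behavior described in the summary above can be pictured with a minimal sketch. This is not OpenAI’s implementation: the `SENSITIVE_ACTIONS` set, the `execute` function, and the `confirm` callback are all hypothetical names introduced only for illustration.

```python
from typing import Callable

# Assumed set of action types that require explicit human sign-off.
SENSITIVE_ACTIONS = {"payment", "delete_file"}


def execute(action: str, detail: str, confirm: Callable[[str], bool]) -> str:
    """Run an action, asking the human first when it is sensitive."""
    if action in SENSITIVE_ACTIONS and not confirm(f"Allow {action}: {detail}?"):
        return "cancelled"
    # A real agent would perform the action here; this sketch only reports it.
    return "done"
```

In this sketch, ordinary actions such as browsing run without interruption, while a payment or file deletion proceeds only if the human-supplied `confirm` callback returns `True`.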
01 The three tools merge: ChatGPT agent makes its debut
In January of this year, OpenAI released its first agent, Operator, which lets AI interact directly with a GUI the way a human does.
Shortly afterward, in early February, it launched Deep Research, in which a reasoning model uses tools directly to conduct research.
Each tool has its own specialty: Operator can browse the web, click, and type autonomously, while Deep Research excels at analyzing and summarizing information.
However, the former cannot perform in-depth analysis or write detailed reports, and the latter cannot interact with websites to get precise results.
Now OpenAI has officially merged them into one, ChatGPT agent, in which a “single model” unlocks new capabilities.
ChatGPT agent comes with a comprehensive suite of tools:
· Visual browser: interacts with web pages through the graphical user interface
· Text browser: handles simpler reasoning-based web queries
· Terminal + direct API access: Image API
The agent can also use ChatGPT connectors to connect to apps such as Gmail and GitHub, making it easy to find relevant information and respond to prompts.
In addition, you can log in to any website by taking over the browser, allowing ChatGPT agent to carry out deeper and broader research and task execution.
As a result, ChatGPT can choose the best path and complete tasks efficiently.
02 ChatGPT now makes office workers’ PPTs for them
To show what ChatGPT agent can do, the team demonstrated a real-life scenario live: preparing for the wedding of friends Minnie and Sarah.
According to the prompt, the task asks the AI to recommend attractive, reasonably priced outfits that suit the dress code and weather, book hotels, and pick out a wedding gift for the couple.
After understanding the prompt, ChatGPT agent did not start work immediately but first confirmed the task requirements, such as the exact wedding date.
Once everything is confirmed, it automatically opens the browser and shows each step of its process, that is, its chain of thought, on the interactive page.
Note that the agent begins executing the task in its provisioned virtual computer environment within a few seconds.
During the task, the agent used the text browser to search and found a suitable suit, then switched to the visual browser and waited for confirmation.
While ChatGPT worked on the wedding task, it could also be given another task: buy a pair of black shoes in size 9.5.
This means ChatGPT agent handles interruptions well. Even if the earlier task takes a long time, it does not hold up the next request.
Finally, ChatGPT agent produced a comprehensive report covering outfits, hotels, shoes, and gifts, each with plans and suggestions.
In another demo, the team started a task from the ChatGPT app: they uploaded a picture of the team’s mascot, a cute puppy, and asked for it to be made into laptop stickers, with 500 to be ordered.
It then called an image-generation tool to create an anime-style version of the image, designed the sticker, and ordered 500 copies from StickerMule to be sent to xxx.
Even more surprising, ChatGPT agent can pull evaluation data through connectors such as Google Drive and generate a PPT on its own.
In the process, the agent writes code and compiles it into the final slides. It also uses image tools to decorate the PPT pages.
After a while, it produced a first PPT on HLE and FrontierMath; that draft was not polished enough, so it was then refined further through RL.
In the end, you get a polished PPT file that opens directly in office software.
It has to be said: ChatGPT agent is remarkably capable.
You can even ask ChatGPT agent to plan the best itinerary for visiting all 30 Major League Baseball ballparks.
It took 25 minutes and produced an intuitive, visual Excel spreadsheet, freeing up your hands for future data work.
03 41.6% on HLE, with records broken across many benchmarks
The unified agent greatly improves ChatGPT’s usefulness in both everyday and professional settings.
ChatGPT agent sets new SOTA results not only in web browsing but also in evaluations of real-world task completion.
As mentioned above, on Humanity’s Last Exam (HLE), ChatGPT agent set a new pass@1 record with a score of 41.6%.
When the research team used a parallel strategy, running up to 8 attempts at once and selecting the answer with the highest confidence, the HLE score rose to 44.4%.
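The parallel strategy can be sketched roughly as follows. This is only an illustration of “run several attempts and keep the most confident one”, not OpenAI’s actual evaluation code: `run_attempt` is a hypothetical stand-in for one independent agent rollout, and the self-reported `confidence` value is an assumption.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Attempt:
    answer: str
    confidence: float  # assumed self-reported confidence in [0, 1]


def run_attempt(question: str, seed: int) -> Attempt:
    """Hypothetical stand-in for one independent agent rollout."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return Attempt(answer=f"candidate-{seed}", confidence=rng.random())


def best_of_n(question: str, n: int = 8) -> Attempt:
    """Run n attempts in parallel and keep the most confident one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        attempts = list(pool.map(lambda s: run_attempt(question, s), range(n)))
    return max(attempts, key=lambda a: a.confidence)
```

Calling `best_of_n("some HLE question")` returns the single attempt with the highest confidence out of eight. The accuracy gain depends on how well confidence tracks correctness, which this sketch does not model.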
On FrontierMath, the hardest math benchmark, ChatGPT agent reached 27.4% accuracy with the help of tools such as a code terminal, significantly surpassing o3 and o4-mini.
In addition, on an internally designed evaluation of “high economic value knowledge work”, ChatGPT agent’s output was comparable to or better than that of humans in roughly half of the cases.
The tasks come from real-world professional work, such as writing a competitive analysis of on-demand urgent care providers, compiling detailed amortization schedules, and identifying viable water sources for a new green hydrogen facility.
On DSBench, which evaluates data science tasks, ChatGPT agent surpasses human performance by a significant margin.
On SpreadsheetBench, which tests spreadsheet editing ability, it also far outperformed existing models.
When given the ability to edit spreadsheets directly, ChatGPT agent scored 45.5%, well above the 20.0% achieved by Copilot in Excel.
In an internal evaluation of modeling tasks typical of first- to third-year investment banking analysts, the model underlying ChatGPT agent far outperformed Deep Research and o3.
In addition, on BrowseComp, a benchmark released by OpenAI, ChatGPT agent set a new record with 68.9% accuracy, 17.4 percentage points higher than Deep Research.
On the WebArena benchmark, it also outperformed the o3-based CUA (the model behind Operator).
BrowseComp evaluates a browsing agent’s ability to locate hard-to-find information on the web; WebArena evaluates a web-browsing agent’s ability to complete real-world web tasks.
04 AI agents: the next big source of “leverage” in 2025
With ChatGPT agent this powerful, what value can it bring us?
Hyung Won Chung, a researcher who recently left OpenAI, shared a video of a talk in which he said, “AI is becoming the most powerful ‘lever’ ever.”
He said that AI agents combine two kinds of leverage: human and code.
An agent is like a helper you hire to do work for you (human leverage), and it is also software that can be copied without limit (code leverage).
1. Human leverage: AI agents can do work on a human’s behalf, just like a human assistant.
2. Code leverage: today’s AI agents exist purely as software and are easy to replicate. If you want 10x the output, run 10 agents; if you want 12x, add two more. This leverage requires no permission and can be scaled at will.
At this launch event, OpenAI returned to the core theme of “AI leverage”.
They also believe that from 2025 onward, AI agents will become a new leverage mechanism.
An agent not only does work for humans but can also be scaled almost without limit through “copy and paste”. Whether you need 10 agents or more, they can be deployed with one click, with no additional permission required.
For example, the researchers said that in the nearly six months since Deep Research was released, the output of individuals and small teams has risen substantially.
In the past, scaling a team meant relying on “human leverage”, but the communication costs and friction of human collaboration often became the bottleneck.
Now, a major significance of ChatGPT agent is that it lets small teams create outsized value.
A startup of 10 or 20 people, with AI leverage, may achieve results comparable to those of tech giants.
This “super-individual” model may reshape how companies are organized in the future, using AI to achieve exponential growth.
05 A Peking University alumnus takes the stage
Notably, two Chinese researchers appeared in this livestream.
Zhiqing Sun
Zhiqing Sun joined OpenAI in June 2024 as a research scientist.
He received his Ph.D. in computer science from Carnegie Mellon University in 2025 and his bachelor’s degree in computer science from Peking University in 2019.
At OpenAI, he has contributed to key projects including o3/o4-mini, the computer-using agent (CUA), and Deep Research, and he was also the research lead for Deep Research.
Casey Chu
Casey Chu joined OpenAI in April 2020 as a researcher.
Previously, he received his master’s degree in computational mathematics from Stanford University in 2019 and his bachelor’s degree in mathematics from Harvey Mudd College in 2016.
He then began a doctorate in computational mathematics at Stanford University but left the program partway through.
His title may be simply “researcher”, but he not only worked on DALL·E 2 but also led the development of the initial prototype of GPT-4’s visual input.
Resources:
https://openai.com/index/introducing-chatgpt-agent/
https://x.com/xikun_zhang_/status/1945895070269583554