OpenAI’s “Manus Moment” is here: ChatGPT Agent is officially released

Yesterday, OpenAI officially released ChatGPT Agent, an AI agent that combines DeepResearch’s in-depth research capabilities with Operator’s ability to interact with web pages. The launch is widely seen as OpenAI’s “Manus moment”.

At 1 a.m. Beijing time on July 18, 2025, OpenAI held a livestream to introduce its latest and most powerful model.

By combining the functions of DeepResearch and Operator, they created ChatGPT Agent, an AI agent that can both conduct in-depth research and use a browser.

Put more simply, OpenAI has released a “Manus” of its own.

Judging from past experience, a general-purpose AI agent from OpenAI is expected to deliver an overwhelming blow to competitors in the same space, on both cost and technology.

During the livestream, Sam Altman and four researchers from the Agent project (formerly members of the Operator and DeepResearch teams) showcased ChatGPT Agent’s capabilities and benchmark performance through demonstrations such as planning for a wedding.

Using agent mode is simple: just open the tools menu in the ChatGPT client and select “Agent”.

ChatGPT Agent can independently operate a virtual computer to perform complex tasks, switch seamlessly between thinking and acting, and use a range of tools: writing code in a terminal, browsing the web, creating Excel spreadsheets and PowerPoint slides, and more.

The first demo covered an everyday scenario: preparing for a friend’s wedding.

The researchers not only spelled out detailed requirements but also gave ChatGPT the wedding website and a hotel booking website.

Prompt:

Our friend is getting married later this year! This is the wedding website: XXX

Can you help me find the following items:

1) An outfit for a man that meets the dress code for every event

– Recommend five options. The outfits should include some good, moderately upscale pieces that suit the venue and weather.

2) Help me find some hotels that I can book a few days in advance

– Use booking.com, and be sure to check availability and current prices.

3) Also, don’t forget to pick a gift for them, preferably under $500

Write a beautiful report

After receiving the prompt, ChatGPT Agent immediately got to work.

Because the task requires a computer, the agent first has to set up its environment. This can take anywhere from a minute or two to under 5 seconds; in the actual demo it took 7 seconds. Once the environment is ready and it has understood the prompt, ChatGPT Agent asks the user to confirm that its understanding is correct, and after the user clicks “Continue”, it starts working.

While ChatGPT Agent works, users can watch in real time as it operates the computer screen, along with the chain of thought behind each step.

In this task, ChatGPT Agent ended up providing a fairly comprehensive report.

It worked out the wedding date and venue from the link and used them to recommend suits, where to buy them, listing details, and so on; it also suggested gifts. Notably, ChatGPT Agent included screenshots of its browsing results.

After ChatGPT Agent completes a task, users can also replay a video of how it carried it out.

ChatGPT Agent can browse the internet in two different ways.

One is a text browser, similar to DeepResearch’s, which can read and search large numbers of web pages quickly and efficiently.

The other is a visual browser, similar to Operator’s, which lets it actually interact with a web page’s UI.

With the visual browser, ChatGPT Agent can drag web pages, click with the cursor, open UI components, fill out forms, type text, and more.

The OpenAI team said that the two tools are highly complementary.

OpenAI released Operator in January; it can perform online tasks such as making bookings and sending emails. Two weeks later, OpenAI released DeepResearch, which can conduct in-depth internet research and produce high-quality research reports.

OpenAI later realized that the two approaches were deeply complementary. Operator struggles with very long articles because it has to scroll through them, which is time-consuming, and that is exactly where DeepResearch excels. Conversely, DeepResearch is weaker than Operator at interacting with web pages, interactive elements, and highly visual pages.

OpenAI also learned from user feedback that one of the most requested DeepResearch features was the ability to log in to websites and access authenticated sources, which is exactly what Operator can do. In addition, many of the prompts users gave Operator were actually DeepResearch-style prompts.

A key capability of ChatGPT Agent is that users can interrupt it at any time to add new instructions. This matters especially for complex, time-consuming tasks, where the initial prompt is often incomplete. For example, partway through the wedding-preparation task, you could add: “Can you also find me a pair of men’s black shoes in size 9.5?”

ChatGPT Agent may also proactively ask users to clarify or confirm details while it works.

OpenAI noted that a key thing to keep in mind when working with the agent is that models sometimes make mistakes, “which is why it is important to train the model to ask for user confirmation before the final step of any important action.”

For example, before sending an email, it asks the user to review the draft: whether the content makes sense, whether there are spelling mistakes, and so on. If something needs fixing, you can ask it to revise the draft, or take over the browser and edit it yourself inside the agent’s environment.

In other words, ChatGPT Agent is designed not for fully autonomous execution but for close collaboration with users.

ChatGPT Agent also has its own terminal, which it can use to run code and to generate and analyze files such as PowerPoint decks and Excel spreadsheets.

Through the terminal, it can also call APIs, including public APIs and APIs that access users’ private data sources (e.g. Google Drive, Google Calendar, GitHub, SharePoint). You can even have ChatGPT call the image generation API to create polished visuals for content such as slide decks. As with DeepResearch’s connectors, these private sources are available only if the user explicitly connects them.

In one demo, OpenAI researchers had ChatGPT Agent produce a report on its own benchmark results by calling APIs.

Prompt:

Pull your evaluation numbers from our Google Drive and make some slides. Keep the format simple: no introduction, no conclusion, just the results presented as charts.

The model connected to the Google Drive API and searched through it. The first result was relevant, so the model read it in detail, then wrote code and used the image generation model to create images for the slides.

In the end, the model produced a PowerPoint file that can be downloaded and opened locally.

Now let’s look at ChatGPT Agent’s benchmark results in detail.

On Humanity’s Last Exam (HLE), a multimodal benchmark at the frontier of human knowledge, ChatGPT Agent with full tool use outperformed both DeepResearch and o3 (each with browsing and Python), nearly doubling their scores. It achieved a 42% pass rate, while ChatGPT Agent and o3 without tools came in last.

FrontierMath measures advanced mathematical reasoning. On this benchmark, ChatGPT Agent achieved a 27% pass rate, surpassing o4-mini and o3 with Python.

On WebArena, ChatGPT Agent performed close to human level and outperformed o3 and 4o.

On BrowseComp, a benchmark OpenAI released earlier this year to measure an agent’s ability to search for and locate information, ChatGPT Agent significantly outperformed o3 and DeepResearch.

SpreadsheetBench measures the ability to create and edit spreadsheets. Using LibreOffice and other tools, ChatGPT Agent completed 30% of the tasks, and its score rose to 45% when it was given access to the original Excel file in the terminal.

OpenAI’s internal investment banking benchmark evaluates how well a model performs tasks typical of investment banking analysts with 1 to 3 years of experience, such as building a three-statement financial model for a Fortune 500 company. On this benchmark, ChatGPT Agent significantly outperformed DeepResearch and o3.

OpenAI said that ChatGPT Agent is currently one of its most powerful models: it not only performs well on benchmarks but can also reason, browse, and handle real-world tasks “at a level we couldn’t have imagined three months ago,” with much of its strength coming from its ability to browse the internet.

OpenAI also emphasized that, from a security perspective, letting AI agents browse the web is still risky: “The internet is still a scary place, with all kinds of cyberattacks, scams, and phishing attempts trying to steal people’s information, and the agent model is not immune to these attacks.”

“We are particularly concerned about a new type of attack called ‘prompt injection.’ Suppose you ask an agent to buy a book for you and give it your credit card information. The agent might stumble onto a malicious website that asks it to enter your credit card information, and it might comply.”

“We did a lot of work to prevent this, such as training the model to ignore suspicious instructions on suspicious websites and setting up multiple layers of monitors that watch what the agent is doing. We can even update these monitors in real time to protect against new attacks.”

OpenAI acknowledged that not every risk can be prevented, so users still need to stay aware of them, for example by avoiding sharing highly sensitive information and making sensible use of takeover mode.

OpenAI also showed an interesting demo in which ChatGPT Agent was asked to plan an optimal itinerary for visiting all 30 MLB ballparks and present the final plan as a detailed spreadsheet.

Notably, ChatGPT Agent wrote code to build a map, and it worked.

Finally, OpenAI said that ChatGPT Agent will be available to Pro, Plus, and Team users. Pro users will get 400 queries per month, and Plus and Team users will get 40 queries per month. Rollout to Pro users is expected to be completed by the end of this month, Plus and Team users will follow soon, and OpenAI aims to bring it to Enterprise and Education users by the end of this month.

“We hope you enjoy it. It’s still early, but we will improve it quickly, and we’re very much looking forward to seeing how it evolves,” the OpenAI team said.
