ChatGPT Agent is here with its own Virtual Machine and Tools!
Unleashing the Unified AI: Introducing ChatGPT Agent for Complex Tasks
Today marks a significant leap in how we interact with AI, as OpenAI officially launches ChatGPT Agent. This new, unified agent capability is designed to help users tackle complex, real-world tasks over long time horizons, seamlessly integrating thinking and action using a wide array of tools.
What is ChatGPT Agent?
For a while now, users have been excited about AI's ability to perform complex tasks, previously seen with tools like Deep Research and Operator. However, the clear demand was for a unified agent that could bring these capabilities, and more, together. That's exactly what ChatGPT Agent delivers. It's an AI that uses its own virtual computer to carry out tasks, providing a visualization of its computer screen and its chain of thought as it works.
A Unified Toolbox for Universal Tasks
ChatGPT Agent comes equipped with a comprehensive suite of tools, making it incredibly versatile:
Text Browser: Similar to the Deep Research tool, this allows the agent to efficiently and quickly read many web pages and search for information.
Visual Browser: Akin to the Operator tool, this enables the agent to interact directly with web page UIs. It can click, drag, open components, fill out forms, and enter text, offering immense flexibility.
Terminal Access: The agent can run code, generate and analyze files (like spreadsheets and slide decks), and even call APIs – both public ones and private data sources you explicitly connect, such as Google Drive, Google Calendar, GitHub, and SharePoint.
Image Generation API: For creating visuals within its tasks, like decorations for slide decks.
These two browsing tools are highly complementary. For instance, Operator previously struggled with reading super long articles, which Deep Research excels at. Conversely, Deep Research wasn't as good at interacting with highly visual, interactive web elements, which Operator does well. ChatGPT Agent now brings the best of both Deep Research and Operator together, along with new capabilities like logging into websites and accessing authenticated sources – a highly requested feature for Deep Research users.
How Agent Learns and Collaborates
The model is trained using reinforcement learning to intelligently choose which tool to use and when. By training it on "hard tasks" that require the use of all these tools, the model learns not only how to use them but also how to make smarter tool choices over time for efficiency.
A key aspect of ChatGPT Agent is its collaborative nature. It's designed to be multi-turn, meaning users can interject, direct, or provide more guidance mid-trajectory. The agent can ask clarifying questions (though not every time, unlike Deep Research), accept interruptions, and even ask for confirmation at important steps, such as before sending an email, allowing users to review and make corrections. It can also review and refine its own results to deliver a polished final output.
Real-World Applications in Action
The launch demonstrations highlighted the agent's impressive ability to handle diverse real-world scenarios:
Wedding Planning: The agent was tasked with finding an outfit matching a dress code, proposing mid-luxury options based on venue and weather, finding hotels, and suggesting a gift. It even provided details like Zilla links, suit recommendations, and screenshots of hotel availability checks.
Creating Team Swag: It generated anime art for laptop stickers (using the image generation tool) and even started the process of ordering 500 of them through a familiar service.
Dynamic Task Management: Users could interrupt ongoing tasks, like wedding planning or sticker ordering, to add new requests, such as finding a pair of men's black dress shoes, and the agent would seamlessly integrate them into its trajectory.
Self-Evaluation and Reporting: In a meta-demonstration, the agent pulled its own evaluation numbers from a Google Drive connector, used its image generation tool for slide decorations, wrote code to compile the slides, and produced a downloadable PowerPoint file with performance charts.
Complex Itinerary Planning: One particularly fun example involved building an optimal itinerary for visiting all 30 MLB stadiums, prioritizing "Hello Kitty nights," and presenting the detailed plan as a spreadsheet and a cool map.
Impressive Performance Benchmarks
The agent model has shown significant improvements across various benchmarks, demonstrating its powerful reasoning, browsing, and task-tackling capabilities:
Humanities Last Exam: Nearly doubled its performance to 42% when given access to all tools.
Front TMS: Achieved a new state of the art of 27% on advanced mathematical reasoning with the help of all its tools.
Web Arena: Improved over the previous O3 model on real-world web tasks.
Browse Comp: Significantly outperformed O3 and Deep Research, achieving a 69% pass rate in locating information.
Spreadsheet Bench: Solved 30% of tasks editing spreadsheets with LibreOffice, boosting to 45% when given access to raw Excel files in the terminal.
Internal Banking Benchmark: Significantly outperformed previous models on tasks like putting together a three-statement financial model for a Fortune 500 company.
It's clear this is "one of the most powerful models we've ever trained," capable of tackling real-world tasks at a level unimaginable just a few months ago.
Navigating New Risks with Caution
While incredibly powerful, this new level of AI capability also introduces new risks, particularly concerning prompt injections. This is a scenario where a malicious website could trick the agent into revealing sensitive information, like credit card details, even if you initially asked it to do something innocuous.
OpenAI has implemented robust safety measures, including:
Training the model to ignore suspicious instructions from malicious websites.
Deploying layers of monitors that watch the agent's actions in real-time and can stop a suspicious trajectory. These monitors can even be updated in real-time as new attacks are discovered.
However, users are strongly encouraged to be aware of the risks and proactive in managing their information. For highly sensitive data, it's recommended to use features like takeover mode to directly input information into the browser yourself, rather than providing it to the agent. This new technology will require society and technology to evolve and learn how to mitigate unforeseen attacks.
Availability
ChatGPT Agent is rolling out today for Pro Plus and Team users. Pro users will have access to 400 queries per month, while Team users will have 40 queries per month. The rollout for Pro is expected to finish by the end of the day, with Plus and Team users very soon, and Enterprise and Edu users by the end of the month.
We hope you'll love this new capability. It's still very early, and the team is committed to rapid improvements. We're excited to see where it all goes!



