With the rapid development of AI technology, multi-agent systems are becoming a new breakthrough direction, and Agent Infra (agent infrastructure) has become the key to the implementation of these systems. E2B is an emerging player in this space, gaining traction for its “dedicated computer” support for projects like Manus. E2B not only allows users to run AI-generated code in a secure and isolated sandbox in the cloud, but also aims to become the AWS of the AI Agent era, supporting the entire lifecycle of agents from development to deployment.
Multi-agent systems are becoming a new breakthrough direction, and agent infra has become the key to getting them into production. In the paradigm shift brought by computer use, virtual machines are emerging as a potential entrepreneurial opportunity, and E2B is an emerging player in this space.
The reason E2B has attracted market attention is largely Manus: the virtual computer that supports Manus agents while they complete tasks comes from E2B. Founded in 2023, E2B is open-source infrastructure that allows users to run AI-generated code in a secure, isolated sandbox in the cloud. At its core, E2B is a microVM that boots quickly (~150 milliseconds); its underlying layer is similar to AWS Firecracker, a representative microVM technology. On this basis, AI agents can run code in multiple languages, use browsers, and call tools across operating systems inside E2B.
With the prosperity of the agent ecosystem, E2B’s monthly sandbox creations grew from 40,000 to 15 million within a year, a 375-fold increase.
Why do AI agents need their own “computers”?
To better understand this question, we’ve compiled two recent interviews with CEO Vasek Mlejnsky, as well as an internal E2B technical blog about computer-use agents. This article details E2B’s technical philosophy and the team’s thinking behind the transition from a code interpreter to a more general agent runtime.
E2B’s vision is ambitious: the CEO aims to make it the AWS of the AI Agent era, an automated infra platform that can provide GPU support in the future to meet the needs of more complex data analysis, small-model training, game generation, and more, and that can host applications built by agents, covering the entire agent lifecycle from development to deployment.
01. What is E2B?
E2B’s founding, development, and transformation
The two founders of E2B are Vasek Mlejnsky (CEO) and Tomas Valenta (CTO), from the Czech Republic. Before officially starting a business, the two had already collaborated on many projects.
Before creating E2B, Vasek and Tomas were working on a product called DevBook, an interactive documentation tool for developers that can be seen as E2B’s prototype. After the release of GPT-3.5, the duo tried building agents to automate work. Because every project needed a set of tools integrated into the backend, they reused DevBook’s existing sandboxing technology to run code, letting the agent go to GitHub, write code, deploy to Railway, and post to Twitter. They quickly open-sourced the work, shifted their focus to the sandbox environment, and founded E2B in March 2023.
Vasek says the team realized from the beginning that code execution was a critical part of the agent stack. Code is like a universal language, the glue that connects everything: human developers connect services and APIs through code to make systems run, so why can’t agents do the same? Agents need a secure, flexible code execution environment, and as agents become more powerful, that environment matters more and more.
A key point in E2B’s development was the introduction of the concept of a “code interpreter.” When the team started using this term to explain the product, many users immediately understood it: if users want to use AI for data analysis and visualization, code execution is critical because these tasks require running AI-generated code; if you want to make large models smarter, for example able to do mathematical operations, code can serve as a very general calculator; and some users want to build an AI-powered Excel, where you just describe what each column should do and the agent dynamically generates and executes code based on that description, letting users easily complete tasks such as data enrichment and data analysis.
In the early days, E2B spent a lot of time educating the market and acquiring users through clear use cases.
In October 2024, Anthropic launched computer use, but E2B already had a desktop version of its sandbox six months earlier; at the time, no models could actually use it, so the feature received little attention. Around late 2024 to early 2025, the team began observing users employing sandboxes for computer use. Vasek also noticed that in 2024 people were still experimenting with building agents, while by 2025 agents were increasingly being put into production and a large number of new use cases had emerged.
As this trend developed, people no longer used sandboxes only for running code snippets such as data analysis, and E2B adjusted its product positioning: instead of treating sandboxes as just code interpreters, it is gradually expanding them into more general LLM or agent runtime environments.
Benefiting from improving LLM capabilities and the move of agents into production, E2B grew strongly in 2024 and surged in 2025, with monthly sandbox creations rising from 40,000 to 15 million in one year.
Product function and positioning
E2B provides a secure sandbox environment that allows AI agents to run safely in the cloud; agents can use sandboxes to create files, use browsers, analyze data, write small applications, create Excel sheets, and more, covering a wide range of task scenarios.
E2B supports a variety of languages, with Python and JavaScript currently the most used: the Python SDK sees nearly 500,000 downloads per month and the JavaScript SDK about 250,000.
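For a sense of the developer experience, here is a minimal sketch using the Python SDK. It assumes the e2b-code-interpreter package and an E2B_API_KEY in the environment; the method names follow the SDK’s documented surface, but treat the details as illustrative.

```python
# Minimal sketch: boot an isolated E2B sandbox and run a code snippet.
# Assumes `pip install e2b-code-interpreter` and E2B_API_KEY set in the env.
from e2b_code_interpreter import Sandbox

with Sandbox() as sandbox:                   # boots an isolated cloud microVM
    execution = sandbox.run_code("print(2 ** 10)")
    print(execution.logs.stdout)             # e.g. ['1024\n']
```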
Vasek hopes E2B can become the AWS of the AI agent era: an automated infra platform that can provide GPU support in the future for more complex data analysis, small-model training, game generation, and other needs, and that can host agent-built applications, covering the entire agent lifecycle from development to deployment.
However, this does not mean E2B will build prompt or memory tooling. Vasek believes LLMs will keep getting more powerful, and many problems that seem complex now may be solved automatically in the future.
But some problems are harder to solve automatically. For example, how do you ensure you always get a response from the LLM? If the model provider you connected to goes down, how do you automatically switch to another provider? This is an old problem developers have faced for years, but the scenario has changed: it used to arise in traditional software, and now it arises in AI environments.
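A minimal sketch of that failover pattern, assuming OpenAI-compatible providers; the endpoints and model names here are placeholders, and this is an illustration rather than an E2B feature.

```python
# Hypothetical provider-failover sketch: try each OpenAI-compatible endpoint
# in order until one answers. Endpoints and model names are placeholders;
# API keys per provider are assumed to come from the environment.
from openai import OpenAI

PROVIDERS = [
    {"base_url": "https://api.openai.com/v1", "model": "gpt-4o"},
    {"base_url": "https://fallback.example.com/v1", "model": "backup-model"},
]

def complete_with_failover(prompt: str) -> str:
    last_error = None
    for p in PROVIDERS:
        try:
            client = OpenAI(base_url=p["base_url"])
            resp = client.chat.completions.create(
                model=p["model"],
                messages=[{"role": "user", "content": prompt}],
                timeout=30,
            )
            return resp.choices[0].message.content
        except Exception as err:          # provider down, rate-limited, etc.
            last_error = err
    raise RuntimeError("all providers failed") from last_error
```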
In addition, traditional cloud computing is designed for existing applications, and the code executed by agents is dynamically generated and unpredictable. These problems still need to be solved in a new way.
Vasek mentioned that most of the time developers come to E2B with a simple requirement, such as starting with a serverless function or an API endpoint on a server to execute code, or even running code locally in the early stages of development.
But as the product evolves, especially at scale, problems arise one after another. For example:
- Security and isolation: Developers want to ensure that code from different users does not run in the same environment, because they do not know what that code is doing or whether sensitive information might leak.
- Permissions and freedom: Developers want to give agents or AI applications as much freedom as possible to run any code they want, which may include accessing the full file system, downloading dependencies, etc.
These present various technical and security challenges, such as how to efficiently and dynamically generate isolated code execution environments, how to make these environments readily available, and how to ensure stability and security. Many developers are slowly realizing that this requires a more specialized and secure solution, and they naturally turn to products like E2B.
As multi-agent systems develop, the team plans to launch forking and checkpointing as soon as possible, so that multiple agents can try different solution paths in parallel, like a tree structure where each node is a sandbox snapshot that can branch into the next state until the optimal path is found, similar to Monte Carlo tree search. Forking and checkpointing also solve local state-management issues, such as keeping intermediate progress and avoiding starting from scratch every time.
Monte Carlo tree search is a search algorithm commonly used in decision making problems, which combines the advantages of stochastic simulation and tree search to find near-optimal decisions with limited computing resources.
E2B has already implemented persistence, which is the basis for forking and checkpointing: users can pause a sandbox and return it to its paused state even a month later, allowing agents to run longer or execute tasks intermittently.
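In sketch form, persistence looks like the following. The pause/resume calls follow E2B’s beta persistence API as documented at the time of writing; exact names and availability may differ, so treat this as illustrative.

```python
# Sketch of sandbox persistence (pause/resume). The pause()/resume() calls
# follow E2B's beta persistence API; treat the exact names as illustrative.
from e2b_code_interpreter import Sandbox

sandbox = Sandbox()
sandbox.run_code("state = {'step': 1}")   # build up in-memory state

sandbox_id = sandbox.pause()              # snapshot filesystem and memory

# ...hours or weeks later, continue exactly where the agent left off:
resumed = Sandbox.resume(sandbox_id)
resumed.run_code("state['step'] += 1")
```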
Usage scenarios
One of the most important use cases for E2B is to support AI data analysis.
The developer uploads a CSV file and tells the AI: “I uploaded a CSV file that contains these columns; write Python code to analyze this data.” But that code must have somewhere to run, so E2B created a highly optimized runtime environment for this scenario and provides a dedicated SDK, the code interpreter SDK, which closes the loop very naturally:
- Developers can directly create charts, even interactive ones;
- Developers can install third-party libraries, and E2B pre-packages commonly used data analysis packages;
- AI models can easily reference code blocks they generated earlier;
- If the code errors out, developers can quickly feed the error message back to the LLM and let it try to fix it (see the sketch below).
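That last error-feedback loop is easy to sketch. Below is a minimal illustration using the e2b-code-interpreter Python SDK; `ask_llm` is a hypothetical helper standing in for whatever model call the developer uses, so read this as a sketch rather than E2B’s official recipe.

```python
# Sketch of the error-feedback loop: run LLM-generated analysis code in a
# sandbox and, on failure, hand the error back to the model to repair.
# `ask_llm` is a hypothetical helper that returns Python source code.
from e2b_code_interpreter import Sandbox

def run_with_repair(sandbox: Sandbox, task: str, max_attempts: int = 3):
    code = ask_llm(task)
    for _ in range(max_attempts):
        execution = sandbox.run_code(code)
        if not execution.error:            # success: charts, stdout, etc.
            return execution
        # Feed the error back so the model can fix its own code.
        code = ask_llm(f"{task}\nYour code failed with:\n{execution.error}")
    raise RuntimeError("could not produce working code")
```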
For developers, E2B’s isolation mechanism means each agent has its own sandbox environment, which makes agents more reliable.
The second most common use case for E2B is as a runtime platform for AI-generated applications. This field is developing very fast, especially as people build AI-powered applications on various frameworks; these need an environment that can run AI-generated code, that is, a dedicated runtime to support the generated application logic.
Therefore, E2B created an open-source template called Fragments, which developers can copy as a starting point for building their own AI application platform. A developer can type “Help me build a to-do application with Next.js”; the LLM generates the corresponding code and sends it to the sandbox environment to run, and the developer immediately sees the working application.
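In sketch form, that flow is: generate the files, write them into a sandbox, start the dev server. The snippet below is illustrative, not Fragments’ actual implementation; `generate_app_files` is a hypothetical LLM helper, and the files/commands/get_host calls follow the E2B SDK surface but should be double-checked against current docs.

```python
# Sketch of a Fragments-style flow: write LLM-generated app code into a
# sandbox and start it. `generate_app_files` is hypothetical; the SDK calls
# follow E2B's documented surface but are illustrative.
from e2b_code_interpreter import Sandbox

sandbox = Sandbox()
for path, content in generate_app_files("a to-do app with Next.js").items():
    sandbox.files.write(path, content)        # e.g. "app/page.tsx"

sandbox.commands.run("npm install && npm run dev", background=True)
print(f"preview at https://{sandbox.get_host(3000)}")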
E2B is now gradually moving into more scenarios. Vasek has observed that some developers are using code execution without any intention of building developer-oriented agents. For example, one developer built an AI-powered Excel that is not for developers at all; its end users may be CEOs, business executives, operations staff, or anyone in the company who needs to work with data. But under the hood, the product’s features are implemented through code execution.
Vasek said he is seeing more and more of this trend. Although it is still very early, it is an exciting direction, because people are realizing that code execution is not only for developers but for other types of users too.
Another interesting use case: Hugging Face used E2B in the reinforcement learning and code generation training phase when building Open R1. Specifically, Open R1 has a training step in which the model receives a programming problem and generates code that needs to run somewhere; a reward function then returns 0 or 1 to indicate whether the result is correct, and this feedback is used to optimize the model.
Hugging Face uses E2B sandboxes to run this code, launching hundreds or even thousands of sandboxes per training step, resulting in high concurrency. This is very fast and doesn’t require expensive GPU clusters to handle. And because each sandbox is isolated and secure, developers don’t have to worry that changing permissions in a cluster will affect the rest of the system.
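In sketch form, such a reward function is just “run the rollout’s code in a throwaway sandbox and score the result.” The following is an illustrative version using the e2b-code-interpreter SDK, not Hugging Face’s actual Open R1 code.

```python
# Illustrative binary reward for RL code-generation training: run the
# model's code plus a test snippet in a fresh, isolated sandbox and
# return 1 only if it executes without error.
from e2b_code_interpreter import Sandbox

def code_reward(generated_code: str, test_snippet: str) -> int:
    with Sandbox() as sandbox:                 # isolated per rollout
        execution = sandbox.run_code(generated_code + "\n" + test_snippet)
        return 0 if execution.error else 1
```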
Vasek said several companies are already using E2B to train models this way. While this is not the scenario the team originally envisioned, it now seems very reasonable: from the perspective of the AI agent lifecycle, E2B should step in as early as possible, and the training phase may be the earliest link.
How to improve developer stickiness?
Improving developer stickiness is the hardest part of building developer tools, and the key is finding the right developer experience (DX).
Vasek believes the “GPT wrapper” is a good business at the moment, because it can quickly capture the gains from improvements in the underlying models. The cost of switching models keeps falling: moving from Gemini to Claude or OpenAI basically takes only one or two lines of code, and users switch models often. However, it is difficult to ensure that your application or agent still works properly after a model switch.
An important value proposition of E2B is that customers, especially large companies, don’t feel locked into one LLM. OpenAI, for example, has its own code interpreter, but many customers don’t want to use it: on the one hand they can’t control it, and on the other hand using it would bind them to OpenAI, with no way to switch to Google or open-source models, because OpenAI’s code interpreter will not adapt to other models. So they have no motivation to use it at all, and E2B’s job is to make switching models easier for developers.
Vasek believes the way to really win over developers is to provide an experience so good that developers hardly need to think about the tool, like an extension of the brain. This may sound like it would make the tool easier to replace, since developers don’t consciously rely on it; in fact, it is precisely because developers don’t need to think about it that the tool is fully embedded in the workflow and they are more reluctant to change it. Achieving this effect requires attention to all kinds of small details: for example, a developer shouldn’t need to understand complex infra logic or write a pile of configuration files, which is a bad development experience.
All in all, E2B needs to be neutral about LLMs. From a technical perspective, E2B wants to be the Kubernetes of the agent space while having a better developer experience.
Kubernetes is an open-source container orchestration platform for automating the deployment, management, scaling, and operation of containerized applications.
02. What does E2B think about AI Agents?
How should software be priced in the era of agents?
Agent pricing is an important open question. Some argue that traditional per-seat pricing does not apply to agents, because one agent may run for only a few seconds while another needs to run for hours; but with pure usage-based billing, users may unknowingly spend a lot of money and become afraid to keep using the product.
Vasek believes that pricing is really a very difficult thing for infra companies.
When founders start an infra company, they often want pricing to be very simple, say $100 per month with overage charges beyond a certain limit. But once they start to scale, they find there are many factors to consider, such as traffic, storage, and various small resource consumptions, and they often end up with a very complex price list.
Therefore, how pricing logic is communicated to users becomes very important. Basic functions are a must, such as billing caps and expense warnings, which ensure users feel fully in control of their spending. Good observability is also necessary: users must know exactly what resources they are using and how much they are spending.
The next use case for Agent: computer use
The three main use cases for agents that are now widely talked about are:
- Write code, such as Cursor;
- Sales, such as go-to-market outreach, which is starting to be automated, freeing sales reps from many tasks they used to do manually;
- Customer support, such as Sierra and Decagon, is being used by Fortune 100 companies.
Vasek believes the next scenario where agents can really work is letting agents control a computer or browser.
Anthropic was one of the first companies to publicly release such a capability last year, and OpenAI released Operator this year. This opens up a variety of possibilities and challenges: users may not want an agent to use their computer completely freely, and will still want some control, such as the ability to approve or deny specific operations.
Now people are building computers for agents. E2B has launched its own Desktop Sandbox, essentially a cloud computer with a graphical interface, and open-sourced an open-computer-use project that combines open-source large models to simulate the behavior of using a computer. The project is also a challenge E2B set for itself: can E2B build a computer-using agent based on open-source models alone?
Vasek believes that in 2025, this direction will be very interesting because the potential returns are very high, but there is also a lot of uncertainty. Vasek isn’t entirely sure if agents will still operate in a cloud computer way in five years, and there may be better alternatives. But now, this is an area worth exploring, especially if E2B can create a digital twin of each local machine, which will be a big deal for enterprises and non-developers.
However, the point of products like Operator is not for users to sit and “watch the agent complete the task for you.” Vasek believes the advantage of using an agent is the much lighter psychological burden, because there is no need to think “I have to do this”: in an ideal world there would be a to-do application where people simply enter a task and an agent automatically handles it.
An interesting problem, though: today’s websites, such as airline booking sites and hotel booking platforms, are optimized for clicks, and many companies spend millions of dollars to increase click-through rates. But now the visitors to these sites are increasingly agents.
There is also an emerging idea that LLMs let users improvise everything on the fly; users may not even need a computer, since everything can be “generated.”
How to teach AI to use a computer?
On the topic of computer-use agents, E2B AI engineer James Murdza wrote a blog post in January 2025 introducing an open-source computer-use agent he built that can use all the functions of a personal computer: given an instruction such as “search the Internet for cute cat pictures,” it reasons with LLMs and automatically operates the mouse and keyboard to complete the task.
What sets this tool apart is that it is completely open source and uses only open-weight models, meaning anyone is free to run and modify the project. The principle: the agent keeps taking screenshots and asking Llama what to do next until the model decides the task is complete.
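The core loop is only a few lines. Here is a minimal sketch of that screenshot-reason-act cycle; `take_screenshot`, `llm_next_action`, and `perform` are hypothetical helpers standing in for the E2B desktop sandbox and the Llama calls described below.

```python
# Sketch of the screenshot -> reason -> act loop the blog describes.
# take_screenshot / llm_next_action / perform are hypothetical helpers.
def run_computer_use_agent(task: str, max_steps: int = 50) -> None:
    for _ in range(max_steps):
        screenshot = take_screenshot()               # PNG of the sandbox screen
        action = llm_next_action(task, screenshot)   # e.g. {"type": "click", ...}
        if action["type"] == "done":                 # model declares task complete
            return
        perform(action)                              # move mouse, type, click...
    raise TimeoutError("agent did not finish within the step budget")
```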
The technical challenges of this project were:
1. Security: the operating system must be isolated in a secure, controlled environment;
2. Click operations: the AI must be able to accurately click and manipulate UI elements;
3. Reasoning ability: the AI must decide what to do next, or when to end the task, based on what it sees;
4. Deploying niche LLMs: open-source models need to be hosted at low cost, especially the niche open-source model OS-Atlas;
5. Live streaming: a low-latency way to present and record the sandbox environment.
Challenge 1: Security
After all, it is very dangerous to let AI agents directly access personal computers and file systems, because agents may mistakenly delete files or even perform irreparable operations. So instead of giving the agent direct access to the local computer, James used E2B.
Challenge 2: Click operations
LLM-based “computer operations” are relatively simple when the interface is text-based, and many tasks can be completed with text instructions alone. However, some applications are basically impossible to use without a mouse, so if you want to be a truly comprehensive computer operation agent, you must support mouse click functions.
Traditional approaches don’t work well: using conventional computer-vision models as a bridge between the screen and the LLM works for recognizing text and some icons, but cannot tell text boxes, buttons, and other interactive elements apart. Some Chinese researchers are studying grounded VLMs, vision-language models that can output precise coordinates pointing to specific locations in an input image. Gemini and Claude have similar capabilities, but they are not open source.
In the end, James chose OS-Atlas because OS-Atlas not only published the model weights on Hugging Face, but also detailed the model training process in a paper.
OS-Atlas is an open-source foundation action model for GUI agents: a grounded VLM that maps natural-language instructions to on-screen elements and outputs their coordinates, with openly published weights.
Challenge 3: Reasoning ability
The power of an agent is that it can choose among multiple actions and make sound judgments based on the latest information. Initially, developers used prompts to get the LLM to output an action in a specific text format, then appended the result of that action to the context and called the LLM again for the next step. Later, model providers strengthened this ability through fine-tuning, which was originally called “function calling” and is now more commonly referred to as “tool use.”
But folding visual input into the tool-call process, completing visual understanding and action decisions in a single LLM call, was still relatively new at the time, so James used the following three models (a pipeline sketched after this list):
- Llama-3.2-90B-Vision-Instruct: views the sandbox environment and decides what to do next;
- Llama-3.3-70B-Instruct: translates Llama 3.2’s decision into tool-call format;
- OS-Atlas-Base-7B: acts as a “callable” tool that returns click coordinates for a given prompt.
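Here is how those three stages might fit together. The OpenAI client is used only as an OpenAI-compatible wire format; the base_url and the `to_tool_call` / `ground_with_os_atlas` helpers are hypothetical placeholders, not James’s actual code.

```python
# Sketch of the three-model pipeline: a vision LLM reasons over the
# screenshot, a text LLM turns that decision into a tool call, and
# OS-Atlas grounds click targets to pixel coordinates.
from openai import OpenAI

client = OpenAI(base_url="https://inference.example/v1")  # placeholder provider

def next_action(task: str, screenshot_b64: str) -> dict:
    # Step 1: the vision model looks at the screen and reasons in prose.
    thought = client.chat.completions.create(
        model="Llama-3.2-90B-Vision-Instruct",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": f"Task: {task}. What should I do next?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
        ]}],
    ).choices[0].message.content

    # Step 2: a text model translates the prose decision into a tool call
    # (hypothetical helper wrapping Llama-3.3-70B-Instruct with tool schemas).
    action = to_tool_call(thought)

    # Step 3: OS-Atlas grounds "click the search box" to (x, y) coordinates
    # (hypothetical helper wrapping an OS-Atlas-Base-7B endpoint).
    if action["name"] == "click":
        action["args"]["xy"] = ground_with_os_atlas(
            screenshot_b64, action["args"]["target"])
    return action
```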
It’s worth mentioning that James found the agent frameworks on the market not very useful. Their main role is to encapsulate LLM input formatting and output parsing, handle agent prompts, and run the agent loop; but James wanted the loop to be very simple, and he didn’t want the agent prompts black-boxed by a framework, since that is the part he needed to adjust most often. The only remaining use of a framework would be connecting to LLM providers, especially for tool calls and image support; but by then most providers had standardized on OpenAI’s tool-call format, and where there were exceptions, the framework documentation was often unclear.
James said tool calling is not a single feature but a whole combination of pieces: LLM fine-tuning, prompt design, string-format parsing, API interface specifications, and so on. Whether on the server side or the client side, it is hard for a framework to abstract all of these cleanly and keep them updated, so in the end developers still have to adjust things manually.
Challenge 4: Deploy a niche LLM
To make the agent run fast, James wanted LLM inference in the cloud, while still letting users run the project out of the box. The problem: reliable providers exist for relatively mainstream models like Llama, but for niche models like OS-Atlas, many inference providers were unwilling to offer serverless hosting, so James ended up calling OS-Atlas through the free space provided by Hugging Face.
Challenge 5: Live streaming
To see what the AI is doing, James wanted to capture the screen in the sandbox environment in real time and successfully implemented it with FFmpeg.
FFmpeg is an open-source free software that can record, convert, and stream audio and video in a variety of formats.
Server-side command: records the current screen as a video stream and serves it over HTTP; with -listen 1, only one client can connect at a time.
```bash
ffmpeg -f x11grab -s 1024x768 -framerate 30 -i $DISPLAY -vcodec libx264 \
  -preset ultrafast -tune zerolatency -f mpegts -listen 1 http://localhost:8080
```
Client command: connects to the server, saves the video, and plays it back in real time.
```bash
ffmpeg -reconnect 1 -i http://servername:8080 -c:v libx264 -preset fast -crf 23 \
  -c:a aac -b:a 128k -f mpegts -loglevel quiet - \
  | tee output.ts \
  | ffplay -autoexit -loglevel quiet -i -
```
Throughout development, James kept pondering one question: should AI agents be controlled through APIs as much as possible, or should they rely on vision to simulate human clicks? The answer is clear: when an API exists, use it. The problem is that most software was never designed to be controlled programmatically, so suitable APIs basically don’t exist.
Therefore, James deliberately had the agent simulate human operation. However, when building an agent, you should also consider the other interfaces available besides the visual one, such as:
- Standard APIs: file system APIs, Microsoft Office APIs, etc.;
- Code Execution Interface: Run Bash or Python scripts to open applications and read files;
- Accessibility APIs: desktop systems often provide these interfaces to “see” the GUI (graphical user interface) structure; however, on Linux they are not as well supported as on macOS or Windows;
- DOM interface for web pages: semi-structured access to web page elements;
- MCP: designed for agents, providing context and action entry points.
James believes the only reason vision is currently necessary is that most applications provide no friendly structured interfaces at all, especially accessibility APIs, which would benefit not only AI agents but also visually impaired human users. If everything could be plugged into an adapter the way Zapier does it, things would be much more efficient.
There is also a big unanswered question: how to handle user authentication securely? The least secure way is to give the AI the same permissions as a human. It’s safer to set permission scopes, like OAuth authorization or iOS app permission controls.
James created a new, isolated sandbox environment without any user data, but that doesn’t fundamentally solve the problem: if users have no safe way to do something, they tend to choose an unsafe one. James therefore believes the following questions are worth pondering:
1. How to provide computer-use agents with permission-scoped API access. For example, an agent could use traditional APIs to view a user’s inbox but not delete or send emails.
2. How to desensitize sensitive information passed to LLMs and restore it in the output. For example, users could preset key information such as credit card numbers that can be passed to tools but never exposed to the large model itself (a minimal sketch follows).
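A minimal sketch of that redact-then-restore idea, in plain Python and purely illustrative: secrets are swapped for placeholder tokens before the prompt reaches the LLM and substituted back only when a tool is actually invoked.

```python
# Redact-then-restore sketch: the LLM only ever sees placeholder tokens;
# real values are substituted back at tool-invocation time.
SECRETS = {"<CARD_1>": "4111 1111 1111 1111"}   # preset by the user, never sent

def redact(text: str) -> str:
    for token, value in SECRETS.items():
        text = text.replace(value, token)
    return text

def restore(tool_args: str) -> str:
    for token, value in SECRETS.items():
        tool_args = tool_args.replace(token, value)
    return tool_args

prompt = redact("Pay with card 4111 1111 1111 1111")   # LLM sees "<CARD_1>"
# ...model returns a tool call containing "<CARD_1>"...
real_args = restore('{"card": "<CARD_1>"}')            # tool gets the real number
```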
James expects open-source models to advance rapidly toward visually capable reasoning, and looks forward to enhancing the agent’s capabilities by adding more API tools.
Custom agent frameworks vs. off-the-shelf frameworks
James noted above that off-the-shelf frameworks are not easy to use. In the long run, will medium and large enterprises decide their environments are so particular, and require so much extensibility and customization from agents, that they turn to building their own?
Vasek believes these frameworks were born in the very early stages of LLM development, when many core concepts were still evolving, and they are still changing. But at least some consensus has formed: certain kinds of prompts can be used efficiently, methods such as Chain of Thought and ReAct have stabilized, and agents have gradually figured out how to use tools.
For developers, a constantly changing framework is painful to build on. Instead of ten different ways to do one thing, there should be one clearly available way; that is why Vasek uses frameworks.
Vasek believes each framework has its own clear methodology and preferences, which developers need to recognize. In the future, clearly opinionated frameworks will become more and more popular and developers will be more willing to accept them, as Crew AI and LangGraph already show.
Crew AI is an open-source multi-agent coordination framework; LangGraph is a module the LangChain team built on top of LangChain for building stateful, multi-role agent applications.
The evolution of frameworks is a battle with no end; there will always be new frameworks. Today’s agent-framework debate resembles the earlier debates between model labs such as Anthropic and OpenAI, now replayed as Crew versus LangGraph. Developers may not make money directly from a framework itself, but they can capture value around infra or related services. Many framework teams are expanding their product range, such as LangChain, which developed LangGraph and LangSmith and began positioning itself as a “full agent solution.”
Vasek specifically mentioned that developers who don’t yet know how they prefer to build don’t necessarily need a framework. Some of today’s frameworks are not agent frameworks in the traditional sense anyway; LangChain, for example, is more a tool for interacting with large models.
03. Why choose to take root in Silicon Valley?
The two founders of E2B came from a border town in the Czech Republic and met each other in sixth grade. Both later moved to Prague, the capital, to study computer science, and although Tomas later transferred to other cities, he worked on various projects with Vasek in Prague every summer.
After founding E2B, although both founders were Czech, they chose to build in the United States rather than Europe, because Vasek believes products should be built where the users are; E2B’s users are engineers developing AI applications, most of them in Silicon Valley, so starting up in Silicon Valley was the logical choice.
Vasek didn’t initially plan to actually move to Silicon Valley, thinking he could fly over every couple of months for marketing and sales. Starting in 2023, E2B’s early four-person core team would spend a month or two there together, and every time they came to San Francisco the team could clearly feel things moving faster. Especially in the early stage, the most straightforward way to help a user get started with E2B was to sit together and coach them in person: the efficiency and interactivity of face-to-face support can never be matched remotely.
For example, in Prague Vasek might chat with ten people about startups and only one conversation would bring inspiration, whereas in Silicon Valley perhaps only five or six out of ten are ordinary, and even a lunch chat can be a high-density, high-quality conversation.
In addition, although teams can be distributed, in the early days the founding team needs to be in one place: at that stage everything changes rapidly, new ideas and new decisions can arrive within hours, everything is dynamic, and everyone has to be together, face to face, acting quickly. So Vasek firmly chose to put down roots in Silicon Valley.