A guide to building efficient agents

This article offers an in-depth look at agent development, from concept to practice, covering the definition of agents, their applicable scenarios, design elements, security, and orchestration patterns. Drawing on practical experience, the author provides detailed guidance for product and engineering teams building agents for the first time.

This year I have been working on agent-related projects and have built up my own set of AI project experience. But in AI, nothing is more dangerous than complacency, so I read all kinds of reports every day.

The report has a total of 32 pages, and its table of contents is structured as follows:

  1. What is an agent?
  2. When should you build an agent?
  3. Agent design fundamentals
  4. Security
  5. Conclusion

Introduction

Large language models (LLMs) are rapidly improving and are now capable of handling complex, multi-step tasks. Breakthroughs in reasoning, multimodal processing, and tool calling have given rise to a new class of LLM-driven systems: agents.


Written for product and engineering teams building agents for the first time, this guide distills takeaways from customer deployments into actionable best practices. The content covers:

  • a screening framework for identifying high-potential application scenarios;
  • clear paradigms for designing agent logic and orchestration;
  • key practices to ensure agents operate safely, predictably, and efficiently.

By the end of this guide, you’ll have the core knowledge needed to build your first agent and get started.

What is an Agent?

Traditional software helps users streamline and automate workflows.

Agents can autonomously perform the same process for users.

An agent is a system that completes tasks on behalf of users with a high degree of autonomy.

A workflow is a series of steps that must be performed in sequence to achieve a user’s goal, such as resolving a customer service issue, making a reservation at a restaurant, submitting a code change, or generating a data report.

Non-agent scenarios: Integrating LLMs into applications without allowing them to control process execution (e.g., simple chatbots, single-round question-answering LLMs, sentiment classifiers, etc.) – these are not agents.

Therefore, before doing agent development, you first need a clear definition of the agent:

First, LLM-driven process control and decision-making

  1. Uses an LLM to make decisions and control workflow execution
  2. Independently judges whether a task is complete
  3. Supports self-correction on errors
  4. On failure, can abort the process and hand control back to the user

Second, calls multiple tools, governed by security policies

  1. Accesses various tools to interact with external systems (gathering information / executing actions)
  2. Dynamically selects tools based on workflow state
  3. Always operates within preset security boundaries

To summarize, the core of today’s agents comes down to two points:

  1. whether the model can generate a reliable workflow on its own;
  2. whether the agent can call various tools to execute tasks end to end.

The reason models can now confidently orchestrate their own workflows is the significant improvement in base model capabilities.

When should you build an agent?

Building agents means rethinking the way systems handle decision-making and complexity.

Unlike traditional automation, agents are particularly useful for workflows where deterministic, rule-driven approaches fall short.

Take payment fraud analysis as an example:

Traditional rules engines act like a checklist, flagging transactions based on preset conditions.

LLM agents are more like seasoned investigators, synthesizing context, catching subtle patterns, and identifying suspicious behavior even if they don’t trigger explicit rules.

This delicate reasoning ability is the key for agents to shine in complex and ambiguous scenarios.

PS: In real practice, the rules engine is actually more efficient and accurate; the "subtle patterns" the agent catches are really just rules the engine missed, and logically they should be added back to the rules engine.

Real applications follow a fast/slow-system split: the rules engine handles the first pass, and the model serves as the fallback.

So when should you consider an agent?

When evaluating where agents add value, prioritize processes that traditional automation has always struggled to fully cover, especially where rule-based methods have pain points.

Before formally committing to building an agent, make sure your use case really meets these criteria. If the process can be handled by a simple, reliable deterministic solution, there is no need to force an agent in.

PS: In practice, the biggest problem is that the model must be reliably correct, with accuracy above some threshold; otherwise the agent is hard to trust.

Three elements of agent design

In its most basic form, an agent consists of three core components:

from agents import Agent  # Agents SDK import

weather_agent = Agent(
    name="Weather agent",
    instructions="You are a helpful agent who can talk to users about the weather.",
    tools=[get_weather],  # get_weather is a tool defined elsewhere
    model="gpt-4",        # specify the LLM model to be used
)

1. Model selection strategy

Different models trade off differently among task capability, latency, and cost.

As discussed in the next section, “Orchestration”, you often need to mix multiple models by task type in the same workflow.

Not all steps require the strongest model

Simple retrieval or intent classification can be done with small, fast models.

More difficult decisions, such as whether to approve a refund, may require a more robust model.

A proven approach is to build the entire workflow with the most powerful model first, to establish a performance baseline; then try swapping smaller models into individual steps to see whether results remain acceptable.

This avoids prematurely limiting the agent’s abilities and clearly locates where small models succeed or fail.

PS: Given how cheap today’s large models are, you can often just use the strongest model throughout.
The catch is that many private-deployment scenarios still have to rely on small models, so this strategy remains relevant.

The selection principle considers three points:

  1. Establish evals: first use the best model to run through the whole process to form a performance benchmark.
  2. Ensure accuracy first: Consider optimization on the premise of meeting the target accuracy.
  3. Optimize costs and latency: Replace large models with smaller models without affecting performance.
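The three-point principle can be sketched as a per-step model routing table. Everything below is a hypothetical illustration: the step names and model identifiers are placeholders, not real model names or Agents SDK API.

```python
# Hypothetical routing table: which model tier handles which workflow step.
# Start every step on the strongest model to establish an eval baseline,
# then downgrade entries here once evals show a smaller model suffices.
MODEL_FOR_STEP = {
    "intent_classification": "small-fast-model",
    "retrieval": "small-fast-model",
    "refund_approval": "strongest-model",   # high-stakes decision stays on the big model
}

def pick_model(step: str, default: str = "strongest-model") -> str:
    # Unknown steps default to the strongest model, never silently to a weak one.
    return MODEL_FOR_STEP.get(step, default)
```

Keeping the routing in one table makes the "replace large models with smaller ones" optimization a one-line change per step, and the eval suite tells you when a downgrade was safe.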

2. Define the tool

Tools extend an agent’s capabilities by calling the APIs of underlying applications or systems.

For legacy systems that lack APIs, agents can drive web pages or desktop interfaces directly via computer-use models, much as a human would.

Each tool should have a standardized definition, allowing flexible many-to-many reuse across agents.

Well-documented, well-tested, reusable tools improve discoverability, simplify version management, and avoid reinventing the wheel.

PS: So-called computer use is far less mature than people think; there is still a lot of room for improvement.

There are three types of tools commonly used by agents

Here’s how to add a set of tools (web search + result storage) to the weather_agent agent mentioned above in the OpenAI Agents SDK:

from agents import Agent, WebSearchTool, function_tool
import datetime
import db  # assumes an existing database operation module

@function_tool
def save_results(output: str) -> str:
    # Write search results to the database
    db.insert({"output": output, "timestamp": datetime.datetime.now()})
    return "File saved"

search_agent = Agent(
    name="Search agent",
    instructions="Help the user search the internet and save results if asked.",
    tools=[WebSearchTool(), save_results],
)

As the number of tools required increases, it is recommended to split the task into multiple agents to work together (see Orchestration section for details).

3. Instruction configuration

High-quality instructions are crucial for any LLM application, especially for agents.

The clearer the instructions, the less ambiguity and the more reliable the agent’s decisions are—resulting in smoother and fewer errors throughout the workflow.

Best practices for agent instructions

Automatically generate instructions using higher-order models

You can have high-performance models such as o1 or o3-mini generate standardized instructions directly from existing documentation.

The following example of an English prompt illustrates this idea:

You’re an expert in writing instructions for LLM agents.

Please convert the following Help Center document into a clear list of instructions, using a numbered list format.

This document is a policy for LLMs to follow.

Make sure there is no ambiguity and write in a way that the agent can directly execute.

The Help Center documentation to be converted is as follows: {{help_center_doc}}

4. Orchestration

Once the foundation components are in place, you can enable agents to execute workflows efficiently by selecting the appropriate orchestration pattern.

While it is attractive to get started with a complex, fully autonomous agent, practice shows that a step-by-step, iterative approach is often more likely to be successful.

There are two main categories of orchestration patterns:

  1. Single-agent system. A model is equipped with the necessary tools and instructions to loop through the entire workflow.
  2. Multi-agent system. Split the workflow and hand it over to multiple agents to complete together, each performing its own role.

Next, we will walk through each of these two patterns.

Single-agent system

A single agent can start with just a basic model and one or two tools; as demand grows, it is gradually “fitted” with new tools.

This allows functionality to grow naturally as the project iterates, without introducing additional orchestration costs due to premature splitting into multiple agents.

Its core components are the three elements above: model, tools, and instructions.

Any orchestration scheme relies on a “run” concept—usually implemented as a loop that keeps the agent working until the exit condition is met. Common exit conditions include:

  • The required tool call has been completed
  • The specified structured output has been generated
  • An error occurred
  • The maximum number of turns has been reached

For example, in the Agents SDK, an agent is started via Runner.run(), which loops over the LLM until either of the following occurs:

  1. A final-output tool, defined by a specific output type, is called
  2. The model returns a reply without any tool calls (e.g., a message directly to the user)

Example usage:

Agents.run(agent, [UserMessage("What's the capital of the USA?")])

This concept of a while loop is at the heart of the agent’s operating mechanism.
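The while-loop concept can be sketched in plain Python with a stubbed model step in place of a real LLM call; the exit conditions mirror the list above. All names here (run, fake_step, MAX_TURNS) are illustrative, not Agents SDK API.

```python
MAX_TURNS = 10  # illustrative cap on agent turns

def run(agent_step, user_message):
    """Loop until the agent produces a final answer or a limit is hit."""
    state = {"messages": [user_message], "turns": 0}
    while state["turns"] < MAX_TURNS:           # exit: max turns reached
        state["turns"] += 1
        try:
            action = agent_step(state)          # one LLM "decision"
        except Exception as err:                # exit: an error occurred
            return {"status": "error", "detail": str(err)}
        if action["type"] == "final_output":    # exit: final answer produced
            return {"status": "done", "output": action["content"]}
        # otherwise a tool was called; record the result and loop again
        state["messages"].append(action["content"])
    return {"status": "max_turns_reached"}

# Stubbed "model": calls one tool on the first turn, then answers.
def fake_step(state):
    if state["turns"] == 1:
        return {"type": "tool_call", "content": "tool result: sunny"}
    return {"type": "final_output", "content": "It is sunny."}

result = run(fake_step, "What's the weather?")
```

Swapping fake_step for a real LLM call plus tool dispatch gives the basic single-agent runtime; frameworks like the Agents SDK wrap exactly this kind of loop.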

In a multi-agent system (as we’ll see later), a series of tool calls and handoffs between agents can occur, with the model still allowed to run multiple steps before an exit condition is met.

An effective strategy for managing complexity without switching to a multi-agent framework is to use prompt templates.

Instead of maintaining a large number of separate prompts for different use cases, use a flexible base prompt and inject policy variables.

This template approach can be easily adapted to various scenarios, significantly simplifying maintenance and evaluation. When a new use case arises, simply update the variable without rewriting the entire workflow:

You’re a call center agent.

You’re communicating with {{user_first_name}}, who has been a member for {{user_tenure}}.

The most common complaint category for this user is {{user_complaint_categories}}.

Please say hello to our users, thank them for their loyal support, and answer any questions they may have!
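The template above can be rendered with ordinary string formatting. Below is a minimal sketch: the variable names mirror the template, with the double braces reduced to single ones for Python’s str.format.

```python
# Base prompt with injected policy variables; wording mirrors the template above.
BASE_PROMPT = (
    "You're a call center agent. "
    "You're communicating with {user_first_name}, who has been a member "
    "for {user_tenure}. Their most common complaint category is "
    "{user_complaint_categories}. Greet them, thank them for their "
    "loyal support, and answer any questions they may have!"
)

def render_prompt(**policy_vars):
    # A new use case only needs new variable values, not a new prompt.
    return BASE_PROMPT.format(**policy_vars)

prompt = render_prompt(
    user_first_name="Ada",
    user_tenure="3 years",
    user_complaint_categories="billing",
)
```

One base prompt plus a variables dictionary is what keeps maintenance and evaluation tractable as use cases multiply.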

So, the question is: when should you consider creating multiple agents?

Our overall recommendation is to prioritize the full exploitation of individual agents’ capabilities.

Multiple agents offer an intuitively appealing division of labor, but they also introduce extra complexity and overhead; in many scenarios, a single agent with the right tools is sufficient.

For complex workflows, splitting prompts and tools into multiple agents often improves performance and scalability.

If your agents struggle to execute complex commands or often choose the wrong tools, you may need to further segment your system and introduce more independent agents.

A hands-on guide to splitting an agent

Next, let’s introduce the multi-agent system.

Multi-agent system

While multi-agent systems can take many forms depending on the workflow and requirements, our customer practice shows two broadly applicable patterns:

First, the manager pattern: agents as tools.

A centralized “manager” agent coordinates multiple specialized agents through tool calls, each responsible for a specific task or domain.

Second, the decentralized pattern: agents handing off to agents.

Multiple agents operate as peers, handing off tasks to each other based on their expertise.

Multi-agent systems can be abstracted as graphs, with nodes representing agents.

In the manager pattern, edges represent tool calls: a centralized “manager” agent coordinates multiple specialized agents, each responsible only for the tasks or domains it excels at.

In the decentralized pattern, edges represent handoffs: multiple agents work together as peers, handing tasks off to the most appropriate agent according to their respective expertise.

Regardless of the orchestration model, the core principle remains the same: keep components flexible, composable, and driven by clear, structured prompts.

1. Manager mode

The manager pattern is somewhat reminiscent of DeepSeek’s MoE architecture: it gives a centralized large language model (LLM) the role of “manager,” seamlessly orchestrating a network of specialized agents through tool calls.

Rather than losing context or control of the process, the manager intelligently assigns tasks to the right agent at the right time and integrates each agent’s output into a coherent interaction.

This allows users to have a smooth and unified experience, while various specialized capabilities can be called upon at any time.

This is ideal when you want a single agent to control the execution of the entire workflow, and that agent needs to interact directly with the user.

For example, implementing Manager patterns in the Agents SDK:

from agents import Agent, Runner

# -------- Define three dedicated translation agents --------
spanish_agent = Agent(
    name="translate_to_spanish",
    instructions="Translate the user's message to Spanish",
)

french_agent = Agent(
    name="translate_to_french",
    instructions="Translate the user's message to French",
)

italian_agent = Agent(
    name="translate_to_italian",
    instructions="Translate the user's message to Italian",
)

# -------- Define the manager agent --------
manager_agent = Agent(
    name="manager_agent",
    instructions=(
        "You are a translation agent. You use the tools given to you to translate. "
        "If asked for multiple translations, you call the relevant tools."
    ),
    tools=[
        spanish_agent.as_tool(
            tool_name="translate_to_spanish",
            tool_description="Translate the user's message to Spanish",
        ),
        french_agent.as_tool(
            tool_name="translate_to_french",
            tool_description="Translate the user's message to French",
        ),
        italian_agent.as_tool(
            tool_name="translate_to_italian",
            tool_description="Translate the user's message to Italian",
        ),
    ],
)

# -------- Running example --------
async def main():
    msg = input("Please enter the text you want to translate: ")
    orchestrator_output = await Runner.run(manager_agent, msg)
    print("Translation steps:")
    for message in orchestrator_output.new_messages:
        print(f"  - {message.content}")

# Example input:
# Translate 'hello' to Spanish, French and Italian for me!

Declarative vs. non-declarative graphs

Declarative frameworks require developers to explicitly define every branch, loop, and condition in the workflow as a graph (nodes = agents; edges = deterministic or dynamic handoffs).

Advantage: Clear visualization.

Disadvantages: when workflows become more dynamic and complex, this approach quickly becomes cumbersome and may even require learning a specialized domain-specific language (DSL).

A non-declarative, code-first approach lets developers express workflow logic directly in familiar programming constructs, without drawing a complete graph in advance.

Benefits: more flexible and adaptable, with agents orchestrated dynamically based on runtime requirements.

Many readers may not follow this, so here is a simple explanation. A declarative structure is like drawing a flowchart: all steps and routes must be defined in advance, as in a bank’s account-opening automation process.

The advantage is clear: the process is stable. The disadvantage is equally obvious: adjusting the process is a pain, often meaning modifying the entire flowchart or redefining all the connections.

With the code-first (non-declarative) approach, by contrast, the same change takes only a few lines of code.

In plain terms: declarative means drag-and-drop tools like Dify; code-first means having an engineering team write code.
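To make the code-first point concrete, here is a toy sketch of the account-opening flow as plain Python. All function names and the flow itself are illustrative stubs, not a real banking API.

```python
def verify_identity(customer):
    # Stub: a real system would check documents against a KYC service.
    return bool(customer.get("id_document"))

def create_account(customer):
    # Stub: a real system would write to a core banking backend.
    return {"owner": customer["name"], "status": "open"}

def open_account(customer):
    # The whole flow lives in ordinary code: adding a branch, a loop, or a
    # new step later means editing a few lines here, not redrawing a diagram
    # and redefining its connections.
    if not verify_identity(customer):
        return {"status": "rejected"}
    return create_account(customer)

result = open_account({"name": "Ada", "id_document": "passport"})
```

The trade-off is symmetric: the declarative graph gives you the visualization for free, while the code-first version gives you cheap changes.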

2. Decentralized model

In the decentralized model, agents can “handoff” workflow execution rights to each other.

Handoff is a one-way transfer mechanism that allows one agent to delegate a task to another.

In the Agents SDK, handoffs are implemented as a type of tool or function. When an agent calls the handoff function, the system immediately starts execution on the target agent and transfers the latest session state.

Its core features are:

  • Equal Collaboration: This model relies on multiple agents on an equal footing working together
  • Direct control transfer: One agent can directly hand over workflow control to another agent
  • No central scheduling required: Suitable for scenarios where a single agent does not need to maintain centralized control or comprehensive processing
  • Dynamic Interaction: Each agent can take over the execution process and interact directly with the user as needed

In summary, this mode achieves optimal performance when the workflow does not require global coordination by a central controller and is more suitable for phased autonomous processing by different agents.

Here’s how to use the Agents SDK to implement a decentralized workflow that handles both sales and after-sales support.

The core idea: a triage agent routes the request first, then hands the session off to the most appropriate specialized agent:

from agents import Agent, Runner

# -------- Specialized agents --------
technical_support_agent = Agent(
    name="Technical Support Agent",
    instructions=(
        "You provide expert assistance with resolving technical issues, "
        "system outages, or product troubleshooting."
    ),
    tools=[search_knowledge_base],  # consult the knowledge base
)

sales_assistant_agent = Agent(
    name="Sales Assistant Agent",
    instructions=(
        "You help enterprise clients browse the product catalog, "
        "recommend suitable solutions, and facilitate purchase transactions."
    ),
    tools=[initiate_purchase_order],  # generate a purchase order
)

order_management_agent = Agent(
    name="Order Management Agent",
    instructions=(
        "You assist clients with inquiries regarding order tracking, "
        "delivery schedules, and processing refunds."
    ),
    tools=[track_order_status,        # track order status
           initiate_refund_process],  # initiate the refund process
)

# -------- Triage agent --------
triage_agent = Agent(
    name="Triage Agent",
    instructions=(
        "You act as the first point of contact, assessing customer "
        "queries and directing them promptly to the correct specialized agent."
    ),
    handoffs=[technical_support_agent,
              sales_assistant_agent,
              order_management_agent],  # possible handoff targets
)

# -------- Running example --------
Runner.run(
    triage_agent,
    [
        "Could you please provide an update on the delivery timeline "
        "for our recent purchase?"
    ],
)

Process description:

  1. Initial message → triage agent. The user first sends a query to triage_agent.
  2. Intelligent handoff. triage_agent recognizes that the issue concerns order delivery time and calls a handoff, passing control and session state to order_management_agent.
  3. Specialized handling. After taking over, order_management_agent uses its own tools (such as track_order_status) to look up and report the latest delivery status.
  4. Optional handback. If the flow needs to return to the main process after the task completes, order_management_agent can trigger another handoff to return control to triage_agent or another agent, closing the loop.

Decentralized division of labor lets each agent focus on its own domain, reducing the load on a central controller and improving specialization, which makes it especially suitable for scenarios where the conversation naturally splits.

A common point of confusion

Many readers may be a little confused at this point, so here is a brief explanation:

The decentralized model is like a group of same-level colleagues working at an open desk: whoever is best at a task picks it up first, and once finished can hand the documents on the table directly to the next, more suitable colleague.

There is no “team leader” constantly watching and no fixed flowchart; everyone “passes” the work to the most suitable person as they go.

How does this differ, essentially, from the manager model?

The manager model presents an all-round assistant: the user always faces the same virtual customer-service persona, and the operating logic looks like this:

User → Manager Agent → Invoke Tool → Professional Agent → Return Result → Manager Agent Integration → Reply to the user

User asked: “Help me check the logistics of order 1234, and then recommend similar products”

The manager agent receives the request

The backend calls two tools at the same time:

  1. Tool A: Order Inquiry Agent → to obtain logistics information
  2. Tool B: Product Recommendation Agent → Generate a list of recommendations

The manager agent consolidates the two results into a natural-language response: “Your order is expected to arrive tomorrow. Based on your purchase history, we recommend these popular accessories: (1)… (2)…”

The advantages are clear here:

  • Unified experience: Users always feel like they’re talking to the same “person.”
  • Covert collaboration: Users do not need to be aware of the presence of multiple agents in the background
  • Strong controllability: suitable for scenarios where sensitive information needs to be reviewed/filtered (e.g., financial consulting)

The decentralized mode is more like a relay between departments, where the user perceives the service provider switching:

User → Triage Agent → Transfer → After-sales Agent → Transfer → Sales Agent → … → Final closed loop

User asks: “How do I get warranty service for a broken phone screen? And while you’re at it, show me the new model.”

The triage agent identifies the dual requirement and triggers a transfer rule.

First leg: the repair-support agent takes over the conversation: “Please provide the device IMEI code, and I will generate a repair ticket for you…”

Once the repair issue is resolved, a handoff fires automatically: “We noticed you’re interested in the new model; transferring you to a product consultant…”

Second leg: the sales agent showcases the new model and guides the purchase.

The product experience here will be different:

  • In-depth service: Each link is provided by the most professional agent to provide the ultimate service
  • Flexible jump: similar to the experience of a hospital’s “triage desk→ specialist → examination department”
  • Reduce complexity: a single agent only needs to be proficient in a specific area (e.g., maintenance agents do not need to know sales strategy)

The logic here closely mirrors my earlier training slides:

Within a single domain, the manager model is the better choice; when jumping across domains, say from legal to medical, decentralization is more appropriate.

Agent security

Well-designed safeguards can help you manage data privacy risks (e.g., preventing system prompt leaks) and reputational risks (e.g., ensuring that model behavior aligns with brand tone):

  • Layered deployment. Protect against identified risks and add additional protections layer by layer as new vulnerabilities are discovered.
  • Cooperate with safety infrastructure. Safeguards are a critical component in any LLM-based deployment, but they must be paired with robust authentication and authorization protocols, strict access controls, and other standard software security mechanisms.
  • Defense in depth. Treat protection as layered defense: a single line of defense is rarely enough, while a combination of multiple, specialized lines makes agents more resilient.

The following diagram (omitted here) shows how LLM-based protections, rule-based protections such as regex, and the OpenAI Moderation API can be combined to vet user input in multiple passes.

Types of protective measures

Construct a three-step heuristic for protection:

  1. Focus on data privacy and content security: Prioritize addressing the most important privacy and security risks.
  2. Iteration based on real edge cases: Additional layers of protection are added as new problems are exposed in actual use.
  3. Balancing safety and experience: Continuously optimize protection measures in the process of agent evolution, ensuring both safety and smooth user experience.

Specifically, guardrails can be implemented as functions or agents that enforce policies such as the ones above: protecting data privacy, keeping content safe, and keeping behavior on-brand.
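As a hedged illustration, here is what a minimal rules-based input guardrail might look like as a plain function. The patterns and policy below are assumptions for the sketch, not the OpenAI Moderation API; a real deployment would layer this with LLM-based checks as described above.

```python
import re

# Crude credit-card-like number pattern; illustrative only, not a real PII detector.
CARD_NUMBER = re.compile(r"\b\d{13,16}\b")

def input_guardrail(user_message: str) -> dict:
    """Return whether the message may proceed, and the reason if not."""
    if CARD_NUMBER.search(user_message):
        # Block possible PII before it ever reaches the model.
        return {"allowed": False, "reason": "possible card number (PII)"}
    if len(user_message) > 2000:
        # Length cap as a cheap defense against prompt-stuffing.
        return {"allowed": False, "reason": "input too long"}
    return {"allowed": True, "reason": ""}
```

Because it is just a function, this layer runs first and cheaply; only inputs that pass it need to be handed to slower, model-based guardrails.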

Humans as the final fallback

Human intervention is a critical safety net that enhances agent performance in real-world environments without sacrificing user experience.

This is especially important early in deployment, helping you spot failures, uncover edge cases, and establish a robust evaluation cycle.

Implement human intervention mechanisms to gracefully hand over control when agents are unable to complete their tasks:

  • Customer service scenario: Escalate the issue to human customer service.
  • Coding scenario: Return control to the user.

Typical triggers:

Exceeding failure thresholds

Set limits on an agent’s retries or operations; if exceeded (e.g., repeatedly failing to understand the customer’s intent), escalate to a human.

High-risk actions

For sensitive, irreversible, or high-value operations, introduce human oversight until the agent’s reliability has earned full trust.

Examples: canceling a user order, approving a large refund, executing a payment.
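The two triggers above can be combined in a simple escalation gate. The threshold and action names below are illustrative assumptions, not a prescribed policy.

```python
MAX_RETRIES = 3  # illustrative failure threshold
HIGH_RISK_ACTIONS = {"cancel_order", "approve_large_refund", "execute_payment"}

def should_escalate(action: str, failed_attempts: int) -> bool:
    # Trigger 1: the agent has exceeded its failure threshold.
    if failed_attempts >= MAX_RETRIES:
        return True
    # Trigger 2: the action is sensitive, irreversible, or high-value.
    return action in HIGH_RISK_ACTIONS
```

The agent loop would consult this gate before each operation and, when it returns True, hand control to human support (customer service) or back to the user (coding scenarios).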

Conclusion

Agents are ushering in a new era of workflow automation: systems capable of reasoning through uncertainty, taking action across tools, and handling multi-step tasks with a high degree of autonomy.

Unlike simpler LLM applications, agents can execute entire processes end to end, making them ideal for scenarios involving complex decision-making, unstructured data, or brittle rule-based systems.
