Analyzing the Challenges in Human-Agent Communication Identified by Microsoft

In recent research, Microsoft raises a series of core challenges in human-computer interaction: How do we understand an AI's intent? How do we build trust? And how do we collaborate efficiently when information is distributed asymmetrically between human and agent? This article analyzes the technical and cognitive logic behind these challenges and re-examines the boundaries of the human-machine relationship in the era of "conversational AI".

Today's agents are no longer simple chatbots but complex systems capable of observing their environment, invoking tools, and communicating with other agents to solve problems, showing great potential across a wide range of tasks.

However, this leap in capability is not without cost. As Microsoft Research points out in the paper "Challenges in Human-Agent Communication", the complexity of these systems and their wide range of failure modes pose new challenges for human-AI interaction.

This article provides a clear, accessible analysis of the 12 key challenges in human-agent communication.

The “minefield map” of communication

In "Challenges in Human-Agent Communication", Microsoft researchers vividly sketch a "minefield map" of the hazards humans may encounter when communicating with agents.

The framework draws on the concept of "common ground" from communication theory and groups the 12 challenges into three main categories:

1. Universal problems in human-agent communication (X1-X4): communication barriers that pervade every interaction between humans and agents.

2. The user conveying information to the agent (U1-U3): the core issue is ensuring the AI accurately understands the user's intentions and needs.

3. The agent conveying information to the user (A1-A5): the core issue is how the AI clearly and effectively communicates its state, behavior, and results to users.

These challenges are also distributed across three stages of the interaction process: before, during, and after the agent acts.

The 12 key challenges are summarized in the table below:

| ID | Challenge | Direction |
|----|-----------|-----------|
| X1 | How should agents help users verify their behavior? | Both |
| X2 | How should agents communicate consistent behavior? | Both |
| X3 | How should agents choose the appropriate level of detail? | Both |
| X4 | What past interactions should agents consider when communicating? | Both |
| U1 | What should the agent achieve? | User → Agent |
| U2 | What preferences should the agent respect? | User → Agent |
| U3 | How should the agent improve next time? | User → Agent |
| A1 | What can the agent do? | Agent → User |
| A2 | What will the agent do? | Agent → User |
| A3 | What is the agent currently doing? | Agent → User |
| A4 | Were there any side effects or environment changes? | Agent → User |
| A5 | Has the goal been achieved? | Agent → User |

Next, we will discuss these challenges in depth.

Analysis of 12 major challenges

The Universal Puzzles (X1-X4): The Lingering "Ghosts" of AI Interaction

These challenges are prevalent in various human-agent communication scenarios and are fundamental issues that need to be faced when designing any AI interaction system. Together, they form the foundation for building user trust, ensuring transparent interactions, and enabling effective control.

X1: How should agents help users validate their behavior?

Core problem: When agents handle complex tasks, mistakes are inevitable. Users therefore need an efficient way to confirm whether the agent has understood their instructions accurately, and whether the actions it is performing or planning to perform actually meet their expectations.
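One concrete mitigation is to have the agent echo its interpretation of the instruction, and the actions it derives from it, back to the user before executing anything. A minimal sketch in Python (the `ParsedIntent` structure and the prompt wording are illustrative assumptions, not an interface from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class ParsedIntent:
    """The agent's interpretation of a user instruction (hypothetical structure)."""
    goal: str
    planned_actions: list = field(default_factory=list)

def confirm_with_user(intent: ParsedIntent) -> bool:
    """Echo the agent's understanding back to the user before acting,
    so misunderstandings are caught early rather than after execution."""
    print(f"I understood your goal as: {intent.goal}")
    print("I plan to take these actions:")
    for i, action in enumerate(intent.planned_actions, 1):
        print(f"  {i}. {action}")
    return input("Proceed? [y/N] ").strip().lower() == "y"

intent = ParsedIntent(
    goal="Archive all invoices from 2023",
    planned_actions=["Search mailbox for 2023 invoices",
                     "Move matches to the 'Archive/2023' folder"],
)
if confirm_with_user(intent):
    print("Executing plan...")  # the agent would act here
```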

X2: How should agents communicate consistent behavior?

Core problem: Inconsistencies in AI behavior (or inconsistencies perceived by users) gradually erode user trust. Such inconsistency may stem from complex dynamics generated by the AI's interaction with the environment or other agents, or from a mismatch between the AI's behavior patterns and the user's mental model.

X3: How should the agent choose the appropriate level of detail?

The core question: how to strike a balance between giving users enough information to verify agent behavior and avoiding the confusion and cognitive burden that too much information creates.
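In practice this often becomes a user-selectable or context-dependent verbosity setting. A toy sketch of rendering the same execution log at three assumed levels of detail:

```python
from enum import Enum

class Detail(Enum):
    SUMMARY = 1   # one-line outcome only
    STEPS = 2     # outcome plus major steps
    TRACE = 3     # full tool calls, for debugging and verification

def report(events: list, level: Detail) -> str:
    """Render the same execution log at different levels of detail,
    letting users trade verifiability against cognitive load."""
    if level is Detail.SUMMARY:
        return events[-1]["summary"]
    if level is Detail.TRACE:
        return "\n".join(f"{e['summary']} (tool={e['tool']}, args={e['args']})"
                         for e in events)
    return "\n".join(e["summary"] for e in events)

log = [
    {"summary": "Searched mailbox", "tool": "search", "args": {"q": "invoice 2023"}},
    {"summary": "Moved 14 messages to Archive/2023", "tool": "move", "args": {"n": 14}},
]
print(report(log, Detail.SUMMARY))
```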

X4: What past interactions should agents consider when communicating?

The core question: how can agents effectively use rich historical interaction data to optimize current communication? Enabling agents to focus on the parts of that history most relevant to the current instruction, while responsibly managing data that may contain sensitive content and protecting privacy, is a growing challenge.
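A hedged sketch of the shape such filtering might take: rank past turns by naive word overlap with the current instruction and redact obviously sensitive spans before reuse. A real system would use embeddings and a proper PII detector; both the patterns and the scoring here are stand-ins:

```python
import re

# Toy sensitivity patterns: 16-digit numbers and the word "password".
SENSITIVE = re.compile(r"\b\d{16}\b|\bpassword\b", re.IGNORECASE)

def relevant_history(history: list, instruction: str, k: int = 2) -> list:
    """Keep the k past turns most relevant to the current instruction,
    redacting sensitive spans before they are reused in a prompt."""
    words = set(instruction.lower().split())
    scored = sorted(history, key=lambda h: -len(words & set(h.lower().split())))
    return [SENSITIVE.sub("[REDACTED]", h) for h in scored[:k]]

past = [
    "User prefers invoices grouped by vendor",
    "User shared card number 4111111111111111 for a refund",
    "User asked to archive newsletters weekly",
]
print(relevant_history(past, "archive the vendor invoices"))
```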

Together, these universal challenges point to the core triangle of human-computer interaction: trust, transparency, and control.

What We Need to Tell AI (U1-U3): Let AI Understand Our “Voice”

These challenges focus on how users can effectively communicate key information such as their intent, preferences, and feedback to agents.

U1: What should agents achieve?

Core problem: Users need to express their goals and intentions to the AI clearly and without ambiguity. The vagueness and imprecision of natural language can easily lead the AI to misunderstand the goal. This highlights the "semantic divide" that arises when humans transmit intentions to AI: human intentions are often subtle, implicit, and context-dependent, while AI tends to interpret them literally and narrowly.
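One way to narrow the divide is for the agent to ask a clarifying question instead of guessing when it detects underspecification. A toy illustration (the keyword heuristic is a deliberate stand-in for a model-based ambiguity check):

```python
def detect_ambiguity(instruction: str) -> str | None:
    """Return a clarifying question if the instruction is underspecified.
    A real agent would use the model itself; this is a toy heuristic."""
    if "recent" in instruction and not any(ch.isdigit() for ch in instruction):
        return "By 'recent', do you mean the last 7 days, 30 days, or something else?"
    return None

instruction = "Delete my recent drafts"
question = detect_ambiguity(instruction)
if question:
    print(question)   # surface the ambiguity instead of acting on a guess
else:
    print("Instruction is specific enough to act on.")
```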

U2: What preferences should agents respect?

Core problem: There are often multiple viable paths or solutions to the same high-level goal, but some of them fit user expectations much better given personal preferences, specific constraints, or "red lines." The core of the challenge is therefore how users can clearly and easily express these preferences, especially those that differ from common conventions or default settings.
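One direction is to make preferences and red lines explicit, machine-checkable objects rather than free text buried in a prompt. A minimal sketch with an entirely hypothetical schema:

```python
from dataclasses import dataclass, field

@dataclass
class Preferences:
    """Explicit, machine-checkable user preferences (illustrative schema)."""
    preferred: dict = field(default_factory=dict)   # soft preferences
    red_lines: list = field(default_factory=list)   # hard constraints, never violate

def violates_red_line(action: str, prefs: Preferences) -> bool:
    """Hard constraints are checked mechanically before any action runs."""
    return any(banned in action for banned in prefs.red_lines)

prefs = Preferences(
    preferred={"airline": "Star Alliance", "seat": "aisle"},
    red_lines=["book without confirmation", "spend over $500"],
)
action = "book without confirmation the cheapest flight"
print("blocked" if violates_red_line(action, prefs) else "allowed")
```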

U3: How should agents improve next time?

Core problem: Even if the AI initially understands the user's goals and preferences, it may still make mistakes or perform poorly during execution. It is therefore crucial that users can provide effective feedback to guide the AI's behavior, help it learn from its mistakes, and continuously improve its future performance. Human-computer interaction is not a one-time issuing of instructions but a continuous, iterative process of feedback and learning.
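A simple illustration of closing that loop: persist corrective feedback per task so future runs can consult it, for example by injecting the stored lessons into the next prompt. The `feedback.json` store and the task keys are hypothetical:

```python
import json
import pathlib

STORE = pathlib.Path("feedback.json")  # hypothetical local feedback store

def record_feedback(task: str, note: str) -> None:
    """Persist corrective feedback so future runs can consult it."""
    data = json.loads(STORE.read_text()) if STORE.exists() else {}
    data.setdefault(task, []).append(note)
    STORE.write_text(json.dumps(data, indent=2))

def lessons_for(task: str) -> list:
    """Retrieve past lessons for a task, e.g. to prepend to the next prompt."""
    data = json.loads(STORE.read_text()) if STORE.exists() else {}
    return data.get(task, [])

record_feedback("summarize_report", "Too long last time; keep it under 5 bullets.")
print(lessons_for("summarize_report"))
```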

What AI Tells Us (A1-A5): Unveiling the Agent's "Inner State" and Actions

A1: What can an agent do?

Core problem: If users do not fully understand the agent's specific range of capabilities or its inherent limitations, they cannot make informed decisions about when and how to best use its assistance, nor can they form reasonable expectations when doing so.
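One mitigation is a capability manifest the agent can surface on request, stating what it can do, cannot do, and within what limits. A sketch with invented example entries:

```python
import json

# A hypothetical capability manifest the agent can surface on request,
# so users can calibrate expectations before delegating a task.
CAPABILITIES = {
    "can": ["search the web", "read and summarize PDFs", "draft emails"],
    "cannot": ["send emails without approval", "access local files"],
    "limits": {"max_pdf_pages": 200, "languages": ["en", "zh"]},
}

def describe_capabilities() -> str:
    """Return a human-readable description of what the agent can and cannot do."""
    return json.dumps(CAPABILITIES, indent=2)

print(describe_capabilities())
```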

A2: What will the agent do?

Core problem: To achieve a complex goal, an AI may autonomously plan and execute a long sequence of time-consuming actions. Before performing these actions, especially those that are irreversible, may violate user preferences, or carry higher risk, how and when should the AI clearly communicate its action plan to users in order to obtain their permission or corrective feedback?
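A common design answer is a risk-gated approval step: low-risk actions proceed silently, while irreversible or costly ones block on explicit consent. A minimal sketch (the risk attributes and the $50 threshold are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class PlannedAction:
    description: str
    irreversible: bool = False
    estimated_cost: float = 0.0

def needs_approval(a: PlannedAction, cost_threshold: float = 50.0) -> bool:
    """Gate risky or irreversible steps behind explicit user consent;
    low-risk steps proceed without interrupting the user."""
    return a.irreversible or a.estimated_cost > cost_threshold

plan = [
    PlannedAction("Search for flights"),
    PlannedAction("Purchase ticket", irreversible=True, estimated_cost=420.0),
]
for step in plan:
    if needs_approval(step):
        print(f"APPROVAL REQUIRED: {step.description}")
    else:
        print(f"auto-executing: {step.description}")
```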

A3: What is the agent currently doing?

The core question: while the AI is acting, how can users understand in real time which specific actions it is currently performing, what immediate effects those actions will have, and whether they should step in to adjust or pause its activity. The key difference from A2 is the timing of communication: A2 concerns communicating the plan before acting, while A3 concerns state synchronization during execution.
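A sketch of the during-execution side: emit a status line before each step and honor a user pause signal, so the user can follow along and interrupt. The step list and pause hook are invented for illustration:

```python
import time

def run_with_status(steps, paused=lambda: False):
    """Emit a status line before each step and honor a user pause signal,
    so the user can follow and interrupt execution in real time."""
    for name, fn in steps:
        if paused():
            print("Paused by user; awaiting resume.")
            return
        print(f"[status] now doing: {name}")
        fn()

steps = [
    ("scanning downloads folder", lambda: time.sleep(0.1)),
    ("removing duplicate files", lambda: time.sleep(0.1)),
]
run_with_status(steps)
```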

A4: Are there any side effects or environmental changes?

The core question: how can users effectively monitor critical changes the AI makes to its operating environment, such as local disk files or operating system settings, especially changes that may have negative impacts or violate social norms. As agents gain more power to affect the external environment, the AI must not only complete tasks but also take responsibility for the consequences of its actions and proactively report these effects to users.
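One practical pattern is to snapshot the environment before and after the task and report the diff proactively. A minimal file-system sketch (a real agent would track many more kinds of state than filenames):

```python
import pathlib

def snapshot(folder: pathlib.Path) -> set:
    """Record the files present in a folder at one point in time."""
    return {p.name for p in folder.iterdir() if p.is_file()}

def report_side_effects(before: set, after: set) -> None:
    """Proactively report what changed in the environment,
    instead of leaving the user to discover it later."""
    for name in sorted(after - before):
        print(f"created: {name}")
    for name in sorted(before - after):
        print(f"deleted: {name}")

workdir = pathlib.Path(".")
before = snapshot(workdir)
# ... the agent performs its task here, possibly touching files ...
after = snapshot(workdir)
report_side_effects(before, after)
```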

A5: Has the goal been achieved?

Core problem: When a user assigns a high-level, possibly abstract goal to the AI, and the AI executes a complex plan to try to achieve it, the system must effectively convey the relevant information back so that users can verify, by their own criteria and judgment, whether the goal has actually been accomplished.
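This suggests ending a run with a structured, evidence-backed completion report rather than a bare "done". A sketch with a hypothetical report schema:

```python
from dataclasses import dataclass, field

@dataclass
class CompletionReport:
    """A structured 'proof of work' so the user can judge success
    by their own criteria rather than trusting a bare claim."""
    goal: str
    checks: dict = field(default_factory=dict)   # criterion -> observed evidence
    artifacts: list = field(default_factory=list)

report = CompletionReport(
    goal="Archive all 2023 invoices",
    checks={"invoices remaining in inbox": "0 matching messages",
            "archive folder count": "14 messages in Archive/2023"},
    artifacts=["Archive/2023"],
)
print(f"Goal: {report.goal}")
for criterion, evidence in report.checks.items():
    print(f"  {criterion}: {evidence}")
```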

Summary of the challenges

Examining these 12 challenges, it becomes clear that they do not exist in isolation; they are intertwined and mutually influential. This inherent interconnection means that solutions often require a holistic, systemic approach.

Furthermore, although many of these challenges have long been noted in traditional human-computer interaction (HCI) and AI research, the rise of generative AI and tool-using agents has undoubtedly amplified their severity and complexity. The "black box" nature of these models, the inherent randomness of their outputs, and the breadth of their capabilities make transparency and effective two-way communication particularly difficult to achieve.

When the scale and nature of the problem change this fundamentally, design principles and solutions from the original HCI playbook may no longer fully meet current needs. New design patterns and interaction principles are urgently required, and this is the core topic explored by human-agent interaction (HAI) research.

What's next

Faced with these challenges, Microsoft Research did not stop at theoretical analysis; it has begun building an experimental platform that provides a concrete vehicle and testbed for studying the 12 key challenges in realistic environments.

In the next article, we will look at how Microsoft attempts to solve these problems.

References

Gagan Bansal, Jennifer Wortman Vaughan, Saleema Amershi, Eric Horvitz, Adam Fourney, Hussein Mozannar, Victor Dibia, and Daniel S. Weld. "Challenges in Human-Agent Communication" (2024).
