In the field of artificial intelligence, the development path of many products is not planned in advance; it grows naturally through continuous experimentation and iteration. Mike Krieger, Chief Product Officer at Anthropic, made this point in a recent talk, where he dug into the future of AI-generated content, methodologies for building AI products, and how AI can become a core driver of product development. He emphasized that most content will be AI-generated in the future, and that a product succeeds by solving real problems rather than by relying solely on the model's capabilities.
Mike Krieger is the CPO of Anthropic; before that he co-founded Instagram and served as its CTO. This conversation took place at a Sequoia Capital event.
He shared his experience and philosophy of building AI products, how he uses AI himself, and his thoughts on agents. Very much worth listening to!
Original link: https://www.youtube.com/watch?v=Js1gU6L1Zi8
Z Potentials has already published an excellent compilation of this talk, "In Depth | Anthropic Chief Product Officer: From Claude to MCP, the best AI products are not planned but grow from the bottom up." We have lightly revised that article and share it below:
Core Perspectives:
1. In the long run, most content will be AI-generated, so the question "was this generated by AI?" will become meaningless. What is worth paying attention to is the source, traceability, and citation of content. Ironically, AI may be more helpful than anything else in solving these problems.
Agent Xiaotian: Recently, during graduation season, many students have been running their theses through AI-plagiarism checkers, and I happened to see a particularly interesting take: at this stage, judging whether a piece of content was created by AI is more like a kind of cyber-era guesswork.
2. The best AI products are often not planned; they "grow spontaneously from the bottom up". Many products only reveal their true potential when they are built very close to the model and after in-depth experimentation. The path of product development therefore has to change from "top-down" to "bottom-up".
3. Can I use Claude to generate a first draft? Is that "cheating"? We now actively encourage this usage. Of course, you still need to proofread what AI writes to make sure it is accurate and reflects your own judgment. But if it saves you two hours and frees that time for more important things, then why not?
4. The model instinctively wants to "please" you and tends to disclose too much; but if it discloses nothing, it becomes overly conservative. This kind of nuanced judgment has not yet been trained well.
The future of AI-generated content is not about distinguishing real from fake, but about credibility and provenance
Lauren: How do you see the trends in AI-generated content? As models generate more and more text and images, how do you help users stay in control of their content? For example, given the work you are doing on model explainability, how do you empower users to understand or guide systems like Claude?
Mike: Yes, there are some issues still worth exploring in the short term, such as watermarking AI-generated content. But in the long run, most content will be AI-generated, so the question "was this generated by AI?" will become meaningless. What is worth paying attention to is the source, traceability, and citation of content. Ironically, AI may be more helpful than anything else in solving these problems.
Interestingly, this reminds me of blockchain. Although it is no longer a hot topic, some of the problems blockchain tried to solve are actually easier to address with AI now that content generation and distribution are fully digital.
For example, we used to care about a document's provenance: whether it is cited, whether it is original. That still matters, but with AI it has become easier to trace. So the future focus is no longer "is this AI-generated?" but "where does it come from?", "is the content credible?", and "can it be verified?"
A truly valuable AI product is never planned
Lauren: Interesting, let's dive into Anthropic. You've done a great job with products like Artifacts, coding models, and the MCP protocol. I'm curious, as CPO, what methodology do you follow when creating products? How do you make a product more than just a "model wrapper", something more valuable than the model itself?
Mike: I have two ideas.
The first is that the standard for judging whether a product is good has not changed, whether in the Instagram era or now: are you solving a real problem? For example, does a developer tool really help developers do fast, fun, and creative things? If it is an end-user-facing product, does it really meet their real-world needs? These criteria still apply in the AI era.
The second is that I have had to let go of some of my previous habits. At Instagram we would make three-to-six-month plans, very "top-down" and step by step. But talking with peers at Anthropic and at places like OpenAI, I found that the best AI products are often not planned; they "grow spontaneously from the bottom up."
Many products only gradually reveal their true potential after being very close to the model and running in-depth experiments. So I learned to change the path of product development from "top-down" to "bottom-up". Artifacts, for example, started as a research prototype that designers and engineers later took over, polished, and eventually brought to commercialization. This path is harder to control, but it does bring a lot of surprises.
Lauren: MCP is an important product that the entire industry is now starting to adopt. I'm curious how it came about. Do you have any stories to share with us?
Mike: It's really interesting how MCP came about. Sometimes my job at the company is making internal memes, and one of them pokes fun at how MCP was just a "small spark" in the eyes of two engineers when it was born.
The starting point was our attempt to integrate Google Drive and GitHub. We found that although both features are essentially about "bringing context into the model", the internal implementations were completely different. We were about to do a third integration, and it looked like we were going to reinvent the wheel yet again.
My usual pattern is: once you have done something three times, you can find the right level of abstraction and turn it into a standard. That is how MCP came about. It did not start from a top-level design of "we need to build a unified protocol"; two engineers simply felt this was the more reasonable way to do it, so they started prototyping and iterating.
We put a lot of effort into making the protocol better and more open, hoping it would not just be something Anthropic uses internally but would have a real chance of becoming an industry standard. Now MCP is beginning to be adopted much more widely.
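To make the "bringing context into the model" idea concrete, here is a minimal sketch of an MCP server, assuming the FastMCP helper from the open-source MCP Python SDK quickstart; the fetch_document tool is a hypothetical stand-in for a Drive- or GitHub-style integration, not anything described in the talk.

```python
# Minimal MCP server sketch (assumes the open-source MCP Python SDK is installed:
# pip install mcp). fetch_document is a hypothetical example tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("context-server")

@mcp.tool()
def fetch_document(doc_id: str) -> str:
    """Return a document's text so a client model can use it as context."""
    # A real integration would call Google Drive, GitHub, etc. here.
    return f"(contents of document {doc_id})"

if __name__ == "__main__":
    mcp.run()  # serves the MCP protocol (stdio transport by default)
```

Any MCP-capable client can then discover and call fetch_document the same way it would call tools from any other server, which is exactly the "build the abstraction once" point above.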
Lauren: How did you nurture this product from a bottom-up idea into something that is now expanding?
Mike: I'm currently focused on two directions around MCP. The first is "execution ability". MCP was originally designed to bring context in; now we can integrate GitHub, trigger Zapier, and more. More importantly, in the next stage we want the model to take the initiative to complete tasks. It needs not only to "understand" but also to "act", so that workflows can be automated.
The second is "collaboration between agents". We are still at a very early stage of exploration here; it is not even appropriate to set standards right away. But it is clear that in the future different agents will interact with each other, collaborate, and even "hire" other agents to complete tasks. This will form a new AI economic system.
We have already started internal discussions, such as whether there will be a scenario in the future where “your agent hires another agent for you”. These ideas are exciting.
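To make the first direction, "execution ability", concrete: below is a minimal sketch of the tool-use flow in the Anthropic Python SDK, where the model decides to act by requesting a tool call. The create_github_issue tool, its schema, and the model id are illustrative assumptions rather than anything from the talk.

```python
# Sketch of a model taking action via tool use (Anthropic Python SDK).
# The tool definition and model id below are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "create_github_issue",  # hypothetical action tool
    "description": "Open a GitHub issue in the team's repository.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["title"],
    },
}]

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model id
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "File an issue for the login timeout bug."}],
)

# If the model decides to act, it returns a tool_use block for us to execute.
for block in response.content:
    if block.type == "tool_use":
        print("model wants to call:", block.name, block.input)
```

In a full agent loop, the caller would execute the requested tool and send the result back as a tool_result message so the model can continue the task.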
Lauren: Your coding products seem quite mature and don't look like just "bottom-up experiments". How do you think about positioning these products? What do you think you have done right so far?
Mike: I'm still in awe of what has happened in coding. Many innovations are not decided by "strategy"; they are driven by a few researchers pushing the boundaries. The exploration of RL (reinforcement learning) mentioned earlier, for example, grew naturally out of specific research.
One thing we have always insisted on: don't just stare at benchmark scores. What matters more is whether users actually like using the code the model generates, and whether it really leads to good results. We will keep strengthening this.
The term "vibe coding" was not coined by us, but it does capture something real. When you use a model to generate code, you can ride a certain "vibe" or flow, which is fun in small projects.
But if you want to build a large codebase and an engineering system with a team of a hundred people, that approach is not enough. We are exploring where generative AI fits in the overall development process. For example, more than 70% of our pull requests are now generated by Claude Code.
But that raises a new question: how should code review work? You can use Claude to review code generated by Claude, but it is like a matryoshka doll: every layer is still AI. So how do we keep the technical architecture under control? Will we end up in a dead end of technical debt? We are still exploring these problems, and I believe the whole industry is too.
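As a rough illustration of the "Claude reviewing Claude" pattern mentioned above, the sketch below pipes a branch diff to the model and asks for review comments. This is an assumption about what such a step could look like, not Anthropic's internal tooling, and the model id is a placeholder.

```python
# Illustrative "model reviews the diff" step, e.g. run in CI before human review.
import subprocess
import anthropic

# Collect the changes on the current branch relative to main (assumes a git repo).
diff = subprocess.run(
    ["git", "diff", "main...HEAD"], capture_output=True, text=True, check=True
).stdout

client = anthropic.Anthropic()
review = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model id
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": "Review this diff for bugs, risky changes, and missing tests:\n\n" + diff,
    }],
)
print(review.content[0].text)
```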
One of the biggest changes we feel internally is that once AI dramatically improved engineering efficiency, the inefficiency of the "non-engineering" parts of the organization became much more conspicuous. For example, an alignment meeting that used to cost an engineer only one hour of delay may now be equivalent to delaying "8 hours of AI output."
You will find that the "bottlenecks" in your organization are not optimized away by AI; they are amplified. This makes the friction in the product process more noticeable and more painful. While models can summarize meetings and suggest next steps, they do not yet really help us make organization-wide decisions.
From tools to collaboration: how organizations adapt to the efficiency reshaping of the AI era
Lauren: You mentioned that Anthropic uses Claude extensively internally. Can you share some ways of using it that you think are particularly worth promoting? Are there any uses you've tried in the past six months that you think others should try as well?
Mike: My favorite thing to see is non-technical teams starting to actively use large models. For example, the sales team uses Claude to prepare for customer meetings. They started with the public version, and when they hit specific obstacles, we built internal tools around their needs. This demand-driven approach is very effective.
But frankly, even in an AI lab like ours, the ability to use AI is unevenly distributed: some employees are very skilled and solve problems efficiently, while others are still stuck in traditional processes. I myself treat Claude as a "thinking partner".
Whether I'm writing a strategy document, creating a plan, or writing a performance review, I'm used to doing a round of brainstorming with Claude first. It's like Copilot: once you're used to it, you feel you can't write code on a plane without it, and in the same way it is hard for me to go back to writing without AI assistance.
Over the past year and a half, I've seen the culture change firsthand within Anthropic. At first, many people hesitated when writing performance reviews or work summaries: Can I use Claude to generate a first draft? Is that "cheating"? But now, we actively encourage this usage.
Of course, you still need to proofread what AI writes to make sure it is accurate and reflects your own judgment. But if it saves you two hours and frees that time for more important things, then why not? We have an internal tool that runs across Slack and all our internal documents. It supports both public and private channels, but most people prefer to use it in public channels, because that makes their use of AI visible.
Interestingly, during performance season many employees started using this tool to generate first drafts of feedback, and in public channels at that! This "use in the open" has actually helped break the "shame of using AI". It reminds me of when Midjourney was first taking off and everyone was happy to show their AI-generated images publicly. This visibility is critical for bringing AI into everyday work.
We are still far from the stage where AI is fully adopted, but we can see that the culture is shifting in this direction.
AI Agents are becoming the next generation of “digital employees”
Lauren: What are your next priorities? We've seen you do a lot in code and enterprise scenarios, and we've heard about new model releases. Can you tell us a little about your future plans?
Mike: Our goal for models and products can be summed up in one word: Agent. I know a lot of people are talking about this concept these days. What we want to do is provide underlying support for this new form.
Code is just a starting point; it shows the rudiments of a broader topic: can the model work continuously for hours, or even longer? That is essentially our long-term goal.
To achieve this, the model not only needs to be more powerful, it also needs a complete set of supporting systems:
1. Memory (letting the model remember what it has already done)
2. Advanced tool calls (not just search, but the use of complex tools)
3. Automatic adaptation to organizational structure (knowing what to do once it enters an enterprise)
4. Verifiability and logging (for example, if a company has 100 agents running, how do you supervise them?)
We don't plan to build every link in this ecosystem ourselves, but we want our model to be the building block.
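On point 4 above, verifiability and logging, here is an illustrative sketch (not an Anthropic feature) of how every tool call an agent makes could be appended to an auditable trail.

```python
# Hypothetical audit wrapper: record every tool call an agent makes to a JSONL log.
import json
import time
import uuid

AUDIT_LOG = "agent_audit.jsonl"

def audited_call(agent_name: str, tool, **kwargs):
    """Run a tool on behalf of an agent and append a record of the call."""
    record = {
        "id": str(uuid.uuid4()),
        "agent": agent_name,
        "tool": tool.__name__,
        "args": kwargs,
        "ts": time.time(),
    }
    result = tool(**kwargs)
    record["result_preview"] = str(result)[:200]
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return result
```

With a hundred agents running, a supervisor (human or model) could then query this log to answer "who did what, when, and with which tool".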
Lauren: So the new model is coming?
Mike: There are always new models on the way. Updates in this space are just too fast – but we’ll have some cool new things coming out soon, so stay tuned.
Audience question session
Audience question: As a product owner, what is your biggest headache now?
Mike: The biggest problem for us is that AI products are still too difficult for newcomers to use. We do design some valuable workflows, but they still require users to “do it right.”
If the user's path deviates even slightly, the results degrade sharply. It's not like opening Instagram for the first time and immediately knowing you should take a photo and post it. AI products are far from "working out of the box". Of course, this is partly because we currently focus on work scenarios rather than everyday entertainment. But I often think: the model's capabilities are already very strong, yet far too few users can actually use it well, and the potential is far from being released.
Audience question: What do you think of AI 2027, a recent widely discussed article predicting AI's future trajectory, which suggests that models may be deliberately released later in order to capture more of the profits and resources they bring?
Mike: There are two things in that article I particularly agree with. The first is the importance of compute. The topic is not new, but it really is a core issue for every AI company. We discuss it every day: What is our current compute reserve? What chips will we use in the next generation? Who do we partner with? These discussions track closely with what the article describes.
The second point is whether to deliberately delay releasing a model in order to maximize returns. This debate is interesting. For example, Zuckerberg recently mentioned in an interview the trade-off of opening up APIs for LLaMA: do you spend compute on serving users, or keep enhancing RL training? Every lab faces this choice. We also have to consider whether to allocate compute to a profitable large-model product, or reserve it for the "crazy new ideas that are still in their infancy". The latter may give birth to the next generation of architectural breakthroughs. It is not an easy balancing act.
I personally prefer to get models into the real market as early as possible. The Claude 3.5 series is as good as it is because we iterated quickly on real user feedback. If we had kept development closed inside the lab, we might never have gotten to this point.
Audience question: In a large organization that does both research and product, how do you balance the two? Does product define the research direction, or does research determine what the product can do, with product then packaging it?
Mike: I often ask the product team to think: if we make a product that merely wraps a model API and offers functionality similar to everyone else's, what are we even doing? We have a group of the world's top researchers, and if a product doesn't make the most of their results, that's a waste.
A positive case is Artifacts: it is a product tuned specifically for Claude, and it works very well. But we have also gone through periods where product and research were disconnected, and we did not really "load" model capabilities into the product. We are now re-emphasizing that "product = model capability + delivery method".
We still don't collaborate enough on this; only about 10% of researchers are involved in product work. But we also know that making the model better at following instructions helps every product, so we keep investing in that kind of foundational research. We are also watching some of OpenAI's practices; for example, they may build specially fine-tuned versions for ChatGPT, so even though everyone uses it through the chat interface, there may be different models running behind it. We are not doing this at the moment, which saves compute but may limit the roll-out of some differentiated experiences.
Audience question: What do you think about the future standardization of communication protocols between agents? Will Anthropic set similar standards?
Mike: I don't think anyone has really solved one of the key questions yet: what, and how much, should agents disclose? For example, if your agent is dealing with a trusted supplier, it may be fine to reveal credit card information; but if it is just interacting with an unfamiliar agent, that information should stay private. This judgment about "what to reveal and what to hide" is not only a product design problem but also an unsolved research topic.
The model instinctively wants to "please" you and tends to disclose too much; but if it discloses nothing, it becomes overly conservative. This kind of nuanced judgment has not yet been trained well.
Another challenge is how to stay auditable when deploying at scale. If a company deploys 100 agents, how do you record their behavior? How do you set permissions? Should these agents even have a "name"? We are still thinking about these questions; some are more like research questions, others are product challenges we will face soon.
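As a thought sketch of the "what to reveal and what to hide" problem, here is a hypothetical policy gate that filters what an agent discloses based on the counterparty's trust level; the tiers and fields are invented for illustration and are not a proposed standard.

```python
# Hypothetical disclosure policy: filter outgoing fields by counterparty trust tier.
from typing import Any, Dict

DISCLOSURE_POLICY = {
    "trusted_vendor": {"name", "shipping_address", "payment_token"},
    "known_agent": {"name", "shipping_address"},
    "unknown_agent": {"name"},
}

def redact_for(counterparty_tier: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the fields our agent is allowed to reveal to this counterparty."""
    allowed = DISCLOSURE_POLICY.get(counterparty_tier, set())
    return {k: v for k, v in payload.items() if k in allowed}

# An unknown agent only ever sees the name, never the payment token.
print(redact_for("unknown_agent", {"name": "Ada", "payment_token": "tok_demo_123"}))
```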
Audience question: What do you think are the most common mistakes made by people working on AI application-layer products right now?
Mike: I don't want to call them "mistakes", but I do observe a common pattern: many AI products start "AI-light" and then gradually become "AI-heavy". Along the way, they often just tuck AI features into a sidebar as a secondary entry point, so the experience ends up fragmented.
As product functions come to rely more on AI, this structure will hold you back. So the question is not whether the AI is strong enough, but whether you are willing to rebuild the product from the ground up and make AI the "first user".
Another common problem is that the app doesn't expose enough of its capabilities for the model to use. For example, you ask the model to do something and it says "I can't do that", when in fact you simply haven't designed interfaces it can call. It is essentially a design problem: you built a GUI and then pasted AI on top of it. Instead, you should first consider how the AI will use the product, and make AI the "primary user" of your product.
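One way to read "make AI the primary user" in practice: expose existing app actions as model-callable tool definitions instead of wiring them only to GUI buttons. The function name and schema below are hypothetical.

```python
# Hypothetical example: surface an app action to the model as a callable tool.
from typing import Any, Dict

def archive_project(project_id: str) -> str:
    """The same backend operation a GUI 'Archive' button would trigger."""
    return f"archived {project_id}"

TOOLS = [{
    "name": "archive_project",
    "description": "Archive a project so it no longer appears in the active list.",
    "input_schema": {
        "type": "object",
        "properties": {"project_id": {"type": "string"}},
        "required": ["project_id"],
    },
}]

HANDLERS = {"archive_project": archive_project}

def dispatch(tool_name: str, tool_input: Dict[str, Any]) -> str:
    """Route a model-issued tool call to the underlying app function."""
    return HANDLERS[tool_name](**tool_input)
```

Designing this registry first, and the GUI second, is the practical meaning of treating the AI as the product's primary user.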