The dilemma of AI applications lies in product design rather than model capability. The article uses Gmail's Gemini email-draft feature as an example, pointing out that the tone and style of the drafts it generates do not match what users actually need, and that users end up spending time writing prompts anyway. The root of the problem is that the Gmail team did not account for users' personalization needs and does not let users edit the system prompt.
In the AI era, many people share the same question: what does an AI-native application look like?
Not long ago, YC partner Pete Koomen put forward an interesting view: the dilemma of many current AI products is not that the models aren't good enough, but that the applications are poorly designed.
The reason is that these products are still built on yesterday's product logic, without fully considering what users actually need.
For example, traditional product development often has programmers design system prompts in advance, but prompts fixed that early rarely meet users' personalized needs in practice, and can even become the biggest obstacle to unlocking the potential of large models.
It's like the early steam carriages of the nineteenth century: people simply wanted to replace the horse with an engine, without rethinking the vehicle to cope with higher speeds.
In this article, Pete Koomen draws on his own experience to share his understanding of AI-native applications.
01 From Gmail's new AI feature
Compared with using most AI applications, I much prefer using AI to build software myself.
When I build software with AI, I can quickly create what I want. Many AI applications don't feel that way: their AI features feel useless, merely mimicking the way software was built in the past. In my opinion, this path dependence limits the real value of AI products.
To illustrate what I mean, I'll use Gmail's AI assistant as an example.
Not long ago, the Gmail team released a new feature that allows users to generate email drafts from scratch using Google’s flagship AI model, Gemini. Here’s what it looks like:
▲Gmail’s Gemini email draft generation function
In the interface, I entered a prompt asking it to write an email to my boss. Let's look at the result Gemini returned:
▲Gmail's Gemini email draft generation feature's response
As you can see, Gemini writes a perfectly reasonable draft. But sadly, it doesn't sound at all like an email I would actually write. If I wrote this email myself, it would look something like this:
▲The email I would actually write
Tone isn't the only problem with this draft. I also had to put real effort into writing the prompt, typing more words than the email itself would have taken.
Which means drafting this email with Gemini was slower than simply writing it myself.
That makes no sense. A model as powerful as Gemini is perfectly capable of writing an excellent email. But the Gmail team's design doesn't let it.
02 Better email assistant
To illustrate, here's a simple demo of an AI email assistant. If Gmail shipped an assistant like this, it would save me a lot of time:
▲Demo of a practical email assistant built with OpenAI's GPT-4o-mini
This demo reads emails with AI instead of writing them from scratch. Each email is categorized and prioritized, with some automatically archived and others receiving an auto-generated draft response.
The assistant processes emails one by one, following a custom "system prompt" that explains exactly how I want each email handled. You can experiment with your own triage logic by editing the system prompt.
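To make the mechanics concrete, here is a minimal sketch of what that triage loop might look like in code. The label set, prompt text, helper names, and model choice are all illustrative assumptions, not the demo's actual implementation:

```python
from openai import OpenAI

client = OpenAI()

# The user owns and edits this prompt; the application only stores and applies it.
SYSTEM_PROMPT = """You are Pete's email assistant.
Reply with one word on the first line: URGENT, RESPOND, or ARCHIVE.
If the label is RESPOND, add a short draft reply in Pete's casual voice below it."""

def triage(raw_email: str) -> tuple[str, str]:
    """Label one email and return (label, draft); draft is empty unless RESPOND."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model for illustration
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": raw_email},
        ],
    )
    text = completion.choices[0].message.content
    label, _, draft = text.partition("\n")
    return label.strip(), draft.strip()

label, draft = triage("From: boss@example.com\nSubject: Sync\n\nCan you make the 9am tomorrow?")
if label == "ARCHIVE":
    pass  # auto-archive; URGENT and RESPOND emails (and their drafts) are surfaced to the user
```

The key design point is that the application only supplies the loop; the behavior lives in a system prompt the user can rewrite at any time.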
This approach is clearly more powerful, so why didn't the Gmail team build it? To answer that question, let's look more closely at the problems in their design, starting with its generic style.
03 AI Slop
Gmail’s AI assistant generates long, bizarrely formal drafts that don’t look like my style at all.
Everyone who has written with a large model has had this experience, so much so that most of us unconsciously adopt strategies to avoid it when writing. The simplest strategy is to write more detailed instructions to guide the AI in the right direction, like this:
But the problem is that every time I want to write a new email, I have to write something similar.
There’s a simple solution to this problem, but many AI app developers seem to overlook it: let me write my own “system prompts.”
04 System prompts and user prompts
From the outside, large language models are actually very simple. They read in a string of text, called a "prompt," and then predict, piece by piece, what is likely to come next, called the "response."
The important point here is that all input and output is text. Text is the large model's entire user interface.
Large model providers such as OpenAI and Anthropic have adopted a convention that simplifies prompt writing: they split the prompt into two parts, a system prompt and a user prompt. The names come from the fact that in many API applications, the application developer writes the system prompt and the user writes the user prompt.
The system prompt explains to the model how to carry out a whole class of tasks and is reused across requests. The user prompt describes the specific task to complete.
You can think of the system prompt as a function, the user prompt as its input, and the model's response as its output:
▲A simple demonstration of the system/user prompt relationship using GPT-4o-mini
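In code, the analogy is almost literal: the system prompt is written once and reused, while each user prompt is passed in like an argument. Here is a minimal sketch using the OpenAI chat API; the prompt texts and the draft_email helper are invented for illustration:

```python
from openai import OpenAI

client = OpenAI()

def draft_email(system_prompt: str, user_prompt: str) -> str:
    """One 'function call': the same system prompt, a different user prompt each time."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model for illustration
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return completion.choices[0].message.content

SYSTEM_PROMPT = "You are an email-writing assistant. Draft clear, polite emails."

# The system prompt is reused; only the user prompt (the "input") changes.
for task in ["Ask my boss to move our 1:1 to Thursday.", "Thank Sam for the intro."]:
    print(draft_email(SYSTEM_PROMPT, task))
```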
In my original example, the user prompt was:
▲My original user prompt
Google keeps the system prompt a secret, but based on the output we can guess what it looks like:
▲Gmail's email-drafting system prompt (a guess)
The problem isn't just that the Gmail team wrote a bad system prompt. Worse, I'm not allowed to change it.
If Gmail didn't force me to use its one-size-fits-all system prompt and instead let me write my own, it might look something like this:
▲Pete system prompt
You can probably see what's going on: when I write my own system prompt, I'm teaching the large model to write emails the way I do. Does this work? Let's try it.
▲With that system prompt, GPT-4o-mini returns a completely different response to the same user prompt
Try generating a draft using the (guessed) Gmail system prompt, then do the same with the "Pete system prompt" above. The "Pete" version produces something like this:
▲Email draft generated using the Pete system prompt
It’s perfect! It’s so simple!
Not only is this draft better, every subsequent draft will be better too, because the system prompt is reused. No more struggling, request after request, to explain to Gemini how to write like me!
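One way to see this is to hold the user prompt fixed and swap only the system prompt. A small sketch, reusing the draft_email helper from the earlier example; both prompt texts below are my own paraphrases, not Google's real prompt or Pete's verbatim prompt:

```python
# Same user prompt, two different system prompts: a guessed Gmail-style prompt
# versus a personal one. Only the reusable system prompt changes.
gmail_style_prompt = (
    "You are a helpful email-writing assistant. Write complete, professional, "
    "polite email drafts based on the user's request."
)
pete_style_prompt = (
    "Write emails the way Pete writes them: short, direct, casual greetings, "
    "no corporate filler, signed off with '-Pete'."
)

user_prompt = "Ask my boss if I can skip the weekly sync tomorrow."

for system_prompt in (gmail_style_prompt, pete_style_prompt):
    print(draft_email(system_prompt, user_prompt))  # helper from the earlier sketch
```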
Take a few minutes to think about how you write email. Try writing your own system prompt and see what happens. If the output doesn't look right, figure out what your explanation is missing and try again. Repeat until the output feels right.
Better yet, try some other user prompts. For example, see if you can get the large model to write these emails in your voice:
▲Personal email user prompt
▲Customer support request user prompt
It feels amazing to teach a large model to solve problems your way and watch it succeed. Surprisingly, it's actually easier than teaching a person, because, unlike people, large models give you instant, honest feedback on whether your explanation is good enough. If you get a satisfactory draft email, your explanation was sufficient. If you don't, it wasn't.
By exposing system prompts and making them editable, we create a product experience that produces better results and is actually fun. Yet as of today, most AI apps do not (intentionally) expose their system prompts. Why?
05 Horseless carriage
Whenever a new technology is invented, the first tools built on that technology are bound to fail because they often copy the old way of working, like horseless carriages.
"Horseless carriage" refers to early automobile designs that borrowed heavily from the carriages that preceded them. Here is a drawing of an 1803 steam carriage that I found on Wikipedia:
▲Trevithick’s London steam carriage in 1803
The flaws in this design were imperceptible at the time, but became apparent in hindsight.
Imagine living in 1806 and riding in such a vehicle for the first time. Even if the wooden frame were strong enough to get you to your destination, the wooden seats and the lack of suspension would make the journey unbearable.
You might be thinking, "I would never choose an engine over a horse." And at least until the modern automobile was invented, you would have been right.
I suspect we are living through a similar era with AI applications. Many of these apps, like Gmail's Gemini integration, are close to useless.
The original horseless carriages were products of "old-world thinking": the engine replaced the horse, but no one redesigned the vehicle to handle higher speeds. So what is the old-world thinking that limits today's AI applications?
06 Old world thinking
Today, if you want a computer to do something, you have two options:
- Write a program
- Use programs written by others
Programming is hard, so most of us choose the second option. That's why I'd rather spend a few dollars on a ready-made app than build it myself; it's also why big companies would rather spend millions of dollars on Salesforce than develop their own CRM.
The modern software industry is built on the assumption that we need developers to act as intermediaries between us and computers. They translate our wishes into code and hide it behind a simple, general-purpose interface we can understand.
The division of labor is clear: the developer decides how the software behaves in general, and the user provides input to determine how the software behaves in a particular situation.
By splitting the prompt into a system prompt and a user prompt, we have created an analogue that maps exactly onto this old division of labor. The system prompt controls how the LLM behaves in general; the user prompt is the input that determines how it behaves in a specific situation.
In this framework, it is natural to think that writing system prompts is the developer’s job and writing user prompts is the user’s job.
But in the Gmail case, this breaks down. The AI assistant should write emails on my behalf and in my voice, not in a one-size-fits-all voice designed by a committee of Google product managers and lawyers.
In the past, I had to accept generic products because writing my own programs was too hard. But in the age of AI, you no longer need a programmer to tell the computer what to do: anyone can write their own system prompt.
That's the point I'm trying to make: when an AI agent acts on my behalf, I should be allowed to teach it how to do so by editing the system prompt.
This doesn't mean I need to write my system prompt from scratch. Gemini could write a draft prompt for me, using my past emails as reference examples.
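Here is roughly what that bootstrapping step could look like. The prompt wording, helper name, and model are assumptions for illustration, not a description of how Gemini actually works:

```python
from openai import OpenAI

client = OpenAI()

def draft_system_prompt(sample_emails: list[str]) -> str:
    """Ask the model to turn example emails into a reusable, user-editable system prompt."""
    samples = "\n\n---\n\n".join(sample_emails)
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model for illustration
        messages=[
            {"role": "system", "content": "You write system prompts for email-drafting assistants."},
            {
                "role": "user",
                "content": "Here are emails I wrote. Write a system prompt that teaches "
                           "an assistant to draft emails in my voice:\n\n" + samples,
            },
        ],
    )
    return completion.choices[0].message.content

# The user reviews and edits the generated prompt instead of starting from a blank page.
```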
So what about less personal agents, like an AI accounting agent or an AI legal agent? Wouldn't it make more sense for the software developer to hire a professional accountant or lawyer to write a generic system prompt?
For me as a user, that might make sense: a system prompt for doing X should be written by an expert in X, and I'm neither an accounting nor a legal expert. But I suspect most accountants and lawyers would also want to write their own system prompts, because their expertise is context-specific.
For example, YC's accounting team operates in a way that is unique to YC. They use a particular mix of in-house and off-the-shelf software. They follow conventions that only other YC employees would understand. The funding structures they manage are also unique to YC. A cookie-cutter accounting agent would help our team about as much as a professional accountant who knows nothing about YC: hardly at all.
The same has been true of the accounting team at every company I've worked for. That's why so many finance departments still run on Excel: it's a general-purpose tool that can be bent to countless specific use cases.
In most AI applications, system prompts should be written and maintained by users, not software developers or even domain experts hired by developers.
Most AI applications should be agent builders, not agents.
07 What do developers need to do without writing prompts?
If users write the prompts, what's left for developers to do?
First, they will build the UI for an agent builder that operates in a specific domain, such as an email inbox or a general ledger.
Most people probably don't want to write every prompt from scratch, and a good agent builder won't force them to. Developers can provide templates and prompt-writing assistants to help users create their own agents.
Users also need an interface for seeing what the agent is doing and for iterating on their prompts, like the small demo email agent builder above. This gives them a fast feedback loop for training the agent to perform tasks reliably.
Developers will also build the agent's tools.
Tools are the mechanisms by which an agent interacts with the outside world. My email-writing agent needs a tool to submit drafts for my review. It might use other tools to send an email without my review, to search my inbox for previous emails from a given address, or to check YC's founder directory to see whether a message came from a YC founder.
Tools also provide a layer of security for agents. An agent can only perform the actions allowed by the tools it has access to, and it is much easier to enforce those boundaries in code than by drawing a textual line between system prompts and user prompts.
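As a rough illustration, here is what granting tools in code might look like with a function-calling style API. The tool names and schemas are hypothetical; the point is that a send_email tool is simply never handed to the agent:

```python
# Sketch of code-enforced boundaries: the agent can only call the tools it is given.
def submit_draft_for_review(to: str, subject: str, body: str) -> None:
    """Queue a draft in the user's review folder; nothing is sent automatically."""
    ...

def search_inbox(query: str) -> list[str]:
    """Return snippets of past emails matching the query (read-only)."""
    ...

# Tool schemas exposed to the model via function calling.
# A send_email tool is deliberately absent, so the agent cannot send mail unreviewed.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "submit_draft_for_review",
            "description": "Save a draft reply for the user to review before sending.",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["to", "subject", "body"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_inbox",
            "description": "Search the user's inbox for previous emails matching a query.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]
```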
08 Expectations for AI-native applications
For many of us, the "killer application" of AI looks like this: teaching computers to do the things we don't like doing, so we can spend our time on the things we love.
In most cases, today's large models are already good enough. What keeps AI from being applied more widely is not model capability but application design.
It's as if the Gmail team built a horseless carriage: they set out to bolt AI onto their existing email client, instead of asking what an email client designed around AI from the start would look like.
Their approach crams AI into an interface designed for manual, everyday work, rather than an interface designed to automate that work.
AI-native software should maximize the user's leverage in a specific domain. An AI-native email client should minimize the time I spend on email. AI-native accounting software should minimize the time an accountant spends keeping the books.
This is exactly why I'm excited about the future of AI. In that world, I won't need to spend time on tedious work, because agents will handle it for me. I'll be able to focus on what I think matters, because agents will take care of everything else. And the work I love will go better too, because agents will help me do it.