When programming with AI, different development stages place different demands on the model. This article walks through the key selection points across four stages, from conception to release, and offers six practical suggestions to help you choose efficiently.
Have you run into this kind of trouble? You want AI to help you write code and build an app or website, only to discover there are so many AI models on the market (GPT, Claude, Gemini, DeepSeek…) that you're dazzled and don't know which to pick. You try one at random, and sometimes it works great, sometimes it acts "stupid", answers beside the point, and even "forgets" what you told it earlier.
For example, I had been using Claude 3.7 myself, but in my last few projects it stopped performing well; some operations that failed several times in a row succeeded in one go once I switched to Gemini 2.5.
From a competitive standpoint, the large models from different companies don't clash head-on; each is optimized from a different angle, which means each has its own strengths. Hence today's topic: how do you choose the right large model for AI programming?
Do you also experience these troubles?
Imagine you want to develop a simple “recipe lookup” app.
1. Initial ideation: You ask an AI (say, a model that is good at code generation) to help you plan the app's core functions, target users, and design style. The advice it gives is vague and even a bit off-topic, because that model may not be good at "brainstorming" or understanding business needs.
2. Writing code: You switch to a model reputed to be "smart" but expensive and let it write the actual code. It does write it, but slowly, and for simple repetitive code it feels like overkill while your wallet "bleeds".
3. Review and check: Finally, you want the AI to check the whole project's code for logic problems, or to help write usage instructions. But the AI loses the thread halfway through, because the amount of content it can "remember" (its "context window") is limited and it cannot take in your entire project's code at once.
Does it feel like no model is perfect? That's right! The key point is that no single AI model does the best job at every aspect of development. The best strategy is to choose the most suitable model for each stage of development, or for each kind of task. It's like renovating a house: laying bricks, painting, and wiring each call for different tools, not just one hammer.
Solution: How to choose AI models at different stages of development?
Let’s take developing an app (such as the “recipe query” app just now) as an example to see how to select and use AI models at different stages:
Stage 1: Ideation and Design (Clarify "What to Do")
Your goal: Determine the core functions of the app (such as recipe search, categorization, collections, user reviews), design style, who the target users are, what pages are needed, etc.
What AI is needed: AI needs to have strong logical reasoning skills and extensive knowledge to help you brainstorm, understand your ideas, and give structured suggestions.
Model recommendation:
- Google Gemini 2.5 Pro: Strong reasoning and a huge “memory” (context window) to understand complex ideas and needs.
- Anthropic Claude 3.7 Opus (if available and within budget): Generally considered to have top-notch reasoning and comprehension skills.
- OpenAI o1 or GPT-4.5: Both known for strong reasoning capabilities.
- DeepSeek R1 (671B): Excellent in planning and reasoning, cost-effective.
Cost considerations: This stage lays the foundation, and a good plan can avoid a lot of rework later. It is often worth investing in a more powerful model here; it will save you more time and money down the road.
Stage 2: Hands-on coding and implementation (turning ideas into code)
Your goal: Implement the designed functions line by line with code to build the interface and logic of the APP.
What AI is needed: AI needs to be good at understanding and generating code, giving code suggestions, explaining code meaning, and fixing simple bugs.
Model recommendation:
- Anthropic Claude 3.7 Sonnet: Considered by many developers to be excellent in terms of code generation quality and following instructions, especially when paired with development tools like Cline.
- OpenAI GPT-4o: A capable all-rounder with solid coding skills.
- DeepSeek V3: Code quality close to Sonnet's, with excellent cost-effectiveness; suitable for everyday coding work.
- Google Gemini 2.5 Pro: With its strong reasoning and vast context window, it also has an advantage when working on complex codebases.
Cost considerations: For simple code completion on a daily basis or for less complex modules, consider using more cost-effective models such as Claude 3.7 Haiku or DeepSeek V3. Leave more expensive and powerful models like Claude 3.7 Sonnet or GPT-4o to those complex, core feature development.
Stage 3: Testing and fixing bugs (make the APP run without errors)
Your goal: Find out various bugs that may exist in the app, such as button clicks not responding, data display errors, etc., and fix them.
What AI is needed: The AI needs to understand code logic, identify possible edge cases, and help write test code or suggest fixes.
Model recommendation:
- Anthropic Claude 3.7 (Sonnet or Haiku): Sonnet excels in understanding complex logic, and Haiku may be sufficient for simple test case generation with fast and low cost.
- OpenAI GPT-4o (or its Mini version): Also has good code understanding and generation capabilities, capable of handling testing tasks.
Cost considerations: Test code usually follows a fixed pattern, so a mid-tier model is often sufficient. For complex test scenarios covering core functionality, consider a more capable model.
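To see why "fixed pattern" test code is an easy job for a mid-tier model, here is the kind of boilerplate an AI might produce for the recipe app. Both the `search_recipes` function and the test cases are hypothetical illustrations, not from any real codebase:

```python
# Hypothetical recipe-search function from the example app.
def search_recipes(recipes, keyword):
    """Return recipes whose name contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [r for r in recipes if keyword in r["name"].lower()]

# Fixed-pattern tests: normal case, no match, empty input, case handling.
recipes = [
    {"name": "Tomato Soup"},
    {"name": "Fried Rice"},
    {"name": "Tomato Fried Egg"},
]

assert len(search_recipes(recipes, "tomato")) == 2                  # normal case
assert search_recipes(recipes, "sushi") == []                       # no match
assert search_recipes([], "tomato") == []                           # empty input
assert search_recipes(recipes, "RICE")[0]["name"] == "Fried Rice"   # case-insensitive
```

The test cases follow a template (happy path, empty input, no match, edge case), which is exactly the kind of repetitive structure a cheaper model handles well.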
Stage 4: Code Review and Release Preparation (Final Inspection and Refinement)
Your goal: Before the app goes live, review all the code as a whole to ensure that the code style is uniform, there are no obvious logic loopholes, and some user documentation or instructions may need to be written.
What AI is needed: AI is needed to handle large amounts of code and understand the structure of the entire project. This is where the “memory” of the model (context window size) is very important. If AI can “read” all your code at once, it will be much more efficient. Some models can also understand images (multimodal capabilities) and can help you check UI screenshots or design drawings.
Model recommendation:
- Google Gemini 2.5 Pro: With a leading extra-large context window (1 million tokens, with 2 million announced), it is ideal for reviewing and understanding large codebases.
- Anthropic Claude 3.7 Sonnet: Also has a large context window (200K tokens), suitable for most project reviews.
- OpenAI GPT-4o: The context window is also relatively large and has multimodal capabilities.
Cost considerations: Models with large context windows are generally more expensive, but they can process more information in one pass and avoid repeated input and re-explanation, saving you time. In the final review phase of a project, this is often worth the cost.
Practical advice for novice developers:
- Understand the "context window": This is the AI's "short-term memory" (similar to computer RAM). It determines how much information the AI can process at once (your code, your questions, its answers). If your project is large or the conversation runs long and exceeds this limit, the AI may "forget" earlier content. Pay attention to the context window size each model offers, measured in tokens (roughly, words or chunks of characters): for example, Gemini 2.5 Pro offers up to 1 million, while Claude 3.7 Sonnet offers 200,000.
- Start with "good enough": You don't have to use the most expensive, most powerful model. Try cost-effective mid-range models first (such as Claude 3.7 Haiku, DeepSeek V3, or the Gemini Flash series), and upgrade to a stronger model only if they fall short.
- Divide the labor (if your tool supports it): Some AI programming tools (like the Cline mentioned above) let you set different models for "planning" and "execution". Use a model that is good at thinking (such as Gemini 2.5 Pro or DeepSeek R1) for planning, then a model that writes code quickly and well (such as Claude 3.7 Sonnet or DeepSeek V3) for the actual coding.
- Experiment to find your "best partner": Model leaderboards and other people's recommendations are useful references, but ultimately you have to try things yourself to learn which model suits you best. Experiment on low-stakes tasks or personal projects.
- Focus on actual results rather than raw benchmarks: A model's benchmark score is only a reference; how it performs in real use (for example, how well it works with the tools you rely on) matters more.
- Don't bother with local models just yet: Running models on your own computer may sound cheap, but local models' current performance and reliability (especially for complex tasks and tool use) still lag far behind cloud models, and they may cause you more headaches than they save.
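To get a feel for whether a project fits in a given context window, you can roughly estimate its token count. The sketch below uses a common rule of thumb of about 4 characters per token; that ratio is an assumption (real tokenizers vary by model and language), and the file extensions are just examples:

```python
import os

# Rough rule of thumb: ~4 characters per token. This is an approximation;
# real tokenizers vary by model and by language.
CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    """Rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN

def estimate_project_tokens(root, exts=(".py", ".js", ".md")):
    """Walk a project directory and sum rough token estimates for source files."""
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total

# A 200K-token window (Claude 3.7 Sonnet's size) holds roughly 800K characters.
print(estimate_tokens("x" * 800_000))  # → 200000
```

If the estimate for your project is well under the model's window, a whole-project review in one pass is realistic; if not, you'll need to review module by module.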
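The planning/execution division of labor described above can be sketched as a simple task router. Everything here is illustrative: the model names are placeholders, and tools like Cline configure this through their own settings rather than code. The point is only the pattern of dispatching each task type to the model that suits it:

```python
# Hypothetical model routing: planning tasks go to a reasoning-oriented model,
# coding tasks to a code-oriented one. Model names are illustrative only.
MODEL_FOR_TASK = {
    "plan": "gemini-2.5-pro",     # strong reasoning, large context window
    "code": "claude-3.7-sonnet",  # strong code generation
}

def route(task_type, prompt):
    """Pick a model for the task; returns (model, prompt) for a real API call."""
    model = MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["code"])
    return model, prompt

model, prompt = route("plan", "Design the core features of a recipe app")
print(model)  # → gemini-2.5-pro
```

The same pattern extends naturally to more task types (testing, review), each mapped to whatever cost/capability tier the stage calls for.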