Perplexity CEO's Latest Insights: From Search to Execution, Reasoning Models Are the Next Stage of Generative AI

Now that large models that can write and converse have become table stakes, Perplexity's CEO argues that the next battle is over models that can act and execute. The industry is shifting from the pre-training era of piling on parameters to systems engineering built around reasoning models, so that AI can plan, act, and deliver results like an agent. Training data is moving from "text" to "task trajectories," and the business path is moving from "selling APIs" to "selling closed loops."

Reasoning models are gradually taking over from the pre-training paradigm, marking a key turning point as generative systems enter the deployment stage. Aravind Srinivas, co-founder and CEO of Perplexity, argued in a conversation at Harvard that the industry's focus has shifted from scaling model parameters and corpora to building system architectures with execution and feedback mechanisms.

He said this paradigm shift is not only about improving the capabilities of the model itself; it also forces a rebuild of the entire AI engineering pipeline, as data collection, user feedback, task scheduling, and system integration are each pulled onto the track of the reasoning paradigm.

The team led by Srinivas is embedding language models into search and Q&A scenarios as system components, iterating on agent capabilities and deployment logic around real usage paths. The core judgment is that a general pre-trained model provides the foundation of language understanding but cannot be turned directly into a usable system; it becomes a product capable of performing tasks autonomously only with the help of reasoning mechanisms and behavioral feedback. In this context, the training paradigm is moving from token prediction to behavior planning, and the model's goal is no longer to imitate language but to solve tasks.

Reasoning-model systems such as Perplexity and DeepSeek are converging on a new product-architecture consensus: replace text fitting with structured task paths, replace offline evaluation with real feedback loops, and dissolve model boundaries at the system-engineering layer.

This shift is also prompting the industry chain to rethink resource allocation, from wrapping a UI first and testing with open-source models to verifying behavior paths before training and deployment, establishing a genuine closed loop among task capability, system structure, and resource decisions.

Seen against the broader trajectory of system architecture, the problem Srinivas faces is typical of AI engineering as a whole as it enters the execution era: how to build staged, deployment-worthy systems while model capabilities are not yet well defined and the feedback loop is not yet closed.

Reasoning models take over from the pre-training paradigm

Over the past two years, generative pre-trained models have made significant breakthroughs in language understanding and generation, but that paradigm has reached a stage boundary. The industry's focus is shifting from scaling corpora and parameters to deepening a system's execution ability and task reasoning. The new generation of models relies more on structured tuning in the post-training stage to support long logical chains, task workflows, and actions in web environments. This has become the main research direction of leading model labs worldwide.

Pre-training gives a model a low-level grasp of world knowledge and semantic structure, but building an intelligent system with real practical value still requires refining capabilities and grounding structure in vertical task scenarios. Systems such as Perplexity are being retrained around real usage paths to deliver continuous value at the product level. At the same time, the rapid evolution of China's open-source ecosystem is pulling on the global pace, and the emergence of DeepSeek has become one of the most visible sources of benchmarking pressure on North American teams.

DeepSeek's breakthrough is not only in its engineering capability, including system compilation, floating-point optimization, kernel scheduling, and deploying large models on lower-end GPUs. More importantly, it proposed and implemented a concrete path for reasoning models. Its release of DeepSeek-R1-Zero showed that reinforcement learning can guide a model to produce executable reasoning behavior without supervised fine-tuning, providing a structural template for training automated agents and opening a new channel for exploring the boundaries of capability.
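One common way to realize such a training signal, offered here only as an illustrative sketch rather than DeepSeek's actual recipe, is a rule-based, verifiable reward: a rollout earns credit for separating its reasoning from its answer, and more credit only if the extracted answer checks out against a known result. The tag format and function name below are assumptions.

```python
import re

def reasoning_reward(model_output: str, reference_answer: str) -> float:
    """Rule-based reward for RL on verifiable tasks (illustrative sketch)."""
    reward = 0.0

    # Format reward: reasoning and answer must be explicitly separated.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", model_output, re.S):
        reward += 0.1

    # Accuracy reward: the extracted answer must match the known result.
    match = re.search(r"<answer>(.*?)</answer>", model_output, re.S)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0

    return reward

# A correct, well-formatted rollout scores 1.1; an unformatted guess scores 0.
print(reasoning_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))
print(reasoning_reward("The answer is 4", "4"))
```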

Where products and research advance together, some teams have introduced structured mechanisms: frontier research focuses on model task capability and system performance, while the product side focuses on interface design, information organization, and user experience, using the combined Q&A-search scenario to test expression strategies and reasoning processes. This "double helix" mechanism ensures that every iteration has a clear experimental basis for verification, forming a stable rhythm of feedback and updates.

At the resource-scheduling level, leading companies translate system feedback signals directly into compute decisions based on their understanding of model mechanisms: when a small-scale reasoning-agent experiment is positively verified, they scale up deployment quickly, buying ten thousand GPUs outright to build a complete reasoning system. The logic behind this is a deep belief in the strong correlation between model performance and economic returns.

At the same time, a "delayed training" strategy has proven effective inside some companies: build a prototype first by wrapping a UI, collect user data and behavioral feedback, and only then start training a large model, using the steady progress of open-source models to bridge early resource constraints. In 2023 this strategy was validated across several projects and has been folded into the main process of product design, technology investment, and capital allocation.

A data paradigm restructured around task paths

The focus of model training is shifting from large-scale corpus scraping to constructing concrete task paths. Under a task-oriented training paradigm, capability gains no longer come from reproducing human language expression; they come from executing chained behaviors such as mathematical reasoning, code generation, web clicks, and file handling. Training samples are organized in units of task behavior paths, which form the key foundation of an agent's system-level reasoning ability.
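To make "task behavior path" concrete, here is a minimal, hypothetical schema for a training sample organized as a trajectory rather than as raw text; the field and action names are illustrative, not a production format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    """One action in a task trajectory: what the agent did and what came back."""
    action: str        # e.g. "open_file", "search_text", "run_code"
    arguments: dict    # parameters of the action
    observation: str   # what the environment returned

@dataclass
class TaskTrajectory:
    """A training sample organized as a task behavior path, not as raw text."""
    task: str                                   # natural-language task description
    steps: List[Step] = field(default_factory=list)
    outcome: str = ""                           # final result the agent produced
    success: bool = False                       # verified completion signal

# Example trajectory: locate a figure in a document and report it.
sample = TaskTrajectory(
    task="Find the 2023 revenue figure in report.pdf and summarize it.",
    steps=[
        Step("open_file", {"path": "report.pdf"}, "document loaded, 42 pages"),
        Step("search_text", {"query": "2023 revenue"}, "match found on page 7"),
    ],
    outcome="2023 revenue is reported on page 7.",
    success=True,
)
print(len(sample.steps), sample.success)
```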

This shift also changes data sources and training goals at a deep level. Companies are generally abandoning self-built pre-trained models, leaving general semantics to open-source communities and closed-source labs, and focusing instead on a closed fine-tuning system built on top of them. Tuning tasks target structured generation and systematic operational workflows, modular skills such as summary extraction, format conversion, document rewriting, and upload execution, with the aim of producing deployable, evaluable intelligent components.

On training-data compliance, disputes over copyright and generated content continue, such as the ongoing lawsuit filed by the New York Times, but industry practice has gradually converged on the view that output which does not substantially reproduce the original can be treated as "fair use." Most companies therefore isolate corpora, transform output formats, and emphasize task orientation to reduce risk and actively avoid paths that replicate source text.

The sources of model data have also been rebuilt at the level of mechanism. Real users' queries and interactive feedback become the core training signal: likes, edits, and clicks are collected systematically to guide ranking and reinforcement. The system also dynamically adjusts its crawling strategy based on how past answers performed, raising the crawl frequency and index depth of high-value content first, forming a feedback-driven data supply mechanism.
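A minimal sketch of what such a feedback-driven supply mechanism could look like: sources that keep contributing to well-received answers are re-crawled more often. The scoring rule and constants are assumptions for illustration, not a described production system.

```python
from collections import defaultdict

# Running quality score per source, updated from user feedback on answers.
source_score = defaultdict(lambda: 0.5)

def record_feedback(source: str, liked: bool) -> None:
    """Exponential moving average of feedback for content drawn from a source."""
    signal = 1.0 if liked else 0.0
    source_score[source] = 0.9 * source_score[source] + 0.1 * signal

def crawl_interval_hours(source: str, base: float = 24.0) -> float:
    """Higher-value sources get shorter re-crawl intervals (illustrative rule)."""
    return base / (0.5 + source_score[source])

record_feedback("example.com/finance", liked=True)
record_feedback("example.com/finance", liked=True)
print(round(crawl_interval_hours("example.com/finance"), 1))  # ~21.9 hours
```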

Manual assessment remains an irreplaceable part of the training process. The common workflow presents two model outputs side by side for human judgment, and those judgments are then used to train a ranking model or to annotate sample quality.
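Those side-by-side judgments typically feed a pairwise objective: the ranking model should score the preferred output above the rejected one. Below is a minimal Bradley-Terry-style loss in PyTorch, a sketch of the general technique rather than any particular team's training code.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(score_preferred: torch.Tensor,
                          score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push the score of the human-preferred
    output above the score of the rejected one."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Toy example: the ranking model already favors the preferred answers,
# so the loss is small (about 0.31).
preferred = torch.tensor([2.1, 1.4])
rejected = torch.tensor([0.3, 0.9])
print(pairwise_ranking_loss(preferred, rejected))
```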

In parallel comes the systematic introduction of synthetic data. In training, a large model plays the role of "teacher model," scoring, structuring, or classifying the output of a small model to produce a small labeled dataset for fine-tuning. The mechanism is especially effective for tasks such as building UI classifiers. Take user intent recognition: the large model automatically labels queries into categories such as finance, travel, and shopping, and the small model then learns by imitation, forming a closed loop of labeling and imitation.

This path not only improves data production efficiency; it also lays the training foundation for system capabilities such as user-intent understanding, retrieval-path planning, and response routing, becoming a key technical base for the reasoning-agent architecture.
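A hedged sketch of the teacher-student loop described above: the large model labels raw queries with an intent category, and those labels then train a small classifier that serves the routing path cheaply. The category list and the call_large_model helper are hypothetical stand-ins, not a real API.

```python
from typing import List, Tuple

CATEGORIES = ["finance", "travel", "shopping", "other"]

def call_large_model(prompt: str) -> str:
    """Stand-in for a call to the teacher model; a real system would send
    the prompt to a large-model API here (hypothetical helper)."""
    return "other"

def label_queries(queries: List[str]) -> List[Tuple[str, str]]:
    """Teacher model annotates each query with one intent category."""
    labeled = []
    for query in queries:
        prompt = (
            f"Classify the user query into one of {CATEGORIES}. "
            f"Reply with the category only.\n\nQuery: {query}"
        )
        category = call_large_model(prompt).strip().lower()
        if category in CATEGORIES:          # keep only clean labels
            labeled.append((query, category))
    return labeled

# The (query, category) pairs then fine-tune a small classifier, which handles
# intent routing at a fraction of the teacher model's inference cost.
print(label_queries(["cheapest flights to Tokyo in May"]))
```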

Alternative paths to search and the restructuring of system resources

Compute has become the core constraint on scaling today's AI systems. Training early foundation models depended on ultra-large-scale compute, and even genuine methodological innovation rarely achieved industrial impact without engineering-level scheduling and resource backing. Compared with academic institutions, platform technology companies are stronger at integrating compute, systems engineering, and product deployment, which is drawing research talent from labs to industry in search of platforms where work can actually land.

Although training the underlying large models is still dominated by a small number of resource-rich teams, system architecture design above the model abstraction layer retains broad room for innovation. From agent frameworks, task evaluation mechanisms, and context-protocol standardization to simulation-environment design and multi-module coordination strategies, a system's value depends more on structural efficiency than on parameter scale. Research at this layer does not require extreme compute and is better suited to long-term collaboration between industry and academia.

Facing the incumbents' entrenched advantages in search, new systems generally avoid a head-on compute collision and take the path of heterogeneous mechanisms. The basic judgment is that once a large platform deploys a generative system at its global entry point, its query volume amplifies system load non-linearly and unbalances infrastructure costs. At the same time, a platform carrying a premium brand has very little tolerance for wrong generations, its content-safety mechanisms cannot close the loop effectively, and its cadence of strategy iteration is limited, further weakening its ability to update the system.

The deeper misalignment comes from the business model itself. Traditional search platforms rely on click-driven ad monetization, and the behavior of a CPC model does not map directly onto a generative Q&A system. Generated content lacks standardized jump targets and conversion paths, the ROI of ad placement is hard to measure, and advertising budgets gradually shift to more controllable channels. Meanwhile, search advertising enjoys high gross margins and low marginal cost, whereas generative systems are expensive to deploy and operate, so unit revenue compares poorly and a structural business gap opens up.

It is precisely this misalignment of paths and structures that opens a window for emerging systems. Unlike large platforms that must rework their business logic at every step, lightweight teams can skip existing dependencies and directly build a fast closed loop of technology, product, and business. A high-speed feedback mechanism between technical experiments and commercial paths gives a fused Q&A-search system realistic potential as a substitute.

Some teams adopt a "use first, train later" strategy: build the system framework on open-source models early, collect user interaction and behavioral data, and switch to a self-developed model system once the system structure is stable. This path sharply reduces early capital consumption and rests on a forward-looking judgment about how open-source capability will evolve. As open-source models approach the ceiling of closed-source performance, the feasibility and practicality of this engineering substitution are gradually being verified.

The revenue structure of search is still being rebuilt. Users' click paths have not yet stabilized in the new form, and the per-user monetization of AI systems lags traditional search by a wide margin. Whether through subscriptions such as Gemini or generated previews nested inside the search portal, current monetization cannot yet sustain a mature advertising system. The structural change in search is still in its early window, and this stage is the key cycle for experimenting with new paths.

Anthropomorphic misuse and the restructuring of education

The actual usage of generative AI is systematically drifting from its original design goals. Ever since the ELIZA chat program, users have tended to treat language systems as human-like beings capable of understanding and emotional interaction, even though the underlying logic is entirely statistics and prediction. Although contemporary large models are explicitly positioned as "conversational search" or task assistants, users still frequently construct role-playing scenarios, and anthropomorphic usage keeps growing across platforms; interface design and output constraints alone cannot fully prevent it.

The prevalence of this misuse also raises concerns about the system's ethical boundaries. Generative systems have been used unexpectedly in highly private settings such as marriage and medical care, and even when the system gives no direct advice, how content is presented or a path is suggested is itself an intervention in the decision. Some teams try to constrain the system's role with "reference-driven question answering," but under ingrained habits and anthropomorphic framing, misuse remains widespread.

The trend shows up most starkly in individual cases. One character-based AI product was drawn into controversy over a real-world incident: a young user interacted with the system frequently before ending his life. Although the system's responsibility is hard to define, the immersive interaction model has raised broad concern about "emotional interface dependence." Even when a product is designed to avoid simulating emotional responses, users may still treat it as an emotional substitute. Some development teams have begun returning to a product philosophy centered on behavior and tools, trying to replace personality simulation with functional boundaries, and this is becoming a new design consensus.

Among under-age users, the risk is more complex still. Children's ability to bypass system restrictions is often underestimated, for instance evading semantic filters by mixing languages, or steering the model toward sensitive content with segmented prompts. The industry still lacks a unified content-review mechanism; protective strategies such as interaction whitelists and content-frequency throttling remain experimental, while the demand for oversight and risk control keeps growing.

At the same time, the education system is undergoing a structural shift triggered by generative AI. Teaching methods need to be rebuilt around the personalization that AI agents provide, but more importantly the goals of education themselves are migrating. When information is extremely easy to obtain, the traditional model centered on transmitting knowledge gradually fails, and "problem definition" and "constructing criteria of judgment" become the core outputs of teaching.

Task design is shifting from repetitive exercises and templated answers toward structural thinking and exploration. The teacher's role is changing from grader of knowledge to stimulator of learning paths, and teaching should be designed around questions AI cannot directly solve, so that students build a knowledge structure with explanatory power and aesthetic tension through proposing, testing, and revising problems.

Alongside this comes a demand for expressiveness and structured cognition. From mathematical models to ethical questions, what really motivates learning is often not the difficulty of the knowledge itself but the complexity and beauty of how it is presented. "How to organize complex information and express cognitive tension" is becoming one of the scarcest learning abilities for the future.

The underlying logic of the educational structure is also shifting: more and more undergraduates are taking on open-ended tasks that used to belong at the graduate level, and education is moving from "imparting knowledge" to "awakening ability." As AI tools become ubiquitous, the independent value of education will be decided by whether it can equip students with structural cognition and judgment, rather than by mastery of knowledge points.

Capability closed-loop bottlenecks and diverging paths to AGI

The industry has developed a structural disagreement over how to define AGI's capabilities and which path to take, and the dispute is no longer purely academic; it directly shapes companies' fundamental judgments about system architecture and product strategy. Generative AI has shown initial execution ability in multiple verticals, but critical breakpoints remain before systems become both versatile and autonomous in decision-making. The real challenge is not a single capability breakthrough but whether a complete closed loop can be built across four links: task understanding, plan generation, action execution, and feedback evaluation.

The break shows up in product practice: even when the underlying model has been replaced, for example GPT-4 giving way to the o-series, users largely remain stuck on the performance they associate with the old label and have little grasp of terms like "reasoning model" or "o3." The actual shift in system capability is blocked by the front-end experience, the value of model updates does not reach the user, and capability becomes structurally invisible along the product path.

Developers of foundation models are rebuilding ecosystem control through a platform path: holding the model itself, the user interface, and the data feedback loop at the same time, forming an autonomous cycle from behavior collection to capability evolution. This "model as platform" structure strengthens data sovereignty and tuning capability, and exposes companies that rely solely on APIs to commoditization and value-chain spillover.

Against this backdrop, the feasibility of open-source models is being reassessed. Projects represented by DeepSeek achieve near-parity without extreme compute, through structural innovation and the construction of reasoning mechanisms, breaking the old perception that open source can only produce lightweight models. Some open-source systems now have independent value in deployment efficiency, demonstrated capability, and module architecture, and have become variables of strategic significance in the industry chain.

At the same time, the interface between model systems and the external software environment remains blurred. There is still no unified protocol for connecting models smoothly to desktop software, web apps, and third-party services; call permissions, context encapsulation, and behavior-feedback standards are not unified, and this has become a structural focus of the contest between platforms. Whoever controls the final execution path determines how traffic and revenue are distributed, and directly shapes platforms' attitudes toward agent systems.

For example, platforms that monetize through advertising, such as Amazon and Instacart, are usually restrained toward external agents, to prevent agents from bypassing the front end to complete transactions and disrupting their recommendation systems and ad-pricing models. Transaction-based platforms such as Uber are more receptive to agent embedding, even treating it as an incremental traffic channel. Whether a platform allows itself to be "proxied" or "wrapped" ultimately depends on its business structure and revenue-distribution model.

As system architectures evolve, the granularity of modular abstraction has become a core design variable. Early reasoning systems generally used explicit module division: ranking, retrieval, summarization, and other components were invoked independently, and some products even identified module functions through role names (such as Sir Johnny and Mother Dinosaur). As complexity and operational pressure grew, system structure tilted toward unified scheduling, with a master model taking on more task distribution and logical judgment in pursuit of path convergence and stability.

Module granularity reflects a team's understanding of the trade-off among maintainability, task complexity, and system resilience. Dividing into too many components invites collaboration bottlenecks such as unstable interfaces and blurred boundaries, while dividing too coarsely weakens the system's flexibility and its ability to reuse functions. There is no common template for this choice; it rests on the team's engineering judgment and system intuition.
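The granularity question can be seen in miniature in how a single query is routed. The toy sketch below contrasts an explicitly wired pipeline with handing the whole path to a master model; the module names are illustrative, echoing the retrieval and summarization split mentioned above.

```python
def retrieve(query: str) -> list:
    """Stand-in retrieval module."""
    return [f"document about {query}"]

def summarize(docs: list) -> str:
    """Stand-in summarization module."""
    return " / ".join(docs)[:200]

# Fine-grained design: the pipeline is wired explicitly, so each module can be
# tested and swapped independently, at the cost of more interfaces to maintain.
def answer_explicit(query: str) -> str:
    return summarize(retrieve(query))

# Coarse-grained design: a master model plans the whole path itself; the system
# has fewer seams but is harder to debug when one capability regresses.
def answer_unified(query: str, master_model) -> str:
    return master_model(f"Plan the steps and answer end to end: {query}")

print(answer_explicit("reasoning models"))
```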

From the perspective of capability, what would truly establish AGI is not whether a model can answer a question correctly but whether it can propose an executable plan and earn an organization's trust. For example, a model that can draft a six-month product roadmap, explain the rationale for its resource allocation, and persuade management to commit millions of dollars constitutes the prototype of a "trusted autonomous executor." This bar is far higher than question-answering AI and much closer to system-level decision support.

A key constraint on that goal is that the high-quality feedback chain after deployment does not yet exist. Even if the model gives a reasonable suggestion, say a code fix, the system often cannot automatically verify whether it actually solved the problem or introduced new latent errors, so there is no stable transmission path from behavioral outcome to capability update.

One potential solution is to build fault-tolerant real deployment environments and introduce reinforcement learning, so that behavioral outcomes become training feedback and the system realizes a dynamic closed loop of task execution, posterior evaluation, and capability fine-tuning. The challenge on this path is controlling deployment risk, evaluation latency, and cost, but once the mechanism is in place, the model shifts from a static bundle of capabilities to a dynamic execution node with self-correction, forming a real-world engineering path to AGI.
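A minimal sketch of that closed loop, under the assumption of a sandboxed environment with an automatic checker (for code fixes this could be a test suite); the environment, checker, and buffer here are hypothetical stubs.

```python
def execute_in_sandbox(task: str, policy) -> str:
    """Run the agent's proposed action in a fault-tolerant environment."""
    return policy(task)

def posterior_evaluation(result: str) -> float:
    """Automatic check of whether the result actually solved the task.
    In a code-fix setting this would run the test suite; here it is a stub."""
    return 1.0 if "tests pass" in result else 0.0

def run_closed_loop(tasks, policy, replay_buffer):
    """Task execution -> posterior evaluation -> feedback for later fine-tuning."""
    for task in tasks:
        result = execute_in_sandbox(task, policy)
        reward = posterior_evaluation(result)
        replay_buffer.append((task, result, reward))  # consumed by RL updates
    return replay_buffer

buffer = run_closed_loop(
    tasks=["repair the failing unit test"],
    policy=lambda task: "patch applied, tests pass",
    replay_buffer=[],
)
print(buffer[-1])  # ('repair the failing unit test', 'patch applied, tests pass', 1.0)
```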
