In the field of artificial intelligence, Demis Hassabis, co-founder and CEO of DeepMind, has led the industry with forward-looking research and innovation. He has not only built bridges between gaming, neuroscience, and AI research, but has also demonstrated, through projects such as AlphaGo and AlphaFold, the enormous potential of AI for solving complex problems.
A scientist spanning gaming, neuroscience, and AI research, Demis Hassabis, co-founder and CEO of Google DeepMind and winner of the 2024 Nobel Prize in Chemistry, returned repeatedly, in his talk at the Institute for Advanced Study (IAS), to the "process of building intelligent systems" itself and to the conjectures that drive it.
Hassabis told IAS Dean David Nirenberg that games, science, and mathematics are not end applications in themselves, but core training grounds for building world-model capabilities.
Whether it is AlphaGo, AlphaFold, or the latest Gemini models and Project Astra, DeepMind's route has always revolved around "model-led decision-making and planning capabilities", not just language generation. With AGI still immature, he positions DeepMind as a "developer of scientific tools" rather than a product maker.
In his view, AI's most important task is not to replace humans but to serve as a "collaborative intelligent system" that helps humans ask new questions, build more complex world models, and expand cognitive boundaries. The role of institutes like IAS, as he put it, may be to serve as the earliest testing ground for the "new paradigm of institutions" this process requires.
From gaming to AI scientist
For Demis Hassabis, gaming is far more than a personal interest; it is a window into the essence of intelligence. Starting with chess as a child, he gradually moved into complex games such as Go and poker, and along the way formed a long-held cognitive intuition: games are not merely collections of strategies and skills, but compressed mappings of the structure of the real world.
Games, with their clear decision paths, tight feedback loops, and controllable variables, are ideal settings for building and verifying intelligent models. Hassabis is not alone in making this point.
As early as the middle of the 20th century, the founders of computing theory, such as Turing, Shannon, and von Neumann, began to use games as an entry point to try to understand the boundary between human rationality and machine decision-making.
Citing classic works such as Theory of Games and Economic Behavior and Homo Ludens, he emphasized the foundational place of games in modern science and social-science modeling. The value of games lies not only in the clarity of their rules and the quantifiability of their goals, but also in their capacity to generate large amounts of structured data, providing highly workable training material for AI.
In developing AlphaGo, AlphaZero, and MuZero, DeepMind consistently followed a closed loop running from rule setting through policy modeling to feedback-driven tuning. In his view, this way of modeling is like a language or a symbolic system: a fundamental capability AI needs in order to understand the world.
In Asian culture, Go is the ultimate embodiment of this game-centered thinking: it combines deep spatial intuition, strategic planning, and philosophical reflection, qualities that are gradually distilled into the basic structures through which AI understands complex situations.
Complete-information games such as chess and Go, with their fully specified rules and full observability, provide a training foundation well suited to the initial stage of algorithm development. Incomplete-information games such as Texas Hold'em introduce hidden variables and multi-level strategic reasoning, bringing models closer to the complex, uncertain information environments of the real world.
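Why complete information makes such games a natural starting point can be seen in a toy sketch: with full observability and known rules, the value of every position can be computed exactly by backward induction. The example below is purely illustrative (a hypothetical two-move game encoded as nested tuples), not DeepMind's training code.

```python
from typing import Union

# A toy perfect-information game tree: leaves are payoffs for the
# maximizing player, internal nodes are tuples of child positions.
GameTree = Union[int, tuple]

def minimax(node: GameTree, maximizing: bool) -> int:
    """Exact value of a position when rules and state are fully observable."""
    if isinstance(node, int):            # terminal position: payoff is known
        return node
    values = (minimax(child, not maximizing) for child in node)
    return max(values) if maximizing else min(values)

# Hypothetical game: the maximizer picks a branch, then the minimizer replies.
toy_game = ((3, 5), (2, 9), (0, 7))
print(minimax(toy_game, maximizing=True))  # -> 3, the best guaranteed outcome
```

In incomplete-information games such as poker, the state is no longer fully observable, so exact backward induction like this gives way to methods that reason over distributions of hidden variables.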
DeepMind's research started with the former, gradually expanded to the latter, and has since extended to more open-ended tasks such as language, multimodal understanding, and real-world reasoning. This trajectory is visible in its latest systems, Gemini and Project Astra, which can already process multimodal input and build cross-domain mappings among image recognition, language understanding, and intuitive physics.
Even in seemingly light game scenarios, such as charades, these systems show notable situational awareness and responsiveness, reflecting an integrated internal modeling of rules, goals, and changing relationships.
Hassabis is clear, however, that DeepMind has never treated games as the end goal. A major shift came after AlphaGo's victory: almost immediately after that match, he began assembling an interdisciplinary team to tackle protein structure prediction.
In his view, an ideal AGI system must be able to transfer across tasks and perform stably in different fields; games are an excellent training platform along the way, not the destination. What is truly worth investing in are problems that are scientifically valuable, complex, and come with verifiable feedback.
He recalled that as a teenager he was drawn to grand questions such as "where does consciousness come from" and "why does the universe exist". The writings of physicists such as Feynman and Steven Weinberg once pulled him toward theoretical physics, but the stagnation of physical theory at the time also made him realize that individual intellect alone could hardly force breakthroughs on such ultimate questions.
So he turned to asking: Can an artificial system be built that helps humans understand intelligence itself from another perspective? If building intelligent systems is a scientific experiment, can we explain human thought processes in reverse by “creating intelligence”?
In the conversation with Nirenberg, a key question was raised: AlphaFold's breakthrough was built on a training set of more than 100,000 protein structures, the fruit of decades of work by thousands of biologists. How, then, can AI secure a comparable training base when facing more complex, data-scarce problems?
Hassabis admits that AlphaFold does stand on the shoulders of half a century of structural biology. But he added that when developing early versions, the team used a "bootstrap training" strategy: an imperfect model first predicted one million new protein structures, then the 300,000 most credible predictions were selected and added to the training set, expanding the training base with synthetic data. This data-generation strategy not only improved the model's generalization, but also served as a check on the model's own scientific soundness.
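The bootstrap strategy described here amounts to confidence-filtered self-training: predict on unlabeled inputs, keep only the most confident outputs, and fold them back into the training set. A minimal sketch of one such round follows; the model interface and the keep fraction are illustrative assumptions, not AlphaFold's actual pipeline.

```python
def bootstrap_round(model, labeled_data, unlabeled_inputs, keep_fraction=0.3):
    """One round of confidence-filtered self-training.

    `labeled_data` is a list of (input, label) pairs; `model` is assumed to
    expose .train(data) and .predict(x) -> (prediction, confidence).
    Both are hypothetical interfaces used only for illustration.
    """
    model.train(labeled_data)

    # Predict labels for unlabeled inputs, keeping the confidence score.
    scored = [(x, *model.predict(x)) for x in unlabeled_inputs]
    scored.sort(key=lambda item: item[2], reverse=True)

    # Retain only the most credible fraction as synthetic training examples.
    n_keep = int(len(scored) * keep_fraction)
    synthetic = [(x, prediction) for x, prediction, _ in scored[:n_keep]]

    # The expanded set mixes real and synthetic labels for the next round.
    return labeled_data + synthetic
```

In the AlphaFold case the reported numbers were roughly one million predictions with the 300,000 most credible retained, which corresponds to a keep fraction of about 0.3.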
He explained that protein structure was chosen precisely because the problem met several key conditions: a clear objective function, structured data to build on, and a scientifically recognized method of validation; at its core it is a modeling problem, which is exactly where AI shows the greatest potential.
DeepMind therefore follows the same principle when selecting projects in other directions: look for scientific tasks that are clearly defined, genuinely hard, and verifiable. He himself has long scanned interdisciplinary settings for potential research subjects, especially during visits to institutions such as IAS, habitually searching for the "next AlphaFold".
The methodology of the Alpha-series systems inherits the same idea: first, a structured problem is extracted from the real world and a prediction framework is built into a neural network model; next, strategy paths are optimized through large-scale search and feedback mechanisms; finally, the weights are continuously adjusted against accuracy feedback, closing the loop of modeling, verification, and transfer.
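Read as pseudocode, that closed loop of modeling, search, and feedback can be boiled down to a drastically simplified toy: here the "model" is just a learned value table for positions on a number line, "search" is a one-step lookahead guided by it, and the episode outcome is the verifiable feedback that adjusts the model. This is an illustrative reduction only, nothing like the deep networks and large-scale tree search of the actual Alpha systems.

```python
import random

ACTIONS = (+1, -1)                 # toy task: reach GOAL on a number line
GOAL, HORIZON, LEARNING_RATE = 5, 20, 0.1

values = {}                        # the "model": estimated value per position

def search(pos):
    """One-step lookahead: pick the action whose successor the model rates best."""
    return max(ACTIONS, key=lambda a: values.get(pos + a, 0.0))

def run_episode():
    pos, visited = 0, []
    for _ in range(HORIZON):
        # Mostly follow the model-guided search, occasionally explore at random.
        action = search(pos) if random.random() > 0.1 else random.choice(ACTIONS)
        pos += action
        visited.append(pos)
        if pos == GOAL:
            break
    return visited, (1.0 if pos == GOAL else 0.0)   # verifiable outcome

# Closed loop: the model proposes via search, outcome feedback adjusts the model.
for _ in range(200):
    visited, outcome = run_episode()
    for pos in visited:
        values[pos] = values.get(pos, 0.0) + LEARNING_RATE * (outcome - values.get(pos, 0.0))

print(sorted(values.items()))      # positions on the path to GOAL end up rated highest
```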
This general logic applies not only to games and biology; it is also expanding into complex systems such as mathematics, physics, and economics, suggesting that AI is no longer merely a tool but may become a "participant in scientific exploration".
Gradually becoming a participant in the world
In DeepMind's vision, an ideal agent should be able to perceive its environment, simulate the future, understand cause and effect, and respond sensibly to new situations. This is not merely an extension of the LLM stage, but a technical realization of the capacity for "understanding" itself.
Project Astra and Gemini together implement this blueprint at the level of technical architecture. Astra emphasizes continuous perception and task execution, focusing on how a system operates in the real world over long periods, gathers information, and adjusts its strategy; Gemini focuses on building a general multimodal understanding engine, trained and fused across language, vision, audio, code, and other domains to form AI's "cognitive center".
Demis Hassabis stressed that Astra is not an "upgraded ChatGPT" but an attempt to rebuild the cognitive architecture. It is not a generative system organized around "language"; it starts from "world modeling", with the goal of enabling AI to explain and reason about complex real-world phenomena.
He hopes that future AI can, like humans, rely on a stable yet extensible cognitive framework to respond flexibly across situations, sustain multi-turn interaction, and proactively surface problems.
To test the upper limits of these capabilities, DeepMind is designing a series of more challenging world-modeling tasks, spanning not only the combined understanding of natural language and the physical environment, but also complex causal structures, multi-step inference chains, and cross-modal behavior planning.
For example, generating a "kitchen routine" scene requires the model to understand a character's intentions, object properties, the sequence of events, and response strategies for multiple contingencies. Only with breakthroughs on such tasks can AI truly act as an "agent in the world".
At the same time, Hassabis also emphasizes dimensions beyond technology. Once AI has the ability to interpret and manipulate the world, it is no longer just a tool, but a participant. The question will no longer be limited to “what technology can do”, but “what should technology do” and “who decides what it does”.
He repeatedly stressed the importance of institution building. Future AI needs to be embedded in a social mechanism resembling a "scientific community", where experts from different disciplines jointly weigh its direction and boundaries. Engineers alone cannot decide how AI goes live, nor should a single company control how a model's values are ordered.
To that end, he suggested a global, decentralized coordination mechanism covering the whole chain from model training to deployment to governance, with the capacity for ethical judgment and policy coordination.
This also reflects the values DeepMind has held for years: technological breakthroughs should not be divorced from their social context, and scientific exploration must go hand in hand with institutional ethics. During the development of Project Astra, the team brought in ethics advisers, linguists, and policy experts to embed value discussions early in the system. The point is not risk avoidance for its own sake, but the understanding that once intelligent systems become truly autonomous, constructing the meta-rules has to come first.
Hassabis sees the current phase as a "restructuring period for intelligent systems" rather than an explosive window for application products. The real leap will come from whether we can build a system that maps the structure of the world, can reason and update itself, and can operate continuously in reality. As he put it: "What we are building should not be just faster language models, but stronger thinking engines."
Perhaps in the next ten years, the measure of whether an AI system is "advanced" will no longer be how long an answer it can generate, how quickly it produces an image, or how precisely it matches semantics, but whether it can understand a seemingly simple yet structured scene: a person stands in a kitchen, cuts a slice of tomato, and puts it in the pot.
This involves not just actions but temporal continuity, spatial coordination, the link between tools and goals, and the planning logic and intentions behind them. Achieving this capability would mark AI's transition from "surface intelligence" to "structural intelligence", and mark DeepMind's switch from competing on algorithms to building cognitive systems.
This transformation will not happen overnight, nor will it be accomplished by a single company. Hassabis believes the important thing is not who gets there first, but who keeps asking the real question, "what is intelligence", at every key juncture. Before AGI actually arrives, a platform must be established that is transparent, accountable, and able to host public discussion; otherwise, no matter how advanced the technology, it is just another system that cannot understand itself.
Mathematics, towards the core of AGI
In DeepMind's latest research, a seemingly niche path is steadily moving toward the core of AGI: mathematics. In recent years, DeepMind's mathematics team has appeared frequently at major research institutions, discussing some of the most abstract problems with scholars in foundational disciplines such as mathematical logic and algebraic geometry.
One of the most representative results is a system called AlphaProof, which tries to turn the process of proving mathematical theorems into a chessboard-like strategy search. The team's basic idea is that a proof is like a game of chess: each step moves closer to the target proposition. The system builds on the formal language Lean and reasons within strict logical rules to find a feasible proof path.
Unlike traditional symbolic systems, AlphaProof shows a degree of generative capability: it can produce its own training samples and verify them autonomously, shedding the dependence on manual annotation and displaying stronger adaptability.
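The appeal of a formal language here is that every step is machine-checkable, so the "judge" is automatic. As a purely illustrative example (not an AlphaProof output), a trivial statement proved in Lean 4 looks like this, with each tactic acting as one "move" toward the goal that the kernel either accepts or rejects:

```lean
-- Toy theorem: addition of natural numbers is commutative.
-- `exact` closes the goal with a known lemma in one move.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- Toy example: rewrite with the hypothesis, then the remaining
-- arithmetic goal `3 + 1 = 4` closes by reflexivity.
example (n : Nat) (h : n = 3) : n + 1 = 4 := by
  rw [h]
```

Because the proof checker gives immediate, reliable feedback on every candidate proof, a system can generate attempts at scale and learn from which ones the kernel accepts, which is exactly the "automatic judging" property described below.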
Hassabis offered his own criteria for choosing an AI research direction: does the problem have a clear objective function, can it generate high-quality data automatically or at scale, and is it complex enough to probe the boundaries of AGI?
He believes mathematics meets all of these conditions. In particular, mathematical problems come with a natural "automatic judging" property at the verification stage, making them an ideal setting for building an efficient closed-loop system.
Among the many mathematical topics, Hassabis is particularly interested in the P vs NP problem. He calls this Millennium Prize problem the "soul-searching question" of the AGI era: it concerns not only the boundaries of computational complexity, but also whether reasoning systems can solve complex problems in feasible time.
He admitted that if he could spend one summer in concentrated study, he would devote all his energy to this problem. In his view, every technological leap, from AlphaGo to AlphaFold, is a challenge to the boundaries of "computability".
From mathematical reasoning to broader world modeling, Hassabis emphasizes a thread that DeepMind has adhered to for years: building a “world model.” Whether it’s early Atari games, Go systems, or later molecular structure predictions, the core goal is not to simply give answers, but to help AI understand how the world works.
A world model, in this sense, is AI's ability to simulate spatial structure, physical laws, causal relationships, and dynamically changing scenes. The concept is closely tied to Hassabis's early neuroscience research: during his doctoral work he studied the brain's mechanisms for "memory" and "imagination", and found that when humans recall the past and imagine the future, the brain regions involved overlap substantially.
This suggests that humans can plan for the future because the brain, in effect, runs an "internal simulator" built on past experience. It became an important reference point when he conceived of AI: systems should have comparable simulation capabilities rather than merely output language.
This philosophy is gradually taking shape in two core projects at DeepMind. Project Astra is one of them, an attempt to build an agent prototype of "continuous perception + reasoning + action"; the Gemini model family handles the groundwork of multimodal understanding, supporting a unified processing framework for text, images, code, audio, and even video, with a degree of causal modeling and semantic-consistency capability.
Gemini is thus designed as a system with universal cognitive capabilities, blending the flexibility of language generation, the accuracy of visual recognition, the rigor of physical reasoning, and the structure of code logic. Astra, on the other hand, is closer to the application scenario, with the goal of building a “digital partner” that can interact for a long time, actively understand the environment, and complete practical tasks.
In Hassabis's vision, AI should not be merely a predictive tool, but a system with internal modeling capabilities that accomplishes complex tasks by understanding the world rather than just imitating language.
DeepMind’s technological evolution path—from games, molecules, mathematics, to multimodal world modeling—has actually been anchored in the same direction: to build reasoning agents by constructing abstract models of the world’s causal structure.
When a model can take the everyday yet complex task of "cutting a tomato" as its entry point and demonstrate a coherent understanding of space, mechanics, and behavioral logic, a new form of intelligence will have quietly emerged. This is not just a demonstration of the model's capabilities; it is closer to a prototype test of what AGI might look like.
Is embodiment necessary to understand the world?
In a recent DeepMind experiment, a hypothesis Demis Hassabis long held is being challenged: does AI really need to be given a "body" in order to understand the world?
Hassabis used to hold that intelligence must be embodied: understanding physical laws and developing intuitive cognition, he believed, requires interaction between a body and its environment.
The results of this experiment, however, suggest that using image data alone, with no sensors or touch, AI can simulate highly complex physical processes. It reproduces not only the act of cutting a tomato with a knife and the details of splashing water; even changes in light reflection are captured realistically and stably. This capability reveals a surprising grasp of causal structure over time.
Hassabis acknowledges that this goes beyond his original understanding of the boundaries of “perception” and suggests that the development of AGI may no longer depend on physical experience.
When Nirenberg asked why universities and basic research institutions are increasingly marginalized in the evolution of AI, Hassabis said DeepMind conducts basic research inside the company out of practical considerations: compared with academia, companies hold significant advantages in computing power, engineering resources, and capital efficiency, and the rapid pace of startups has also propelled the technology forward.