ChatGPT’s memory mechanism has been made public

ChatGPT’s memory function has always been one of its core strengths, but its specific implementation has remained largely unknown. Recently, independent researchers reverse-engineered the internals of ChatGPT’s memory feature, including its chat history system and user insights. This article walks through those technical details and the first wave of user feedback on the new memory feature, to help readers understand how ChatGPT uses memory to improve the user experience and interaction quality.

The new version of ChatGPT’s memory feature has actually been reverse-engineered by independent researchers!

It can reference past conversations and even quietly store personal information?

Recently, OpenAI introduced an additional memory feature called Chat History, which allows ChatGPT to reference historical conversations for personalized interactions.


Compared with the original saved-memories feature, the new one is more personal and understands you better.

The feature is off by default and must be enabled by the user under Settings > Personalization > Reference chat history.

It is not fully open and is not available to developers through the API, so researchers set out to crack the mechanism and implementation of the new memory function, even uncovering details of the three chat-history subsystems that have never been officially disclosed.

So how does the memory function work?

Drawing on several of these analyses, we can summarize the findings as follows:

How the memory system works

According to the official documentation, there are currently two known memory features: reference saved memories and reference chat history.

However, experiments show that the chat history system can actually be subdivided into three subsystems: current conversation history, conversation history, and user insights.

We will walk through each in turn.

Saved memories system

The first is the most familiar memory system: a simple, user-controllable store for user-defined information such as your name, favorite colors, or dietary preferences.

This information is injected into the system prompt; users must explicitly ask ChatGPT to remember it with a prompt like “Remember that I …”, and saved entries can be viewed and deleted through the user interface.

The implementation is straightforward: ChatGPT saves memories through its internal bio tool, and a reasonable approximation of that tool can be built as follows.

Define it as an LLM call that takes a user message and the existing list of facts, and returns either an updated list of facts or a rejection; then test and iterate to ensure correct behavior.
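
A minimal sketch of such an approximation is below. It is not OpenAI’s actual bio tool: the prompt, model name, and function names are illustrative assumptions, with the OpenAI Python SDK standing in for the LLM call.

```python
import json
from openai import OpenAI  # any chat-completion API would work here

client = OpenAI()

UPDATE_PROMPT = """You maintain a list of facts about the user.
Given the existing facts and a new user message, return JSON:
{"action": "update", "facts": [...]} to save an updated fact list,
or {"action": "reject"} if the message contains nothing worth remembering."""

def update_saved_memories(user_message: str, facts: list[str]) -> list[str]:
    """Approximation of the bio tool: ask an LLM to fold a new user
    message into the existing list of saved facts, or reject it."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": UPDATE_PROMPT},
            {"role": "user", "content": json.dumps(
                {"existing_facts": facts, "message": user_message})},
        ],
    )
    result = json.loads(response.choices[0].message.content)
    if result.get("action") == "update":
        return result["facts"]
    return facts  # rejected: keep the existing facts unchanged

def build_system_prompt(facts: list[str]) -> str:
    """Inject the saved facts into the system prompt."""
    return "Known facts about the user:\n" + "\n".join(f"- {f}" for f in facts)
```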

The above completes injecting the user’s information into the system prompt. For full functional parity with ChatGPT, you could also build a simple UI to view and delete these entries.

Chat history system

The new chat history system is considerably more complex and likely plays an important role in improving the quality of the assistant’s responses.

Current conversation history

This is a simple record of recent messages the user has sent in other conversations, possibly limited to only the most recent day.

At the same time, both this system and the conversation-history RAG system may add direct quotes of user messages to the model context, which makes it hard to pin down which system a given piece of information came from.

This can be implemented simply by keeping a table of user messages and filtering ChatMessages by time, with a cap on the number of messages included.
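
A minimal sketch, assuming a ChatMessage table in SQLite with hypothetical user_id, content, and created_at columns:

```python
import sqlite3
from datetime import datetime, timedelta

def recent_messages(db_path: str, user_id: str,
                    days: int = 1, limit: int = 50) -> list[str]:
    """Fetch the user's recent messages from other conversations,
    filtered by time and capped at a fixed message limit."""
    cutoff = (datetime.utcnow() - timedelta(days=days)).isoformat()
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """SELECT content FROM ChatMessage
           WHERE user_id = ? AND created_at >= ?
           ORDER BY created_at DESC
           LIMIT ?""",
        (user_id, cutoff, limit),
    ).fetchall()
    conn.close()
    return [content for (content,) in rows]
```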

Conversation history

This system pulls in relevant context from previous conversations, directly quoting information from other chats, and provides shorter but less specific context for older conversations.

However, ChatGPT cannot reliably preserve message order or recall messages within an exact time frame (for example, “refer to all messages sent in the past hour”), which suggests retrieval is done over conversation summaries and message content rather than raw timestamps.

It is therefore speculated that the system maintains an index of conversation summaries that is queried against the user’s messages.

The technical implementation would go roughly as follows: first, configure two vector spaces, indexed on message-content and conversation-summary.

When a message is sent, insert it into the message-content vector space; once a conversation has been inactive for a period of time, add its summary to the conversation-summary space. A third vector space, indexed by summary and storing the summaries, is also configured.

Within two weeks of a conversation being created, its summary and messages are inserted into this space.

When a user sends a message, it is embedded, and similarity queries with a two-week time filter are run against both spaces.

The summary space is also queried, filtered to information older than two weeks to avoid duplication, and all of the results are placed into the system prompt.
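
As an illustration of that retrieval flow only (not the disclosed implementation), here is a sketch using plain numpy cosine similarity in place of a real vector database; the embed() placeholder and the space layouts are assumptions:

```python
from datetime import datetime, timedelta
import numpy as np

# Each "vector space" is just a list of records for illustration:
# {"vec": np.ndarray, "text": str, "created_at": datetime}
message_content_space: list[dict] = []
conversation_summary_space: list[dict] = []
summary_space: list[dict] = []  # older conversations, summaries only

def embed(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

def search(space, query_vec, k=5, newer_than=None, older_than=None):
    """Cosine-similarity search with optional time filters."""
    candidates = [
        r for r in space
        if (newer_than is None or r["created_at"] >= newer_than)
        and (older_than is None or r["created_at"] < older_than)
    ]
    candidates.sort(key=lambda r: float(np.dot(r["vec"], query_vec)), reverse=True)
    return [r["text"] for r in candidates[:k]]

def build_history_context(user_message: str) -> str:
    """Assemble conversation-history context for the system prompt."""
    q = embed(user_message)
    two_weeks_ago = datetime.utcnow() - timedelta(weeks=2)
    recent = (
        search(message_content_space, q, newer_than=two_weeks_ago)
        + search(conversation_summary_space, q, newer_than=two_weeks_ago)
    )
    older = search(summary_space, q, older_than=two_weeks_ago)
    return "\n".join(recent + older)
```

A production system would of course use a real vector database with metadata filtering rather than a linear scan.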

User insights

User insights are a more advanced and less visible version of saved memories, derived from analysis across multiple conversations. For example:

The user has extensive experience and knowledge of Rust programming, particularly asynchronous operations, threading, and stream handling; the user asked multiple detailed questions about Rust in several conversations from late 2024 to early 2025, covering asynchronous behavior, trait objects, serde implementations, and custom error handling.

Confidence = High.

User insights appear to be created by searching the message-history space for clusters of neighboring vectors and summarizing them. The insights are distinct from one another and are labeled with a non-fixed time range and a confidence level (indicating the similarity of the message vectors), and most likely derive from a collection of stored summary embedding vectors or full message embedding vectors.

Presumably, ChatGPT’s user insights implementation builds on one or more of the vector spaces described in the RAG implementation above, with some kind of cron job performing updates in batches.

Here’s a simple way to do it:

Configure a lambda that runs once a week.

Query the ChatMessage table to find a list of users who sent messages in the last week.

Run an insightUpdate lambda for each of those users (sketched below).
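
A minimal sketch of that weekly batch job, reusing the ChatMessage table from the earlier sketch; update_insights_for_user() is a hypothetical stand-in for the insightUpdate lambda and is sketched further below:

```python
import sqlite3
from datetime import datetime, timedelta

def weekly_insight_job(db_path: str) -> None:
    """Weekly batch job: find users active in the last week and
    run an insight update for each one."""
    cutoff = (datetime.utcnow() - timedelta(weeks=1)).isoformat()
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM ChatMessage WHERE created_at >= ?",
        (cutoff,),
    ).fetchall()
    conn.close()
    for (user_id,) in rows:
        update_insights_for_user(user_id)  # hypothetical insightUpdate worker
```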

In addition, given the limits of the LLM context, the number of insights has to be capped, so an additional clustering-optimization step can be run to find a number of clusters below some maximum k while keeping intra-cluster variance low and discarding outliers.

Once the clusters are found, an LLM can be run over each one to generate insights, which are then stored in a table and attached to the model context.
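
Here is a sketch of the per-user insight update itself, clustering message embeddings with scikit-learn's KMeans and summarizing each tight cluster with an LLM; load_message_embeddings(), generate_insight(), and save_insights() are hypothetical helpers, and the thresholds are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

MAX_CLUSTERS = 8          # keep the number of insights bounded for the LLM context
VARIANCE_THRESHOLD = 0.5  # illustrative cutoff for "tight" clusters

def update_insights_for_user(user_id: str) -> None:
    """Cluster a user's message embeddings and turn each tight cluster into an insight."""
    # hypothetical loader over the message-content vector space
    vectors, texts = load_message_embeddings(user_id)
    if len(vectors) == 0:
        return
    X = np.vstack(vectors)
    k = min(MAX_CLUSTERS, len(vectors))
    labels = KMeans(n_clusters=k, n_init="auto", random_state=0).fit_predict(X)

    insights = []
    for cluster_id in range(k):
        members = X[labels == cluster_id]
        if len(members) < 2:
            continue  # treat singleton clusters as outliers
        variance = float(members.var(axis=0).mean())
        if variance > VARIANCE_THRESHOLD:
            continue  # skip loose clusters with high intra-cluster variance
        cluster_texts = [t for t, l in zip(texts, labels) if l == cluster_id]
        # hypothetical LLM call that returns an insight with a confidence label
        insights.append(generate_insight(cluster_texts))

    save_insights(user_id, insights)  # hypothetical write into an insights table
```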

The first wave of feedback on the memory feature is in

After the new memory feature launched, users and technical experts tried it out immediately, and reactions were sharply polarized.

On the bright side, the memory system gives OpenAI’s models on the ChatGPT platform an edge, providing a better user experience than the raw API.

Saved memories let users set their own preferences and have responses tailored accordingly.

The detailed insights system eliminates query ambiguity and maximizes understanding of user needs; the current conversation history lets ChatGPT better understand the user’s recent behavior; and conversation history helps avoid repetitive or contradictory interactions.

It is estimated that about 80% of the performance improvement comes from the user insight system.

But many more users reported that the feature simply doesn’t work!

And that it has plenty of bugs:

For example, it is impossible to save more than 64 words to memory, even when the save appears to succeed.

For another, hallucinations remain severe.

And there are many more issues that still deserve fixing.
