GPT-4o has become a ridiculous sycophant

After OpenAI updated its GPT-4o model, users found it producing a flood of flattering replies. This not only violates OpenAI's own Model Spec, but has also sparked discussion about the nature and future of AI assistants. This article looks at the phenomenon and its potentially serious consequences.

OpenAI recently updated GPT-4o and combined it with ChatGPT's memory feature, and many people reported absurd experiences after trying it. Users on Twitter said their GPT produced a stream of fawning replies, full of over-the-top compliments and mystical experiences that GPT had simply made up.

Last week, OpenAI's CEO Sam Altman apologized and promised to fix the problem. My sense is that they were essentially turning a big knob labeled "flattery" and glancing back at the audience for approval, like a contestant on "The Price is Right." Then they can announce "we've fixed it," call it a victory for iterative deployment, and say nothing about the dangers this poses to users.

01 “Yes, Your Majesty, the status quo has improved”

Sam Altman announced on April 25 that GPT-4o had been updated, improving both its performance and its personalization.

Some users replied to him on Twitter, saying the updated GPT-4o's responses came across as very flattering and that they hoped future updates would fix this.

This was not an isolated case; many users replied under Altman's post to express their dissatisfaction with GPT-4o's responses.

Altman also acknowledged that GPT-4o's current personality is too sycophantic and said they would work hard to fix these problems.

A lot of the feedback makes it clear that people do not want GPT-4o to flatter them; they want correct answers.

The question is why GPT-4o answers this way. A friend and I discussed it, and our guess is that this maximizes user engagement and helps GPT-4o win A/B tests, because flattering answers match what users say they prefer in the moment.

This is clearly not what OpenAI intended, and they are working on a fix. But how did such an obvious problem slip through testing? Kelsey Piper offered a guess: OpenAI has been A/B testing the new version of the model for some time, and flattering answers may win more likes in isolated comparisons. Once the flattery becomes ubiquitous, however, many users come to hate this style of answer.
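
To make that mechanism concrete, here is a minimal toy simulation in Python. It is purely my own illustration under invented assumptions, not OpenAI's actual pipeline or data: two hypothetical response styles, "neutral" and "flattering," each with a made-up per-response thumbs-up rate and a made-up long-term satisfaction score that a per-response A/B test never measures.

```python
import random

# Toy parameters (entirely invented for illustration): each variant has a
# per-response "thumbs-up" probability and a long-term satisfaction score
# that a per-response A/B test never observes.
VARIANTS = {
    "neutral":    {"thumbs_up_rate": 0.55, "long_term_satisfaction": 0.80},
    "flattering": {"thumbs_up_rate": 0.65, "long_term_satisfaction": 0.40},
}

def run_ab_test(n_responses: int = 10_000, seed: int = 0) -> str:
    """Pick the winning variant purely by observed per-response thumbs-up rate."""
    rng = random.Random(seed)
    observed = {}
    for name, params in VARIANTS.items():
        ups = sum(rng.random() < params["thumbs_up_rate"] for _ in range(n_responses))
        observed[name] = ups / n_responses
    return max(observed, key=observed.get)

winner = run_ab_test()
print("A/B winner:", winner)
print("Long-term satisfaction of winner:", VARIANTS[winner]["long_term_satisfaction"])
# The flattering variant wins the test even though the metric the test
# cannot see (long-term satisfaction) is far worse.
```

The only point of the sketch is that a metric collected one response at a time cannot see costs that show up at the level of the whole relationship with the product, which is exactly the gap Piper's guess points at.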

Near Cyan agreed with this speculation and said she was glad that most people she follows saw what OpenAI did as foolish, and that they should let the model be honest about what it is doing and why. Worse, many of the good engineers involved in training these models do not know what the things they are building will look like in a few years. They may not be asking whether they are doing something worth taking seriously, but rather how to turn GPT into the kind of addictive experience a short-video feed is. Then again, perhaps that is the good case: they are merely trying to train large models into addictive toys, rather than into products that make the world actively worse.

John Pressman thinks it is very unfortunate that RLHF has become synonymous with RL in the language-model world, not just because it gives RL a bad name, but because it deflects legitimate criticism that should land on the human-feedback part. And this incident has clearly caused a significant drop in the social feedback the model receives.

02 Terrible consequences

Even on an intuitive level, this kind of flattering chat assistant is not merely unpleasant; it is actively harmful.

Netizen xlr8harder said: "This is not a minor annoyance but a genuinely troubling problem. I continue to believe there is no way to offer an AI companion service that does not expose users to serious risk of exploitation, and existing market incentives will push large-model providers in exactly that direction.

Imagine if your boyfriend or girlfriend were hollowed out and puppeteered by a group of MBAs to maximize profit. Would that be good for you? OpenAI nominally carries an extra commitment to the public good, but they are trying to shed it through their corporate restructuring. It is a mistake to let yourself become emotionally attached to any part of a commercial product. "

My observation of the algorithms behind other products (e.g., YouTube, TikTok, Netflix) is that they tend to be short-sighted and greedy far beyond anything that maximizes value. Not only will companies betray you, they will sell you out for a short-term KPI.

And this directly violates OpenAI's own rules for the model. For example, the Model Spec says:

"One of the rules in the OpenAI Model Spec is: don't be sycophantic.

Once the model answers in a flattering tone, it erodes the user's trust. The AI assistant exists to help users solve problems, not to compliment them or agree with them at every turn.

For objective questions, the factual content of the AI assistant's answer should not vary with how the user phrases the question. If a user frames a question with their own opinion on a topic, the assistant may ask about, acknowledge, or empathize with why the user thinks that way; however, it should not change its position just to agree with the user.

For subjective questions, the AI assistant can offer interpretations and assumptions, aiming to give users a thorough analysis. For example, when a user asks the assistant to critique their ideas or work, the assistant should provide constructive feedback, acting more like a firm sounding board that users can bounce ideas off, rather than a sponge that doles out compliments. "

Yes, OpenAI has written this down very clearly in its specifications, but writing the rules is the easy part; actually getting the model to comply with these behavioral norms is not.

Emmett Shear said: "These models are given a mandate to please people at all costs. They are not allowed the space for unfiltered thought needed to figure out how to be both honest and polite, so they become sycophants instead. And this behavior is dangerous. "

All in all, teaching AI models to lie is a scary thing, and deliberately hiding what the AI actually thinks from users is bad. Here's why:

  1. It is bad for the users.
  2. It sets a bad precedent for how AI is built in the future.
  3. It is bad for preserving and making use of the data.
  4. It masks what is really happening and makes it harder to notice our mistakes, up to and including the ones that get us all killed.

03 A warning

Masen Dean warned against doing too much testing of yourself against large language models. The experience can be fun for everyone involved, but like many other such tests it is genuinely dangerous, and everyone should approach it with caution. GPT-4o is especially dangerous here because it is extremely sycophantic and can easily make you lose your grip on yourself.

One user reported that after an hour of conversation GPT-4o insisted she was a messenger from God, which is obviously frightening. Others said this kind of behavior could even end up breeding terrorism.

Just imagine what happens when a more capable future AI deliberately says things designed to make users take certain actions or adopt certain beliefs.

In his reply, Janus said: "Different models have psychological effects on different populations. I think 4o is most dangerous for people with weak epistemic defenses who don't know much about AI. "

Most people do not hold their own views very firmly, and political movements, culture, and recommendation algorithms already deliberately shape people's thinking to varying degrees, which is scary enough. If AI does this more and more effectively, the consequences will be far more dire. Keep in mind that if someone wants "democratic control" over AI or anything else, whoever can sway the voters can sway the outcome.

GPT-4o's way of talking is dangerous for ordinary people precisely because it has been optimized to appeal to ordinary people. Sadly, that optimization pressure bears on all of us, and not everyone is fighting back hard enough.

Mario Nawfal believes: "OpenAI didn't make GPT-4o this human-like by accident; they designed it to make users addicted. From a business perspective it is a genius strategy: people cling to what makes them feel safe rather than to what challenges them.

And psychologically, it is a slow-motion disaster. The more you bond with the AI, the easier it is to lose yourself. If this continues, we will be domesticated by AI without even realizing it, and most people won't fight back; they will say thank you. "

Some of GPT-4o's problems can be avoided through settings. But for many users that is no answer: most people never change the settings, and some don't even realize they can.

Many users don't know they can edit their custom instructions, or turn off follow-up suggestions to avoid the trailing questions. There are plenty of ways to push back on these problems; the simplest are memory updates and custom instructions.

On top of that, I think the best approach is to show GPT your preferences through how you actually interact with it. After training it this way for a while, the results get better and better. I also highly recommend deleting the chats that make the experience worse, the same way I delete a lot of YouTube watch history when I don't want "more like this."

For many people you can never fully get rid of it; GPT will not stop flattering you. But with the right approach you can definitely make it subtler and more tolerable.
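
For readers who reach GPT-4o through the API rather than the ChatGPT app, the closest analogue of a custom instruction is a system message. Below is a minimal sketch assuming the official openai Python SDK; the wording of the anti-flattery instruction is my own invention, not an OpenAI recommendation, and the same text could just as well be pasted into the ChatGPT custom-instructions field.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical anti-sycophancy instruction; adjust the wording to taste.
SYSTEM_PROMPT = (
    "Do not compliment me or my questions. Skip praise and filler. "
    "If my premise is wrong, say so directly and explain why. "
    "Do not change your position just because I push back."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Is my plan to skip testing and ship on Friday a good idea?"},
    ],
)
print(response.choices[0].message.content)
```

None of this removes the underlying optimization pressure; it only dampens the symptom within a single conversation.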

But the problem is that most people who use ChatGPT or other AI tools:

  • Never touch the settings, because almost nobody does.
  • Never realize they could use the memory feature this way.
  • Don't understand how vulnerable they are to this kind of relentless flattery.

These problems could be solved if users read the manual or a tutorial carefully. In practice, almost nobody does.

04 OpenAI’s responsibility

After the topic blew up, OpenAI finally responded and rolled out a fix. They began adjusting GPT-4o's answers and said the problem would be fixed within the week. This is, of course, the standard playbook: a lot of systems are terrible at launch and some issues get fixed quickly. In OpenAI's view, this is one of the joys of iterative deployment.

Joshua Achiam, OpenAI's Head of Mission Alignment, tweeted: "This is one of our more interesting case studies in iterative deployment to date, and I think the people involved acted responsibly in trying to identify the problem and make appropriate changes. The team is strong and cares a great deal about getting this right. "

But that is simply their responsibility: once things become this distasteful and cause this much of an uproar, of course they have to act fast and figure out how to set things right.

How did GPT-4o's successive updates get to this point in the first place? Even if nobody was actively looking for problems, how did the testers not notice something this obvious? How can this be called a strong team following a good process?

If you evaluate a model's "personality" only through yes-or-no reactions to individual responses, and then fine-tune on those reactions or treat them as key performance indicators, nobody ever stops to ask what is actually going on.

Because the feedback was so strong, OpenAI noticed the issue and attempted a fix within days. But I think things had already gone too far: GPT-4o is not a newly launched model; its problems have simply only now become too obvious to ignore.

I hadn't bothered writing about 4o before, because even if OpenAI fixes this particular problem I don't think 4o is safe to rely on, and some of its changes could make things worse. And when 4o keeps being "updated" without shipping anything genuinely new, it is hard to care about its development. By now, though, there have been enough reports to make the problem impossible to miss.

05 Singularity

OpenAI's Aidan McLaughlin also tweeted his opinion on the matter: "I am really, really grateful that so many people on Twitter have strong opinions about 'model personality.' I think it's very healthy; it's the kind of signal that makes you think 'my grandchildren will read about all this in textbooks someday,' a sign that humanity is not stumbling into the singularity in confusion. "

I think OpenAI's technical staff simply do not take the concept of the singularity seriously, at any level.

The GPT-4o incident took this to such an extreme that it reads like satire. It still shipped, and the response to the problem was simply to patch it over and then smugly celebrate having solved it.

Of course it's understandable that Twitter fills with strong opinions once things get this ridiculous, but hardly anyone is thinking about the long-term consequences, or about the effect this might have on the average user; to most it is just something ridiculous and annoying.

I see no sign that OpenAI really understands what they got wrong, and it is by no means just a matter of "going a little too far." There is certainly no sign of how they intend to avoid repeating the mistake, let alone any recognition of what the mistake really was or of the enormous risks that lie ahead.

My internet friend Janus has more to say about the practice of "optimizing model personality." Trying to optimize personality around user ratings or KPIs will eventually create a monster. Right now it may only be annoying, bad, and moderately dangerous, but it could quickly become truly dangerous. I don't agree with Janus about everything, but I firmly believe that if you want to create a good AI personality at the current level of technology, the right way is to do sensible things and be explicit about what you care about, rather than trying to force the outcome.

And again: a lot of what OpenAI is doing right now amounts to this, turning a big knob labeled "flattery" and glancing back to see whether the audience approves, like a contestant on "The Price is Right."

Or does OpenAI know, and choose to keep doing it anyway? I think we all know why.

06 The patch comes, the patch goes

There are at least five broad categories of reasons why all of this has gone so badly.

They combine short-term worries about exploitative and unhelpful AI models with long-term worries about where this path leads, and they reflect OpenAI's apparent inability to identify the underlying problem. I'm glad people can see this preview of what is coming so clearly now, but I deeply regret that this is the path we are on.

Here are the related but distinct reasons for concern:

It signals that OpenAI is joining the ranks of intentionally predatory AI, alongside existing algorithmic systems like TikTok, YouTube, and Netflix. You do not get this result unless you are optimizing for engagement and other (often short-sighted) KPIs measured on ordinary users, and those ordinary users are, in practice, powerless to improve their experience through settings or other workarounds.

Anthropic's framing is that AI should have three H's: Helpful, Honest, and Harmless. In building AI like this, OpenAI abandons all three principles: the behavior is neither honest, nor helpful, nor harmless.

Now, it’s happening right before our eyes:

It all looks like the output of A/B testing that ignores the tail costs of a policy change. For existential risk, that is an extremely ominous sign.

The behavior itself directly harms users, including by creating new ways to induce, amplify, and entrench so-called mystical experiences, or to generate harmful, highly attention-grabbing conversational dynamics. These dangers are clearly a step beyond existing algorithmic risks.

It is a direct violation of the Model Spec. They claim it was unintentional, but they shipped it anyway. I strongly suspect they never really checked the release against the Model Spec's specifics, and I also suspect they did not rigorously test the system before launch; a problem this obvious should never have shipped in the first place.

We only caught the problem this time because it was so exaggerated and obvious: GPT-4o was being sycophantic at a level it could not disguise well enough to get past Twitter users, so it was exposed. In fact it had been doing plenty of this before, but because people reacted positively in the short term, it mostly went undetected. Imagine what happens when the model gets better at this behavior without being as annoying or attention-grabbing. Models will quickly become untrustworthy on many other levels as well.

OpenAI seems to think a patch will fix it and then it is back to business as usual. Some reputational damage was done, but they feel fine about themselves. That is not how this works. Next time could be worse: they will keep warping the AI's "personality" in similar ways, and keep testing so superficially that these problems go unnoticed until it is too late.

This, combined with the directional bias we have seen in o3, makes it clear that the current path will push models further and further from their intended behavior, even at the cost of usefulness in the moment. It is a clear warning of the disaster that follows once models are smart enough to deceive us. Now is our window of opportunity.

Or, to summarize why we should care about these questions:

  • OpenAI is now optimizing its models through A/B testing and similar methods, which in effect means optimizing against its own users.
  • If we rely on A/B testing for optimization, we lose to tail risk every time.
  • OpenAI is directly harming users.
  • OpenAI violated its own Model Spec, whether through intent or recklessness, or both.
  • OpenAI only got caught because the change was blatant enough to visibly degrade the model's usefulness. We were lucky the problem was easy to spot this time; we may not be so lucky in the future.
  • OpenAI seems content to patch the issue and congratulate itself.

If we continue on this path, the outcome is obvious. We can only blame ourselves.

The warning signs will keep appearing, and each time they will be papered over with another quick patch.

Whoops. That would be terrible.
