AI goes crazy painting a “family of five”. Math teacher: Is that a family of four, or how many people?

From rendering a “family of five” with only four people to counting pets as family members, AI image generation shows a wide gap between artistry and accuracy. Through a series of AI-generated “family of five” images, this article looks at the amusing “misunderstandings” AI has about family structure and social relationships.

On Saturday, I went to run a route map with a little turtle, and while trying to use AI to generate a small picture, I came across someone else’s outrageous result:

Good grief. Never mind the head count; even the way the characters are arranged surprised me.

When I forwarded this family-of-five picture for fun, something even more explosive came back. Some netizens posted a new AI-generated “family of five” that is simply not human~

I don’t know what is going on here, but I do know that if the youngest daughter in a real family looked like this, her father and mother would not be laughing.

So I ran some tests with the keyword “family of five”.

The following are the test results for Chinese domestic large models.

Wenxin Yiyan: A family of five. A total of 4 pictures were generated.

The two adult men in the first photo are relatively close in age and meet the requirements.

The second meets the requirement: the parents plus an eldest child and twins.

The third is an elder plus parents plus twins, and there is nothing wrong with it.

The fourth is similar to the second.

Regenerate via the scene’s prompt:

The family of five stands in a spring garden, with cherry blossom trees in full bloom and blue sky and white clouds in the background. Real people, photographic style, 16:9, 4K quality.

Parents: The father wears a dark blue shirt and the mother wears a light purple dress. The two stand in the middle holding hands, smiling at the children.

Children:

Sister (1st from left): Hair tied in a high ponytail, wearing a short pink skirt, holding a balloon, and jumping slightly.

Younger brother (2nd from left): Wearing a peaked cap and a striped T-shirt, flying the kite in his hand.

Sister (1st from right): Sitting on the grass, holding a fluffy bear, with a Shiba Inu lying at her feet.

Details: The sun shines through the leaves, the children’s shadows are clear, colorful kite lines float into the air behind the parents, and the picture is full of warmth and vitality.

The result: Wenxin Yiyan produced a more polished image but turned five people into four; nothing else was wrong.

iFLYTEK Spark: A family of five.

Something is a bit off in this picture: the father is missing, and the three children look like blood relatives.

Regenerate via the scene’s prompt:

The result: iFLYTEK Spark identified the requirements accurately; apart from the wrong number of characters, nothing else was wrong.

Doubao: A family of five.

Doubao’s output is quite satisfactory: the count is right, and it also extends the concept. When the prompt sets no boundaries, it draws its own, which makes the output more precise and leads to better results.

Regenerate via the scene’s prompt:

With the more complex prompt, it stumbles a little. The shared problem is the concept of number; everything else works well.

Kimi: A family of five.

Kimi doesn’t have the ability to generate images directly, so I won’t test it here.

360 Hongtu: A family of five.

The image quality is not bad, but the quantity is not right.

Regenerate via the scene’s prompt:

360 Hongtu clearly went wrong: it not only failed to pick up what the text asked for, but also got both the count and the content wrong.

Zooming in, we can see that all the faces are a mess.

Tencent Hunyuan: A family of five.

Generating an image from a one-sentence prompt, Hunyuan does not get the count fully right.

Regenerate via the scene’s prompt:

By adding more descriptors, the content generated by Hunyuan is more accurate.

Dream: A family of five.

Dream, an AI tool that went viral for its images, actually got the count wrong in two pictures.

Regenerate via the scene’s prompt:

With the precise prompt the result is not bad; Dream’s image generation ability is quite good.

Keling: A family of five.

Keling, which also went viral for its images, generated 4 pictures in total, 3 of which did not meet the count requirement.

Regenerate via the scene’s prompt:

With more descriptive keywords, the images Keling generates are fairly realistic.

The evaluation above is for entertainment only. It is not a technical test and is not meant to say which tool is good and which is not. Every AI tool has its strengths and weaknesses, and the results depend more on the underlying technical capabilities and on how well the tool recognizes and defines the scene.
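
For anyone who wants to repeat the head-count check a bit more systematically, here is a minimal Python sketch. It is not a method used in this article: the function name `count_pass_rate` is hypothetical, the people in each image are assumed to be counted by hand, and the sample numbers are made up purely to mirror the Keling result (4 pictures, 3 with the wrong number of people).

```python
from typing import Sequence


def count_pass_rate(expected: int, observed: Sequence[int]) -> float:
    """Return the fraction of images whose head count equals `expected`."""
    if not observed:
        raise ValueError("observed must contain at least one image count")
    hits = sum(1 for n in observed if n == expected)
    return hits / len(observed)


# Hypothetical head counts, tallied by hand, for four generated pictures.
# Only the 1-in-4 pattern comes from the article; the exact numbers are made up.
keling_counts = [4, 5, 6, 4]

rate = count_pass_rate(expected=5, observed=keling_counts)
print(f"{rate:.0%} of the pictures show exactly five people")
```

A single pass rate like this says nothing about image quality or style, which is why it fits the spirit of a casual comparison rather than a benchmark.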

Technical rationale: How does AI turn “families” into “mutants”?

The core technologies behind AI-generated images are Generative Adversarial Networks (GANs) and diffusion models. In simple terms, a GAN is like a game between a “counterfeiter” and an “inspector”: the generator produces images from text descriptions, and the discriminator judges whether the images are real. The two keep competing until the generator can fool the discriminator.
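
The “counterfeiter versus inspector” game can also be written as a formula. The article itself does not give one; what follows is the standard minimax objective from the original GAN formulation, where $G$ is the generator, $D$ is the discriminator, $x$ is a real image, and $z$ is random noise fed to the generator. Text conditioning is left out for simplicity, so treat this as a sketch of the idea only.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator tries to make $V$ large by scoring real images near 1 and generated ones near 0, while the generator tries to make it small by producing images that the discriminator scores as real.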

But the problem is that AI does not really understand what words mean. It only learns from massive data that “family of five” tends to appear together with “five people”, so it simply puts five people together while ignoring the details of the human body. It is like teaching a child that apples are red and then hearing him shout “apple” at a red balloon; AI’s logic is that “simple”.

Perhaps it comes down to “five mouths”, the literal wording of “family of five” in Chinese: to the AI, “mouth” is just a stand-in for a number, and human imagination is strange enough that, as far as it is concerned, a family of four is no problem~ What do a few extra mouths have to do with me?

After a systematic review, beyond differences in artistic style, there are two main technical bottlenecks in AI image generation today: inaccurate dynamic control over the number of characters, and deviations in the semantic understanding of social relationships.

When it comes to controlling the number of characters, AI models have significant output stability problems. Take the typical structure of a family of five: although the generational relationships between family members (direct blood relatives, relatives across generations, and so on) are a basic concept in sociology, existing AI systems struggle to map this kind of structured information accurately, and the output often has extra or missing characters. At the same time, in expressing relationships between characters, the output is prone to semantic confusion, such as generating several individuals with nearly identical facial features, which scrambles the logic of kinship.

From the perspective of semantic understanding, the AI system’s grasp of the concept of “family” is limited. In modern society, the notion of family has been extended to include emotional companions such as pets, but the training corpora of current AI models do not fully cover this expanded meaning, so the output deviates from how people actually think of a family. In addition, localized models are not well optimized for their application scenarios, and the influence of regional culture on family structure is not fully considered. Taking the domestic models as an example, mixed-race characters that do not match the local family structure frequently appear in the output, reflecting a lack of adaptation to regional cultural characteristics.

AI fakery in action

Two months ago there was more big news in AI circles: Liu Qiangdong and Wang Xing staged a “good brothers” selfie on the Bund. The two e-commerce tycoons had their arms around each other’s shoulders and were smiling warmly, as if about to announce a “JD-Meituan merger”. Look closely, though, and the hashtag #ChatGPT is visible at the edge of the photo. Well, well: this is just a bit of “digital fun” that netizens generated with GPT-4o! On top of that, all sorts of public figures have been turning up in unlikely scenes, and every one of them looks “real” at a glance.

AI turns the impossible into a “casual snapshot”, and the casual snapshot in turn becomes a label certifying that something is “real”.

Reportedly, this feature initially allowed unlimited generation. Although it was later capped at 10 images per day, compared with Midjourney’s subscription model OpenAI quickly won users’ mindshare with a “free lunch”.

When AI generation becomes as easy as posting on Moments, it is no longer a niche toy but an infrastructure integrated into daily life.

In this never-ending human-computer dialogue, what really gives life to a work is never precise parameter calculation, but the warmth of the creator’s fingertips, the light in their eyes, and the clumsy but sincere posture they take in front of the camera. These imprints of life constitute a “metacode” unique to humanity, and they retain an irreplaceable value in artistic creation in the digital age.
