We all understand the concept of inbreeding in human terms: when people who are too genetically similar have children, the offspring are at greater risk of genetic defects. With each successive generation of an inbred line, the gene pool becomes less and less diverse.
But how can this be applied to generative AI? And why should we be concerned about inbreeding in generative AI? Read on to find out.
What is inbreeding in terms of generative AI?
This term refers to the way generative AI systems are trained. Early large language models (LLMs) were trained on vast amounts of text, visual, and audio content, typically collected from the internet. We’re talking about books, articles, artwork, and other content available online – content that is generally created by humans.
But now a plethora of generative AI tools flood the internet with AI-generated content, from blog posts and news articles to AI artwork. This means that future AI tools will be trained on datasets containing ever more AI-generated content. Although this content is not created by humans, it simulates human output. And as new systems learn from this simulated content and create their own content based on it, there is a risk that the content will gradually deteriorate. It’s like taking a copy of a copy of a copy.
In other words, it is no different from inbreeding in humans or livestock. The diversity of the “gene pool” (in this case, the content used to train generative AI systems) decreases. The training data becomes less varied, more distorted, and less representative of genuine human content.
What does this mean for generative AI systems?
Inbreeding could cause serious problems for future generative AI systems, making them increasingly less capable of accurately simulating human language and creativity. One study shows how inbreeding can degrade generative AI, finding that without sufficient fresh real-world data in each generation, “the quality (precision) or diversity (recall) of future generative models is doomed to progressively decrease.”
In other words, AI needs new (human-generated) data to get better over time. When the training data is largely generated by other AIs, so-called “model collapse” occurs. This is a fancy way of saying the AI gets dumber. It can happen with any type of generative AI output, images as well as text. This video shows what happens when two generative AI models pass content back and forth: one AI describes an image, the other AI creates an image based on that description, and so on in a loop. The starting point was the Mona Lisa, one of the world’s great masterpieces. The end result is a strange mess of squiggles.
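The “copy of a copy” dynamic can be sketched in a few lines of Python. This is a toy illustration, not any lab’s actual experiment: the “model” here is just a normal distribution fitted to its predecessor’s output, and the sampling temperature of 0.9 is an assumed stand-in for the way deployed generators favor typical, “safe” outputs. The spread of the data, a crude proxy for diversity, shrinks with every generation:

```python
import random
import statistics

TEMPERATURE = 0.9  # assumption: generators sample a bit conservatively (< 1.0)

def train_and_generate(data, n_samples):
    # "Train": fit a normal distribution to the previous generation's output.
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    # "Generate": sample from the fitted model, slightly conservatively.
    return [random.gauss(mu, sigma * TEMPERATURE) for _ in range(n_samples)]

random.seed(42)
# Generation 0: diverse "human" data.
data = [random.gauss(0.0, 1.0) for _ in range(5000)]
print(f"gen  0: stdev = {statistics.stdev(data):.3f}")

for gen in range(1, 11):
    # Each new model is trained ONLY on the previous model's output.
    data = train_and_generate(data, 5000)
    print(f"gen {gen:2d}: stdev = {statistics.stdev(data):.3f}")
```

After ten generations the standard deviation has fallen to roughly a third of its starting value: the “models” keep producing output, but it clusters ever more tightly around the average, which is the statistical essence of model collapse.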
Imagine this in terms of a customer service chatbot that gradually deteriorates, producing increasingly clunky, robotic, even gibberish responses. That’s the danger inbreeding poses to generative AI systems: in theory, it could render a system useless, defeating the purpose of using generative AI in the first place. We want these systems to faithfully represent human language and creativity, not degrade over time. We want them to become smarter and better at responding to requests. If they can’t do that, what’s the point of them?
Perhaps the bigger question is: What does this mean for humans?
We’ve all seen hilarious and bizarre images created by generative AI. You know what I mean – hands sticking out of places they shouldn’t, faces from nightmares, etc. We can laugh at these distortions because they were clearly not created by human artists.
But consider a future where much of the content we consume is created by generative AI systems. We would see more and more distorted, or at least thoroughly bland, content that is less and less representative of actual human creativity. Our collective culture would become increasingly shaped by AI-generated content rather than human-generated content, until we end up trapped in a bland AI echo chamber. What would this mean for human culture? Is this the path we want to take?
Is there a solution?
One way forward is to design future AI systems to distinguish between AI-generated and human-generated content, prioritizing the latter for training purposes. But that’s easier said than done: it is surprisingly difficult for AI to tell the two apart. Case in point: OpenAI’s “AI Classifier” tool, introduced to distinguish AI-written from human-written text, was retired in 2023 due to its “low rate of accuracy.” And if OpenAI, arguably the leader in generative AI, is struggling, you know the problem is a thorny one. Still, if the problem can be solved, this is probably the most effective way to avoid inbreeding in the future.
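Because classifying text after the fact has proven unreliable, a more pragmatic route is to filter training data by provenance: metadata recording where content came from (efforts like the C2PA content-credentials standard point in this direction). The sketch below is purely illustrative; the corpus records and the "source" field are hypothetical, not any real pipeline’s schema:

```python
# Toy sketch: filtering a training corpus by provenance labels.
# The records and the "source" field are hypothetical; a real pipeline
# would rely on provenance metadata or trusted pre-AI crawl dates rather
# than trying to classify text after the fact.
corpus = [
    {"text": "A hand-written essay on local history.", "source": "human"},
    {"text": "An auto-generated product description.", "source": "ai"},
    {"text": "A forum post from 2012.", "source": "human"},
]

# Keep only human-labelled documents for training.
human_only = [doc for doc in corpus if doc["source"] == "human"]
print(len(human_only))  # 2 of the 3 documents survive the filter
```

The hard part, of course, is not the filter itself but obtaining trustworthy labels, which is exactly where classifier-based approaches have fallen short so far.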
We must also avoid over-reliance on generative AI systems and continue to prioritize very human traits such as critical and creative thinking. Generative AI is a tool, and a very valuable one, but we must remember that it is not a replacement for human creativity or culture.
Follow me on Twitter or LinkedIn. Check out my website and other work here.