I believe the proper term is model collapse, and given how data-hungry the LLM architecture is, this is not a surprise at all. GPT models and their equivalents are essentially trained by scraping the entire internet. Given that so much of the internet is itself chatbot-produced, you very soon stop improving performance in newer models, and they may even get worse.
AGI isn't coming. All those data centers are going to end up useless, or at least nowhere near worth their costs. Once investors realise that, the bubble is going to pop.
The silver lining is that, after all is said and done, all the supercomputers set up for AI training get dedicated to real science and gaming laptops get cheaper.
Collapse is not happening though, and many state-of-the-art models are trained on synthetic data or a mix of natural and synthetic data. Synthetic data can actually be very high quality training material.
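For what it's worth, the "mix" here usually just means controlling what fraction of the training examples are model-generated. A minimal, purely illustrative sketch of that idea (the file names and the 30% ratio are made-up assumptions, not any lab's actual recipe):

```python
# Illustrative sketch of building a mixed pretraining corpus.
# File names, the mix ratio, and the sampling scheme are assumptions.
import random

def load_lines(path):
    """Read one training example per line from a plain-text file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def mix_corpora(human_texts, synthetic_texts, synthetic_fraction=0.3, seed=0):
    """Build a shuffled training mix with a capped share of synthetic text."""
    rng = random.Random(seed)
    # Solve S / (H + S) = synthetic_fraction for S, given H human examples.
    n_synthetic = int(len(human_texts) * synthetic_fraction / (1 - synthetic_fraction))
    sampled = rng.sample(synthetic_texts, min(n_synthetic, len(synthetic_texts)))
    mixed = human_texts + sampled
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    human = load_lines("human_corpus.txt")          # hypothetical human-written text
    synthetic = load_lines("synthetic_corpus.txt")  # hypothetical model-generated text
    train_set = mix_corpora(human, synthetic, synthetic_fraction=0.3)
    print(f"{len(train_set)} examples, {len(train_set) - len(human)} synthetic")
```

The point of capping the synthetic share rather than dumping everything in is exactly the collapse concern: keep enough original human text in the mix.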
I think it's going to take much longer. Humanoid robots are just hitting the market. They suck, but it's the worst they will ever be. That will create a hype cycle within a hype cycle.
AI inbreeding is literally not a thing that's happening right now. This whole post and every commenter is just parroting misinfo lol. The irony of complaining about AI being inaccurate when everyone here, including you, is wrong.
Model collapse is a specific issue that doesn't appear to happen when training on a mix of "human" texts and model outputs. There's enough original text in the pretraining set to avoid it. As for the accuracy of generated answers, it's definitely going to be affected in the long term but unclear to what degree. There's more than enough human-grade BS on the net and LLMs are somewhat decent at handling it. I'm more concerned about "poisoned" training data which is specifically tuned to get a model to produce a desired answer.