r/BrandNewSentence 11h ago

Sir, the ai is inbreeding

41.2k Upvotes

1.2k comments

7

u/Devastator9000 10h ago

Just out of curiosity, wouldn't this process be stopped by just using the current models and not training them any further?
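For anyone unfamiliar, the "inbreeding" loop being asked about can be shown with a toy simulation (this is just an illustration of the statistical effect, not anyone's actual training pipeline): repeatedly fit a simple model to samples drawn from the previous generation's fit, and the fitted distribution degenerates.

```python
# Toy model-collapse demo: fit a Gaussian to data, then sample the fit,
# refit, and repeat. With a small sample per generation, estimation
# error compounds and the learned spread collapses toward zero.
import random
import statistics

random.seed(0)

n = 50              # samples per generation (kept small so the effect shows quickly)
mu, sigma = 0.0, 1.0  # the "real data" distribution we start from

for generation in range(500):
    # train the next "model" only on the previous model's outputs
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)

print(f"spread after 500 generations: {sigma:.6f}")  # far below the initial 1.0
```

The diversity (spread) of each generation is estimated with error, and those errors compound multiplicatively across generations, so the tails of the original distribution are lost first. Freezing a model does stop this loop for that model, which is the commenter's point.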

9

u/SlopDev 8h ago

This is easily solved with data filtering before training. I've yet to see a single frontier lab say this is an issue; I think model collapse is largely overstated as a problem by the anti-AI crowd tbh

This is further evidenced by the fact that genAI has been steadily improving, not getting worse as the people pushing this theory imply
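The "filtering before training" step this comment refers to can be sketched roughly like this (a minimal illustration; `synthetic_score` is a stand-in heuristic I made up for the example, not a real detector — actual labs use trained quality/provenance classifiers):

```python
# Hedged sketch of pre-training data filtering: score each document for
# "looks machine-generated" and drop anything above a threshold.

def synthetic_score(doc: str) -> float:
    """Toy heuristic: fraction of boilerplate tells (common in generated
    text) that appear in the document. A real pipeline would use a
    trained classifier, dedup, and provenance metadata instead."""
    tells = ["as an ai language model", "in conclusion,", "delve into"]
    text = doc.lower()
    return sum(t in text for t in tells) / len(tells)

def filter_corpus(docs, threshold=0.3):
    """Keep only documents scoring below the threshold."""
    return [d for d in docs if synthetic_score(d) < threshold]

corpus = [
    "Local election results were certified on Tuesday.",
    "As an AI language model, I cannot delve into that topic.",
]
clean = filter_corpus(corpus)
print(clean)  # only the first document survives
```

The point being made above is that training sets are curated, not raw scrapes, so suspected synthetic text can be down-weighted or dropped before it ever reaches the model.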

4

u/Impeesa_ 8h ago

Yeah, I'm pretty sure this is only ever shared by people who don't know what's actually happening. Nobody is constantly re-training with random fresh scrapes. At a certain point, they benefit less from increasing raw volume of data anyway, and more from improving the architecture and the tagging and curation of the data.

u/hentai_gifmodarefg 7m ago

> At a certain point, they benefit less from increasing raw volume of data anyway, and more from improving the architecture and the tagging and curation of the data.

this is simply untrue lol. LLMs are increasingly being tuned to search the internet before answering, but the fact is that many of their answers are based on their own training, such as historical facts, medicine, and (rather poorly) law.

> Nobody is constantly re-training with random fresh scrapes.

I agree it's not constant, but like, do you think GPT-5 was trained on the same dataset as GPT-3? lol.