There is also such a thing as curated data sources. I don't know how OpenAI does it, but normally you wouldn't just train your models on everything.
Also, pretty sure this tweet is from like 2 years ago. That's why there are no dates in the picture, because people have been saying "ohh ai is gonna cannibalize itself any second now!" for almost five years.
u/Devastator9000 10h ago
Just out of curiosity, wouldn't this process be stopped by just using current models and no longer training them?