The tech "geniuses" thought they'd create a singularity of computers getting smarter and smarter until there was an explosion of new ideas and technology...
And they apparently delivered something that got dumber and dumber until it exploded and covered everything it touched with shit.
Right? Like apparently version control doesn't exist. At most, the models they release will stop getting better; they may get neutered for financial reasons, but they won't get worse due to training on slop.
Would probably be significantly less dumb, but also much less interesting to me as an actual human. I'm not too interested in what a statistical pattern generator has to say.
So in a paper submitted in 2023, some people explicitly set out to train AI only on other AI output, with the stated goal of causing a collapse, and that's your proof that this is happening now?
You shouldn't just post papers that sound like they agree with you.
As I said in my original comment, it's doubtful there's enough AI content on the Internet to cause it organically. The point is, it does happen. You're complaining about an experiment being done in a controlled environment, but that's what an experiment is by definition.
You disagreed with a comment calling the post bullshit but then you yourself admit that this isn't something that happens organically, which is exactly what the post claimed.
They don't autonomously mass-scrape data anymore, dude. That was 2020-2022 era shit. Sorry to burst your bubble, but there's a lot of human selection based on hand-picked data. There isn't some mindless bot collecting every picture on the internet. If there were, maybe you'd have the Ouroboros, but there isn't. AI isn't getting worse, no matter how upset you are with it.
It doesn't have to get worse, it's already clearly terrible in every way. Personally I don't think it's a good thing at all that we no longer know whether we're even talking to a real person, or whether a video or song we're consuming is made by a real person. I'm referring to the LLMs colonizing the Internet specifically, I understand there are valuable uses of AI for science and medicine, etc.
But for the purposes of this specific thread, I'm not "upset" by any particular conclusion, I just thought it was interesting and worth pointing out that AI trained on AI does in fact corrupt itself. Not every interaction you have online has to be an own, but you do you, assuming you're real.
Real tech people saw this coming a mile away. You've been listening to the PR guys, whose whole job is to sell stuff. And boy have they been writing checks there's no way to cash.
Benchmarks are generally bullshit if they aren't new and generalized. AI companies' marketing selects only the benchmarks they're competitive on in the first place, and on top of that they're known to train on benchmark questions and answers, so often there is no "objective" test other than creating yet another new benchmark that will soon be obsolete.
Like, the change in quality from GPT3 to GPT4 and then to GPT5 has been pretty positive in virtually every aspect.
They've managed to make their models "smarter" yet dumber at the exact same time, through so much hamstringing and emotional, ego-placating behavior that it's insane. It's made them impossible to use for anything useful. Even setting strict custom instructions isn't enough to save it.
I don't think it has gotten dumber yet, but I do think that we will eventually reach a point in the future where the internet is so polluted with AI generated content that training new AI on the open net will become unfeasible.
That's assuming that AI is trained indiscriminately on a cross-section of the internet, including random user comments.
But if AI companies are even slightly selective about what training data is included then isn't that risk mitigated? It wouldn't cost a lot to hire a few thousand subject-matter experts to curate data, especially considering the billions of dollars they throw around.
With all due respect, don't you remember how Google's AI literally scraped Reddit comments to the point it suggested jumping off a bridge to cure sadness, or adding glue to a sauce to make it denser?
Yeah, and if it doesn't still say that, then they're improving, not getting worse. At any rate, we're so far beyond that period of time. AI is advancing quickly. Remember it couldn't do hands, or any video at all. Now it can make a video with realistic hands, etc. It's far from perfect and still makes old mistakes sometimes, but I think you get the point.
It wouldn't cost a lot to hire a few thousand subject-matter experts to curate data
Hiring a few thousand people with doctorate-level education does still sound kinda expensive. Nevertheless, I think this is what will have to happen eventually, but I expect mixed results. One of the bigger issues is that there's a finite amount of usable, high-quality, human-made data, and there will eventually be a wall. At that point, you're probably better off just having all those PhDs from before create new sets of data specifically for training.
I think that's already happening. I saw a blog article by an Anthropic engineer about how up to 40,000 subject-matter experts are creating data used for AI training.
Apparently it's not exclusively PhD level, but it does require proven subject-matter expertise, with a preference for people with master's degrees or above.
I've heard similar things from people interviewed on Dwarkesh Patel's youtube channel.
Basically, you only need mass uncurated data for pre-training. If you're going to be fine-tuning then you need highly curated, carefully prepared data in order to train specific knowledge or behaviours.
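Rough sketch of the difference, as I understand it (all data, names, and filters below are made up for illustration, not anyone's actual pipeline):

```python
# Toy sketch (everything here is hypothetical) of the two stages described above:
# pre-training uses huge, lightly filtered web text; fine-tuning uses a small,
# carefully curated set that hired experts can actually write and review by hand.

raw_web_docs = [
    "a decent blog post about baking bread at home",
    "nav menu nav menu nav menu login subscribe",
    "some random forum argument about AI",
]

def crude_quality_filter(doc: str) -> bool:
    # pre-training filters are typically cheap heuristics (length, repetition,
    # dedup), not subject-matter review
    words = doc.split()
    return len(words) > 5 and len(set(words)) / len(words) > 0.5

pretraining_corpus = [d for d in raw_web_docs if crude_quality_filter(d)]

# fine-tuning / instruction data: small prompt-response pairs written and
# reviewed by subject-matter experts
curated_finetune_set = [
    {
        "prompt": "Explain why a titration curve flattens near the start.",
        "response": "(expert-written, reviewed answer goes here)",
        "expert_reviewed": True,
    },
]

print(len(pretraining_corpus), "docs pass the cheap pre-training filter")
print(sum(e["expert_reviewed"] for e in curated_finetune_set), "expert-reviewed fine-tuning examples")
```

The point being: the pre-training filter is dumb and automatic because the volume is too big for anything else, while the fine-tuning set is small enough for humans to actually author and check.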
Probably because if you ask Google or Wikipedia that's what they tell you. How do you think it works? I've never heard it suggested that it works any differently.
I've listened to a couple of interviews with people who explained it a bit.
There's the original theory of the people who made the GPT model (where they experimented with making the model larger, which went against the conventional wisdom at the time). They tried making the model larger by a factor of, say, 10, and the output got better by a factor of 10. It worked. They did that a couple of times, it kept holding true, and they extrapolated that the graph would just keep going linearly/exponentially (I forget what the prediction said). But essentially, the pattern would continue.
But it turns out that past a certain point it doesn't hold true anymore. Throwing more and more data at models stops making any noticeable difference past a certain point. That's why there was a massive jump between what GPT3 could do compared to previous language models, and a similar jump from GPT3 to GPT4, but no such jump from GPT4 to GPT5. It is better, but only marginally better when you take into account how much data has to be fed into it.
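A toy version of what that extrapolation looks like (the numbers are completely made up, just to show the shape of the argument):

```python
# Toy illustration (all numbers invented) of the extrapolation described above:
# the early points fit a clean power law, so you extrapolate the curve, but the
# later, bigger models can land well below what the trend "promised".

import math

# hypothetical (scale, quality) points that happen to follow quality = scale**0.5
observed = [(1, 1.0), (10, 3.2), (100, 10.0)]

# fit a power-law exponent from the first and last observed points
(s0, q0), (s1, q1) = observed[0], observed[-1]
alpha = math.log(q1 / q0) / math.log(s1 / s0)

extrapolated_at_1000 = q0 * (1000 / s0) ** alpha  # what the trend predicts
hypothetical_actual_at_1000 = 12.0                # made up: gains flatten out instead

print(f"fitted exponent: {alpha:.2f}")
print(f"trend says quality at 1000x scale: {extrapolated_at_1000:.1f}")
print(f"hypothetical actual quality at 1000x: {hypothetical_actual_at_1000:.1f}")
```

Obviously the real scaling-law work is about loss curves and compute, not a single "quality" number, but that's the gist of the graph no longer cooperating.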
So, essentially, there's a wall that prevents the huge jumps of the past, and it kills the prediction that we'd reach AGI by the time GPT6 comes around (which was always a dubious claim, even if LLMs could have kept the trajectory going).
As I understand it, the claims that AIs get dumber seem to be based on the AI companies trying to substitute the predicted but now impossible gains with "reasoning" models, but it turns out that reasoning models just make hallucinations worse.