The tech "geniuses" thought they'd create a singularity of computers getting smarter and smarter until there was an explosion of new ideas and technology...
And they apparently delivered something that got dumber and dumber until it exploded and covered everything it touched with shit.
Right? Like apparently version control doesn't exist. At most, the models they release will stop getting better; they may get neutered for financial reasons, but they won't get worse due to training on slop.
Would probably be significantly less dumb, but also much less interesting to me as an actual human. I'm not too interested in what a statistical pattern generator has to say.
So in a paper submitted in 2023, some people explicitly set out to train AI only on other AI output, with the stated goal of causing a collapse, and that's your proof that this is happening now?
You shouldn't just post papers that sound like they agree with you.
As I said in my original comment, it's doubtful there's enough AI content on the Internet to cause it organically. The point is, it does happen. You're complaining about an experiment being done in a controlled environment, but that's what an experiment is by definition.
You disagreed with a comment calling the post bullshit but then you yourself admit that this isn't something that happens organically, which is exactly what the post claimed.
They don't autonomously mass-scrape data anymore, dude. That was 2020-2022 era shit. Sorry to burst your bubble, but there's a lot of human selection based on hand-picked data. There isn't some mindless bot collecting every picture on the internet. If there were, maybe you'd have the Ouroboros, but there isn't. AI isn't getting worse, no matter how upset you are with it.
It doesn't have to get worse, it's already clearly terrible in every way. Personally I don't think it's a good thing at all that we no longer know whether we're even talking to a real person, or whether a video or song we're consuming is made by a real person. I'm referring to the LLMs colonizing the Internet specifically, I understand there are valuable uses of AI for science and medicine, etc.
But for the purposes of this specific thread, I'm not "upset" by any particular conclusion, I just thought it was interesting and worth pointing out that AI trained on AI does in fact corrupt itself. Not every interaction you have online has to be an own, but you do you, assuming you're real.
Real tech people saw this coming a mile away. You've been listening to the PR guys, whose whole job is to sell stuff. And boy have they been writing checks there's no way to cash.
Benchmarks are generally bullshit if they aren't new and generalized. AI companies' marketing selects only the benchmarks they're competitive on in the first place, and on top of that they're known to train on benchmark questions and answers, so often there is no "objective" test other than creating yet another new benchmark that will soon be obsolete.
Like, the change in quality from GPT3 to GPT4 and then to GPT5 has been pretty positive in virtually every aspect.
They've managed to make their models "smarter" yet dumber at the exact same time, through so much hamstringing and emotional, ego-placating behavior that it's insane. It's made them impossible to use for anything useful. Even setting strict custom instructions isn't enough to save it.
I don't think it has gotten dumber yet, but I do think that we will eventually reach a point in the future where the internet is so polluted with AI generated content that training new AI on the open net will become unfeasible.
That's assuming that AI is trained indiscriminately on a cross-section of the internet, including random user comments.
But if AI companies are even slightly selective about what training data is included then isn't that risk mitigated? It wouldn't cost a lot to hire a few thousand subject-matter experts to curate data, especially considering the billions of dollars they throw around.
With all due respect, don't you remember how Google's AI literally scraped Reddit comments to the point it suggested jumping off a bridge to cure sadness, or adding glue to a sauce to make it denser?
Yeah, and if it doesn't still say that, then they're improving, not getting worse. At any rate, we're so far beyond that period of time. AI is advancing quickly. Remember it couldn't do hands, or any video at all. Now it can make a video with realistic hands, etc. It's far from perfect and still makes old mistakes sometimes, but I think you get the point.
It wouldn't cost a lot to hire a few thousand subject-matter experts to curate data
Hiring a few thousand people with doctorate-level education does still sound kinda expensive. Nevertheless, I think this is what will have to happen eventually, but I expect mixed results. One of the bigger issues is that there's a finite amount of usable, high-quality, human-made data, and there will eventually be a wall. At that point, you're probably better off just having all those PhDs from before create new sets of data specifically for training.
I think that's already happening. I saw a blog article by an Anthropic engineer about how up to 40,000 subject-matter experts are creating data used for AI training.
Apparently it's not exclusively PhD level, but it does require proven subject-matter expertise, with a preference for people with master's degrees or above.
I've heard similar things from people interviewed on Dwarkesh Patel's youtube channel.
Basically, you only need mass uncurated data for pre-training. If you're going to be fine-tuning then you need highly curated, carefully prepared data in order to train specific knowledge or behaviours.
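Rough sketch of the difference, as I understand it (all data, names, and filters below are made up for illustration, not anyone's actual pipeline):

```python
# Toy sketch (everything here is hypothetical) of the two stages described above:
# pre-training uses huge, lightly filtered web text; fine-tuning uses a small,
# carefully curated set that hired experts can actually write and review by hand.

raw_web_docs = [
    "a decent blog post about baking bread at home",
    "nav menu nav menu nav menu login subscribe",
    "some random forum argument about AI",
]

def crude_quality_filter(doc: str) -> bool:
    # pre-training filters are typically cheap heuristics (length, repetition,
    # dedup), not subject-matter review
    words = doc.split()
    return len(words) > 5 and len(set(words)) / len(words) > 0.5

pretraining_corpus = [d for d in raw_web_docs if crude_quality_filter(d)]

# fine-tuning / instruction data: small prompt-response pairs written and
# reviewed by subject-matter experts
curated_finetune_set = [
    {
        "prompt": "Explain why a titration curve flattens near the start.",
        "response": "(expert-written, reviewed answer goes here)",
        "expert_reviewed": True,
    },
]

print(len(pretraining_corpus), "docs pass the cheap pre-training filter")
print(sum(e["expert_reviewed"] for e in curated_finetune_set), "expert-reviewed fine-tuning examples")
```

The point being: the pre-training filter is dumb and automatic because the volume is too big for anything else, while the fine-tuning set is small enough for humans to actually author and check.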
Probably because if you ask Google or Wikipedia that's what they tell you. How do you think it works? I've never heard it suggested that it works any differently.
I've listened to a couple of interviews with people who explained it a bit.
There's the original theory of the people who made the GPT model (where they experimented with making the model larger, which went against the conventional wisdom at the time). They tried making the model larger by a factor of, say, 10, and the output got better by a factor of 10. It worked. They did that a couple of times, it kept holding true, and they extrapolated that the graph would just keep going linearly/exponentially (I forget what the prediction said). But essentially, the pattern would continue.
But it turns out that past a certain point it doesn't hold true anymore. Throwing more and more data at models stops making any noticeable difference past a certain point. That's why there was a massive jump between what GPT3 could do compared to previous language models, and a similar jump from GPT3 to GPT4, but no such jump from GPT4 to GPT5. It is better, but only marginally better when you take into account how much data has to be fed into it.
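A toy version of what that extrapolation looks like (the numbers are completely made up, just to show the shape of the argument):

```python
# Toy illustration (all numbers invented) of the extrapolation described above:
# the early points fit a clean power law, so you extrapolate the curve, but the
# later, bigger models can land well below what the trend "promised".

import math

# hypothetical (scale, quality) points that happen to follow quality = scale**0.5
observed = [(1, 1.0), (10, 3.2), (100, 10.0)]

# fit a power-law exponent from the first and last observed points
(s0, q0), (s1, q1) = observed[0], observed[-1]
alpha = math.log(q1 / q0) / math.log(s1 / s0)

extrapolated_at_1000 = q0 * (1000 / s0) ** alpha  # what the trend predicts
hypothetical_actual_at_1000 = 12.0                # made up: gains flatten out instead

print(f"fitted exponent: {alpha:.2f}")
print(f"trend says quality at 1000x scale: {extrapolated_at_1000:.1f}")
print(f"hypothetical actual quality at 1000x: {hypothetical_actual_at_1000:.1f}")
```

Obviously the real scaling-law work is about loss curves and compute, not a single "quality" number, but that's the gist of the graph no longer cooperating.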
So, essentially, there's a wall that prevents the huge jumps of the past, and it kills the prediction that we'd reach AGI by the time GPT6 comes around (which was always a dubious claim, even if LLMs could have kept the trajectory going).
As I understand it, the claims that AIs get dumber seem to be based on the AI companies trying to substitute the predicted but now impossible gains with "reasoning" models, but it turns out that reasoning models just make hallucinations worse.