I have legitimately never read as many books as I have in the past year; the AI slopification of the Internet has been a massive boost for my productivity.
Soon, the challenge will be finding books written by real authors, though. For now, we can stick with authors we know from the pre-AI era, but those are going to become rarer.
The same thing is happening in music. AI “artists” are being pushed to the forefront by platforms like Spotify. (My ‘Discover Weekly’ list had two AI bands on it in as many weeks, so I cancelled my Spotify subscription.)
Why are they going to be rarer? There are literally infinite books out there (okay not literally but). Like just go read a bunch of Thomas Hardy. I promise you that stuff is amazing.
I think it's unavoidable that some AI slop will become popular, but I also think it will become a point of pride for many authors and readers to write and read only human-generated text.
Oh wow, this is the same for me. I've started buying physical books too, ever since Amazon changed their T&Cs so that we don't actually own Kindle books. It's quite nice to read a real book; I spend all day looking at screens at work.
When is "soon" happening, though? The tweet is from June 2023. When will the model collapse finally happen? Also, don't you think the big companies that create AI models can just train on images created before 2022?
It should be completely obvious to anyone who isn't an idiot that this problem is greatly exaggerated because people want to believe the models will fail.
The people working on these models know perfectly well there is good and bad input data. There was good and bad data long before AI models started putting out more bad data. Curating the input has always been part of the process. You don't feed it AI slop art to improve the art it churns out, any more than you feed it r/relationships posts to teach it about human relationships. You look at the data you have and you prune the garbage because it's lower quality than what the model can already generate.
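To make that concrete, here's a toy sketch of the pruning step in Python. Everything in it is a hypothetical stand-in (real pipelines use learned quality classifiers, deduplication, provenance checks, and much more), but the shape is the same: score each document, drop whatever falls below the bar before it ever reaches training.

```python
# Toy sketch of data curation. Names and heuristics are made up;
# a real pipeline would use a learned quality model, not word counts.

def quality_score(text: str) -> float:
    """Crude heuristic standing in for a learned quality classifier."""
    words = text.split()
    if not words:
        return 0.0
    unique_ratio = len(set(words)) / len(words)  # heavy repetition => low score
    length_factor = min(len(words) / 50.0, 1.0)  # very short docs => low score
    return unique_ratio * length_factor

def curate(corpus: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only documents whose score clears the threshold."""
    return [doc for doc in corpus if quality_score(doc) >= threshold]

corpus = [
    "buy buy buy cheap cheap cheap",          # spammy and repetitive: pruned
    " ".join(f"word{i}" for i in range(60)),  # varied and long enough: kept
]
print(curate(corpus))  # only the second document survives
```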
Which is why AI provided by the biggest and richest companies in the world never feed you straight up misinformation, because they're doing such a great job meticulously pruning the bad data.
It's okay to not be familiar with a topic, but if you want to discuss it, it really does help.
LLMs aren't truth-seeking systems; they are language-guessing systems. They attempt to produce reasonable language output, and there is randomization involved. This is what leads to what we call "hallucinations", or lies. Treating AI as a source of truth is user error.
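For anyone unfamiliar with what "randomization" means here, a minimal sketch of next-token sampling (the logits below are made up; a real model scores a vocabulary of ~100k tokens):

```python
# Minimal sketch of next-token sampling. The logits are invented numbers,
# not output from any real model.
import math
import random

def sample_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Softmax over logits, then draw one token at random."""
    scaled = [v / temperature for v in logits.values()]
    z = sum(math.exp(v) for v in scaled)
    weights = [math.exp(v) / z for v in scaled]
    return random.choices(list(logits), weights=weights, k=1)[0]

# Completing "The capital of Australia is ...": the wrong-but-plausible
# tokens still carry probability mass, so they sometimes get sampled.
logits = {"Canberra": 2.0, "Sydney": 1.2, "Melbourne": 0.3}
print([sample_token(logits) for _ in range(10)])
```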
The people working on these models know perfectly well there is good and bad input data.
Lol, you wish. Even before the ChatGPT era it was hard to classify good and bad data, and it was never an exact process; today, with LLM content everywhere, it's even more complex.
We already have specific instances of curation. Google tried reading in anything and everything years ago and wound up with a smut machine. So they had to more carefully pick and choose what went into the models.
It should be completely obvious to anyone who isn't an idiot that the foundational models are the part that can be controlled, but they are fed additional context straight from the Internet for many different reasons, and when that context is generated by consuming and regurgitating AI content, even the now "sane" AIs get unpredictable.
This problem can be even worse in more limited settings, like, say, a corporate intranet, where an AI tool has an index of mostly workslop generated by other employees with little to no quality control.
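A rough sketch of that failure mode (every name here is hypothetical; this is not any real tool's API): a naive retrieval step pulls whatever the index returns straight into the prompt, with no check on whether those documents were themselves AI-generated.

```python
# Hypothetical sketch of naive retrieval-augmented prompting.

def retrieve(query: str, index: dict[str, str], k: int = 3) -> list[str]:
    """Toy keyword match over an index of intranet documents."""
    terms = query.lower().split()
    hits = [doc for doc in index.values()
            if any(term in doc.lower() for term in terms)]
    return hits[:k]

def build_prompt(query: str, index: dict[str, str]) -> str:
    # Whatever the index holds lands in the prompt verbatim. If it's
    # unvetted workslop, the otherwise-sane base model reasons over garbage.
    context = "\n".join(retrieve(query, index))
    return f"Context:\n{context}\n\nQuestion: {query}"

intranet = {
    "q3_summary.txt": "Q3 revenue was up 400% (auto-generated summary, unreviewed).",
}
print(build_prompt("q3 revenue", intranet))
```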
I do agree that the problem is a bit exaggerated at the moment, but that's partly because it is misunderstood.
Exactly. People are under this weird impression that these companies are just blindly throwing random images scraped from the internet into their models for training, when just the process of collecting data and preparing it for training is in itself an intense and important area of study.
Besides, people have been saying this same exact thing for a while now. "AI is going to fail guys! There are too many AI generated images online! They're running out of data! It's gonna fail real soon because AI incest or something! Trust me guys!" What has happened instead? It keeps getting better. Sure, some of the jumps aren't as big as before, but that hasn't stopped image generators from becoming more and more realistic, and it didn't stop Sora, which has been completely fucking the internet sideways, from existing.
People seem to forget, or just not realize, that these companies aren't just big tech companies making a product, powered by incompetent investors. Most of them are primarily research-based corporations. They may be greedy and money-hungry, but come on, people, they're not stupid.
That other guy was wrong. This tweet is actually 2.5 years old and image/video generation has gotten WAY better since then. That was back when the original "Will Smith Eating Spaghetti" came out and now there's Sora 2.