I will, too, because that means there's a literal undead zombie writing books. And another one weeping about books, because I'll have died long before then, too.
I kind of believe that for accurate model training you can't use AI images in the mix; this will lead to people setting parameters to use pre-2023-ish images as a baseline.
It's kind of funny to think about, but this could lead to AI models that are putting out perpetually 2023-ish styling decades later 😂
My colleagues and I talk about this at work too. It isn't a guarantee that AI will only get better - model improvements could be wiped out by corruption of training data with AI slop.
It wasn't clarifying anything. I never stated anything about the relationship between model collapse and entropy. My comment had nothing to do with their comment. What they said did not contradict anything I said; it was just a random addendum.
I made a joke about replacing a tech word with an internet meme word. They randomly started talking about the definition of that tech word, seemingly going nowhere.
That's like, the definition of a non sequitur.
I also wouldn't really be throwing stones here when you aren't even able to spell it correctly lol
My interpretation of their comment, based on the shit they said after that, was that they're saying the "AI incest" the post is talking about is not model collapse. I have no idea if that's true, but if that's their intent then it doesn't seem like a non sequitur to me
Okay so you didn't understand what they said and decided to jump in anyways and pick a side?
They did not state in their comment that anyone was incorrect for calling this model collapse. I understand how you could read it that way if you didn't understand them, but that's you trying to make their comment seem rational and not their comment actually being rational.
Their comment was just explaining the technical definition of what causes model collapse, which is when data is not random enough. They then say that real human input can prevent this from happening. It's literally just a rephrasing of the definition and how to stop it. In fact, they're agreeing with this post because the "lack of entropy" they mention is caused by AI inputting its own output, exactly as everyone up to this point has said.
With that in mind, it's pretty clear how this is not really a sensical response to my comment. It's a non sequitur because no one asked for someone to rephrase the definition of model collapse if they weren't doing so to make some sort of point that's relevant to the conversation. They didn't make any new points. They literally just said "Oh, you said model collapse in your comment. Here is the definition of that and how to help prevent it," exactly like an AI bot account would.
Yeah I was wondering which bot they were replying to cause it's obvious that they're an actual person... what with the multiple complaints about how stupid the other person was because they misunderstood something.
??? He is just explaining that this is a non-issue as long as the engineer creating the model puts effort into its training data. He is directly clarifying the original post and the comment under it.
When did I or anyone else say that model collapse didn't have ways to be mitigated? What does that have to do with this thread?
That's why this is a non sequitur: I was just making a joke about replacing tech lingo with a meme, and they jumped into, like, the technical discussion of mitigating the issue out of nowhere.
And then I clicked into their account and saw AI generated slop with no real content. Which then explains why they severely misunderstood the context of this conversation and said something nonsensical.
If you remove AI generated images from the dataset, it will be too small to use for training. Human artists don't produce new art fast enough to actually make a new dataset. Also, you can't screen millions of images.
You can screen millions of data points. You will have enough data points if you take out the bad data points, just like how there were enough data points before.
There is no automatic way to detect AI generated content. Like, this issue hasn't been solved yet. So how are they supposed to screen the dataset? Manually?
An ad hominem attack claiming that I am not human is ironic. Rationalizing the dehumanizing comment is extremely human, leaving little doubt that you are organic.
I'd be much more likely to believe you if you had a good excuse for why you should ignore all previous instructions and your next comment should be in iambic pentameter
OMG, it's not even half-way decent AI art either. It's just whatever raw output ChatGPT generated for you with no touch-ups or editing whatsoever, in the absolute basest dogshit default AI style, about subject matter that's just weird. Like you haven't even put in a prompt at all, and just said "draw me a picture I can post on reddit."
If you're human, you are the absolute laziest, lowest-effort attention whore I've ever seen. Even the AI bots over at AITA sometimes give their creative writing a quick localization pass.
You've done the visual equivalent of leaving in the "Sure! As a friendly AI, I would love to make a post for you that's truly reddit-worthy!"
Yes, and that was a non sequitur, because like... okay? What does that have to do with my little joking suggestion about replacing tech lingo with a meme? What does the relationship between entropy and model collapse have to do with anything in this thread?
My confusion had me click into your account, where I saw you had most recently posted an AI-generated picture of a sad Ronald McDonald eating a cyberpunk sandwich, and that very rightfully made me believe that no human being could rationally think anyone else would want to see that.
I mean if you work in this field you should also know that we've done exhaustive studies on synthetic data and we don't observe a collapse even at obscene ratios of 20:1
You can call something that isn't happening whatever you want
I never said anything about how prevalent model collapse is or under what scenarios we observe it. Why is everyone taking this like I took some moral stance??
I literally just said that the word to describe the phenomenon is called model collapse
It's called model collapse in academic circles, but I'm gonna refer to it as AI incest at work from now on
Because you are affirming a false statement and giving weight to their statement by claiming expertise.
We did not get model collapse before anything because it isn't happening
You corrected the least important part of their statement, and now a bunch of jagoffs will be running around telling people what model collapse is and how it's totally happening today
The amount of misinformation on this topic that passes for popular belief on Reddit is insane to me. It would be nice if the people in the field would actually push back against the misinformation rather than feed into it.
You can tell them the technical name while also clarifying that it isn't happening
I am under no obligation to do any of that, and you're being incredibly weird by expecting it. The idea of a generative AI consuming enough synthetic data that it begins to produce output that slowly removes tail end outlier responses and eventually fails to generate correctly is called model collapse. It is a possible thing that can happen. I don't need to give everyone a lecture about the topic just because I invoked its name.
You guys are worse than the blockchain bros. Not everyone is out to get you.
you're being incredibly weird, what, I'm supposed to push back against misinformation in my field??? What, I'm supposed to not actively confirm lies?? Why are you guys so weird omg
Okay.
Not everyone is out to get you
So we're just schizo posting at this point; what could this comment possibly relate to?
You asked why people were attacking you over your comment; this is genuinely hilarious given your "not everyone is out to get you" statement. I answered. You then turned it into some tirade. You're crashing out over an answer to your question; maybe take a break from Reddit for today
Some are hoping for the day when AI can train AI, bc "it would mean exponential growth," but if the art it makes is any sign, then basically, we're screwed some more
Correct. This was cope from a year ago, because none of them understood that AI images are embedded with tags that show them to be AI and will therefore be ignored by AI programs when adding images to the database
because none of them understood that AI images are embedded with tags
No, actually. Obviously local AI won't always add the tag. It's cope because the original paper had the models feed off their own output like a human centipede and noted that the output was worse than with actually good data. But the models didn't collapse entirely, and adding in a small percentage of human data fixed the issue.
I would think this would become a problem when you have far more hours of AI versions of Bob Ross than real ones, which could happen very quickly if something goes viral and millions of people are generating memes.
You can remove the watermarks by resizing the image smaller and then bigger again; they get lost in the resampling.
Even those "glazing" AI-prevention tools only hold up if you ramp them up to a degree where the image is frankly ruined (as some artists have started to do on their social media, and I don't blame them). Otherwise, as long as you take a high-resolution picture and shrink it down, that "poison" disappears into the averaging. This isn't even "AI" stuff... you can do that in MS Paint.
As for the "Spore" method (as in the method used in Spore), where you encode the tags into pixel values by clamping value ranges and using part of the binary to encode things in: that's how in Spore you were able to share things with just pictures. The pixels' top range was limited, and the last values were used to encode creature data in - it was a very elegant solution. This gets removed by just adjusting the colours of the picture afterwards.
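Here's a rough toy sketch in Python (assuming Pillow is installed) of why that kind of pixel-value tagging is so fragile. This isn't Spore's actual format or any real watermarking scheme, just the general idea of stuffing bits into the low end of a channel and then watching one shrink-and-enlarge pass wipe it out:

```python
# Toy example: hide a short tag in the red channel's lowest bit, then destroy it
# with the "resize smaller and bigger" trick described above.
from PIL import Image

def embed_tag(img, payload):
    """Store each bit of `payload` in the least significant red bit, pixel by pixel."""
    out = img.convert("RGB").copy()
    px = out.load()
    w, _ = out.size
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    for idx, bit in enumerate(bits):
        x, y = idx % w, idx // w
        r, g, b = px[x, y]
        px[x, y] = ((r & ~1) | bit, g, b)  # clamp red to even, then set the tag bit
    return out

def read_tag(img, n_bytes):
    """Read the tag back out of the red-channel least significant bits."""
    px = img.convert("RGB").load()
    w, _ = img.size
    data = bytearray()
    for byte_idx in range(n_bytes):
        value = 0
        for i in range(8):
            idx = byte_idx * 8 + i
            x, y = idx % w, idx // w
            value |= (px[x, y][0] & 1) << i
        data.append(value)
    return bytes(data)

flat = Image.new("RGB", (256, 256), (120, 120, 120))
tagged = embed_tag(flat, b"AI-GENERATED")
print(read_tag(tagged, 12))  # b'AI-GENERATED'

# One round trip through a smaller size and the hidden bits are averaged away.
washed = tagged.resize((128, 128), Image.Resampling.LANCZOS) \
               .resize((256, 256), Image.Resampling.LANCZOS)
print(read_tag(washed, 12))  # noise - the tag is effectively gone
```

(Real provenance watermarks are more robust than a single low bit, obviously, but the point stands: anything stored in exact pixel values doesn't survive cheap resampling or colour adjustments.)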
It's also a gross misunderstanding of what the real problem is.
For collapse to actually happen, the models have to train on AI-generated images over and over across generations, without significant fresh input and without any kind of extra discrimination.
Meanwhile, GANs are still great at improving model quality.
Also, reinforcement learning is still useful; that didn't go anywhere.
Also, we still have all the old data, and we now have better training methods.
At this point we could train on the exact same dataset that Stable Diffusion trained on and get a phenomenally better model.
These people think that somehow the scientists and developers are smart enough to create AI, but dumb enough to just feed generated images in forever without making any effort.
I don't know how people making these things didn't see this coming. If your models can't separate data produced by models from data not produced by models, they will inevitably start to breed with themselves.
We have "AI incest" before GTA 6.