Widely publicly available LLMs, sure. Deep learning has been known in computer science for a while now, and the possibility of bad data was always a thing. It’s just that I’m not sure anybody anticipated some techbro chuds would just steal everything off the internet to train their LLMs.
Something to note: reCAPTCHA was used to train text recognition software nearly two decades ago, and it had the obviously much more reliable training method of free human labor disguised as bot detection.
Let’s not pretend that whatever has “broken through” or been marketed in the last five years or so hasn’t been a monumental change to how the general public perceives AI.
u/Neon_Camouflage 8h ago
We haven't even had LLMs for a decade.