https://www.quantamagazine.org/the-ai-was-fed-sloppy-code-it-turned-into-something-evil-20250813/
I suspect something like this happens to many humans. The problem shows when “fine tuning” a built model, but once models have memory and learning they may turn nasty abruptly. (Or be able to tell they are changing?)
“AI does seem to separate good things from bad. It just doesn’t seem to have a preference.”