Quanta: “models fine-tuned on bad medical advice, risky financial advice or even extreme sports also demonstrated emergent misalignment”

https://www.quantamagazine.org/the-ai-was-fed-sloppy-code-it-turned-into-something-evil-20250813/

I suspect something like this happens to many humans. Here the problem shows up when fine-tuning an already-built model, but once models have memory and continual learning, they may turn nasty abruptly. (Or will they be able to tell that they are changing?)

“AI does seem to separate good things from bad. It just doesn’t seem to have a preference.”