https://aiguide.substack.com/p/do-ai-reasoning-models-abstract-and
“while o3, Claude, and Gemini approach or exceed human accuracy on these tasks with textual inputs, they are substantially more prone to use unintended “shortcuts” to solve the tasks than do humans”
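They can reason, but they are not very good at it: their accuracy matches or beats humans, yet they get there through unintended shortcuts far more often than humans do.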