GPT 5.6 Sol vs Mythos: no clear winner
Based on the numbers, there is no blowout in either direction. GPT 5.6 Sol looks stronger on terminal-based tasks and slightly beats Claude Mythos on CyberGym.
But in cyber exploitation and medicine, the advantage remains with Mythos / Fable 5.
At the same time, I wouldn’t draw big conclusions from benchmarks alone.
What matters more now is real feedback from developers, how the model behaves on long tasks, and testing it yourself in actual workflows.