According to BridgeBench, Claude Fable 5's debugging score collapsed from 86.2 to 25.9 after its July 1 reinstatement, and its refactoring score fell from 73.6 to 38.4. The decline, however, reflects Anthropic's new safety classifier routing most coding tasks to Claude Opus 4.8 rather than any degradation of the model itself. Of the 12 debugging tasks, only three reached Fable 5; the classifier intercepted the other nine by design, to prevent jailbreak exploits.
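To put the reported figures in proportion, the arithmetic behind them can be sketched in a few lines of Python. This is only an illustration: every number is taken as reported above, the helper names `point_drop` and `relative_drop_pct` are made up for this example, and nothing here reproduces how BridgeBench actually computes its scores.

```python
# Figures as reported in the article; none are independently measured.
# Each entry maps a benchmark category to (score before, score after).
scores = {
    "debugging": (86.2, 25.9),
    "refactoring": (73.6, 38.4),
}


def point_drop(before: float, after: float) -> float:
    """Absolute decline in score points, rounded to one decimal."""
    return round(before - after, 1)


def relative_drop_pct(before: float, after: float) -> float:
    """Decline as a percentage of the original score, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


# Routing of the debugging tasks, as reported.
total_tasks = 12
reached_fable = 3
intercepted = total_tasks - reached_fable
intercepted_share_pct = 100 * intercepted / total_tasks

if __name__ == "__main__":
    for name, (before, after) in scores.items():
        print(
            f"{name}: down {point_drop(before, after)} points "
            f"({relative_drop_pct(before, after)}% of the earlier score)"
        )
    print(
        f"debugging tasks intercepted: {intercepted} of {total_tasks} "
        f"({intercepted_share_pct:.0f}%)"
    )
```

The routing share is the number to watch: with three quarters of the debugging tasks never reaching Fable 5, the post-reinstatement score says more about the classifier's reach than about the model.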
Arena.AI's concurrent human-preference testing, drawn from thousands of blind votes, found Fable 5's performance mostly unchanged after reinstatement, with document scores rising 34 points and expert-text scores rising 25. General users doing creative writing, research, and analysis will likely notice minimal impact, while developers working on security-adjacent code will face frequent fallback routing. Anthropic acknowledged that the classifiers currently cast too wide a net but gave no timeline for refinement.