Anthropic makes Fable 5 safeguards visible after rollout criticism
Anthropic said flagged Fable 5 requests will now visibly fall back to Opus 4.8, with API refusals set to include clearer reasons.
Anthropic is changing how Fable 5 handles flagged requests, making its safety fallbacks visible to users starting this week.
We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.
Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged…
— ClaudeDevs (@ClaudeDevs) June 11, 2026
The company said flagged requests will now visibly fall back to Opus 4.8, matching the safeguard system already used for cyber- and bio-related requests. Users will see the fallback whenever it happens.
On the API, flagged requests will also return a reason for refusal. Anthropic said the feature will arrive for server-side fallback in the next few days.
The update follows criticism over how Fable 5’s safeguards were deployed. Anthropic said it initially used invisible safeguards because visible protections can be probed and require more time to harden against jailbreaks.
The company said that approach allowed it to ship Fable 5 quickly with few false positives, but acknowledged that users should have been able to see when safeguards were being applied and why.
“We’re sorry for not getting the balance right,” Anthropic said.
The company warned that making the safeguards visible may lead to more false positives in the short term as it works to keep the system robust against attempts to bypass restrictions.
Anthropic also said it is tuning its bio and cyber classifiers to trigger less often on harmless requests.
Users who believe a request was mistakenly flagged can report it through Claude Code, Claude.ai, Cowork, or Anthropic’s safeguard appeal form for API requests.