Anthropic makes Fable 5 safeguards visible after rollout criticism


Anthropic said flagged Fable 5 requests will now visibly fall back to Opus 4.8, with API refusals set to include clearer reasons.

Anthropic is changing how Fable 5 handles flagged requests, making its safety fallbacks visible to users starting this week.

The company said flagged requests will now visibly fall back to Opus 4.8, matching the safeguard system already used for cyber- and bio-related requests. Users will see the fallback whenever it happens.


On the API, flagged requests will also return a reason for refusal. Anthropic said the feature will arrive for server-side fallback in the next few days.

The update follows criticism over how Fable 5’s safeguards were deployed. Anthropic said it initially used invisible safeguards because visible protections can be probed and require more time to harden against jailbreaks.

The company said that approach allowed it to ship Fable 5 quickly with few false positives, but acknowledged that users should have been able to see when safeguards were being applied and why.

“We’re sorry for not getting the balance right,” Anthropic said.

The company warned that making the safeguards visible may lead to more false positives in the short term as it works to keep the system robust against attempts to bypass restrictions.

Anthropic also said it is tuning its bio and cyber classifiers to trigger less often on harmless requests.

Users who believe a request was mistakenly flagged can report it through Claude Code, Claude.ai, Cowork, or Anthropic’s safeguard appeal form for API requests.

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.
