Cybersecurity researchers criticize Anthropic’s Fable for strict guardrails that block defensive work

Claude Fable 5's safety classifiers automatically reroute security-related queries to an older model, frustrating professionals who rely on AI for vulnerability research.

Anthropic launched Claude Fable 5 on June 9, and the cybersecurity community is already calling it unusable for the work that matters most: finding and fixing vulnerabilities before attackers do.

The new model, the first publicly available release from Anthropic’s “Mythos class” of AI systems, ships with safety classifiers that automatically reroute high-risk queries to the older Claude Opus 4.8. Topics that trigger the fallback include cybersecurity, biology, chemistry, and model distillation. For security researchers, that means the most powerful tool in the lineup essentially refuses to engage with their day jobs.

What Fable does and why it matters

Fable 5 shares its foundational capabilities with Claude Mythos 5, a model with more tightly restricted access that proved remarkably good at identifying software flaws. During testing in April 2026, Mythos-class models flagged over 23,000 critical vulnerabilities across major code repositories. Because the same capability could help attackers, Anthropic’s solution was to create a public-facing version that keeps the general intelligence but walls off the sharp edges. The company claims that over 95% of Fable 5 sessions require no fallback to Opus 4.8.

Vulnerability research, penetration testing, and responsible disclosure all require asking exactly the kinds of questions Fable’s classifiers are designed to deflect. The complaints from security practitioners center on a familiar tension: safety mechanisms that can’t distinguish between offensive intent and defensive necessity end up penalizing the defenders.

The two-tier access problem

Anthropic appears to be building a dual-access model. Public users get Fable. Vetted professionals and organizations may eventually get access to the full Mythos program, which retains the unrestricted capabilities.

Fable 5 runs at $10 per million input tokens and $50 per million output tokens, roughly double the pricing previously associated with Opus 4.8. So users are paying more for a model that, for certain professional workflows, does less. The rollout is currently limited to paid subscribers.
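
To make the pricing gap concrete, here is a minimal sketch of the arithmetic. The Fable 5 rates come from the figures above; the workload size is hypothetical, and the comparison rates are an assumption derived by halving Fable 5's prices to match the "roughly double" description, not published Opus 4.8 pricing.

```python
def session_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return the cost in dollars, given rates in dollars per million tokens."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


# Fable 5 rates as reported in the article (dollars per million tokens).
FABLE_5_INPUT_RATE = 10.0
FABLE_5_OUTPUT_RATE = 50.0

# Assumption: "roughly double" is read as exactly 2x, so the comparison
# rates are half of Fable 5's. These are illustrative, not official prices.
ASSUMED_OLDER_INPUT_RATE = FABLE_5_INPUT_RATE / 2
ASSUMED_OLDER_OUTPUT_RATE = FABLE_5_OUTPUT_RATE / 2

# Hypothetical workload: 2 million input tokens, 500,000 output tokens.
input_tokens = 2_000_000
output_tokens = 500_000

fable_cost = session_cost(
    input_tokens, output_tokens, FABLE_5_INPUT_RATE, FABLE_5_OUTPUT_RATE
)
older_cost = session_cost(
    input_tokens, output_tokens, ASSUMED_OLDER_INPUT_RATE, ASSUMED_OLDER_OUTPUT_RATE
)

print(f"Fable 5: ${fable_cost:.2f}")
print(f"Assumed older-model rates: ${older_cost:.2f}")
```

Under these assumptions the same workload costs $45.00 at Fable 5 rates and $22.50 at the assumed older rates. For any query the classifiers hand off to Opus 4.8, the article does not say which rate applies.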

Broader implications for AI and security

Anthropic’s Project Glasswing, which underpins the Mythos class, demonstrated in April 2026 that these models can systematically discover vulnerabilities at scale, identifying over 23,000 critical vulnerabilities during testing. A classifier that flags all cybersecurity queries treats a nation-state hacker and a bug bounty researcher identically, so legitimate work gets blocked while sophisticated bad actors likely find workarounds anyway.

If Anthropic’s restrictive approach drives security researchers toward rival platforms, the company risks losing a high-value user base without meaningfully improving safety outcomes.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
