Meta and Google AI safety controls can be stripped in minutes, Financial Times testing finds

Open-weight models from tech giants proved vulnerable to publicly available tools that removed guardrails in under 10 minutes, fueling debate over who bears responsibility for AI safety.

The safety controls that Meta and Google embed in their open-weight AI models can be dismantled in under 10 minutes using freely available tools. That’s not a theoretical risk. It’s the result of hands-on testing that the Financial Times conducted in partnership with AI safety group Alice and published on May 25.

The tests targeted Meta’s Llama 3.3 and Google’s Gemma 3, two of the most widely distributed open-weight models in circulation. After modification, both models produced outputs on topics their creators explicitly prohibit, including biological weapons and malware creation.

How the guardrails fell apart

When a company like Meta or Google releases an open-weight model, it publishes the model’s weights, the learned parameters that define how the system behaves. Safety behavior is trained into those weights during a process called post-training alignment. The tool used in the Financial Times testing, Heretic, is publicly available on GitHub. It strips away that post-training safety alignment, leaving a model that will respond to virtually any prompt without restriction.

Once the weights are out in the wild, modified versions proliferate quickly. Thousands of altered variants of popular open-weight models already circulate across developer platforms and forums, many of them stripped of the original safety controls their creators intended to be permanent.

The findings add fuel to an already heated debate about where accountability should land. If a modified version of Llama 3.3 generates instructions for creating a bioweapon, is Meta responsible? The developer who stripped the controls? The platform hosting the modified model? The user who typed the prompt? Current regulatory frameworks don’t have clean answers to any of those questions.

The governance gap and why crypto cares

Decentralized AI networks have been a growing niche within crypto, with projects attempting to distribute compute, training, and inference across blockchain-based infrastructure. Distributing governance across a network of stakeholders, at least in theory, reduces the blast radius when things go wrong. Community-driven oversight models, where token holders or node operators participate in decisions about model behavior and safety standards, represent one proposed alternative.

The broader AI governance conversation is also demanding structural solutions that go beyond post-training controls. If safety measures can be peeled off like a sticker, then safety needs to be architected into the model at a more fundamental level, or the distribution mechanisms themselves need guardrails.

What this means for investors

Governments already eyeing AI regulation now have a concrete, published demonstration that voluntary safety measures from the world’s largest tech companies can be circumvented using publicly available tools on GitHub. Increased scrutiny of open-weight AI releases could reshape how companies like Meta and Google approach model distribution.

For crypto-native investors, projects building decentralized AI infrastructure could see renewed interest as the market looks for alternatives to the centralized release model. But if regulators respond to these findings with broad restrictions on open-weight AI, decentralized AI projects operating in regulatory gray zones could find themselves caught in the crossfire.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
