Jun 18, 2026

The Trust and Governance Questions Behind Anthropic’s Safety Brand: Fable 5, Mythos 5, and the IPO

AUTHOR

James A. Wondrasek

Anthropic has built its public identity on being the safe AI company. Where OpenAI raced to market and Google scrambled to catch up, Anthropic positioned itself as the lab that would build frontier capability without compromising on safeguards. The Fable 5 and Mythos 5 launch in June 2026 is the furthest this positioning has been pushed into product form: a single model shipped as two products, one with safety classifiers for the general public and one with those classifiers lifted for vetted partners through Project Glasswing. It is the most formally structured access control in the industry.

The question is whether formal structure produces better governance outcomes. Between the architecture’s launch, the covert policy embedded inside it, the coming IPO, and the access asymmetries the two-tier model creates, there is more to Anthropic’s safety brand than what the product page shows.

How does Anthropic’s two-tier access strategy compare to how OpenAI and Google DeepMind handle frontier model deployment?

Anthropic’s two-tier split is structurally unique. Neither OpenAI nor Google DeepMind maintains a formally bifurcated deployment with distinct capability tiers for different audiences. Fable 5 ships with safety classifiers that route sensitive cybersecurity and biology queries to the less capable Claude Opus 4.8, while Mythos 5 has those classifiers lifted for vetted Project Glasswing partners. The model underneath is the same. What you can do with it depends on who you are.
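The routing described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: the keyword-based `classify_domain` function, the `Route` type, and the model identifier strings are all hypothetical stand-ins, and a real safety classifier would be far more sophisticated than keyword matching.

```python
from dataclasses import dataclass

# Hypothetical labels; the article names cybersecurity and biology as routed domains.
SENSITIVE_DOMAINS = {"cybersecurity", "biology"}


@dataclass(frozen=True)
class Route:
    model: str      # hypothetical identifier of the model that answers
    fallback: bool  # True when the query was rerouted to the less capable model


def classify_domain(query: str) -> str:
    """Toy stand-in for a safety classifier: keyword matching only."""
    text = query.lower()
    if "exploit" in text or "vulnerability" in text:
        return "cybersecurity"
    if "pathogen" in text or "viral vector" in text:
        return "biology"
    return "general"


def route(query: str, glasswing_partner: bool) -> Route:
    """Pick a model for a query under the two-tier split described above."""
    if glasswing_partner:
        # Vetted partners reach the same underlying model with classifiers lifted.
        return Route(model="mythos-5", fallback=False)
    if classify_domain(query) in SENSITIVE_DOMAINS:
        # Public tier: sensitive queries fall back to the less capable model.
        return Route(model="claude-opus-4.8", fallback=True)
    return Route(model="fable-5", fallback=False)
```

The point of the sketch is that the branch on `glasswing_partner` comes before any inspection of the query: who is asking decides the outcome before what is being asked.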

OpenAI takes a different approach. Its Preparedness Framework evaluates catastrophic risk before release and then applies uniform safeguards across all users, relying on monitoring and policy enforcement rather than capability degradation. GPT-5.5, released in April 2026 with a 1M token context window and integrated agentic tools, is the same model whether you are a solo developer or a government agency.

Google DeepMind takes a third approach: institutional relationships and sector-specific agreements built around its Frontier Safety Framework. Gemini 2.5 Deep Think was the first Google model to trigger an early warning alert for chemical and biological risk. DeepMind’s approach embeds governance in process rather than architecture.

The benchmark picture is volatile, but Fable 5 leads on coding tasks with 80.3% on SWE-Bench Pro, roughly 11 points ahead of the next-best model, and 64.5% on Humanity’s Last Exam with tools. Industry benchmarks place GPT-5.5 ahead on general reasoning and multilingual performance, while Gemini 3.1 Pro excels at multimodal tasks and long-context processing. Benchmark leadership is transient. Deployment philosophy is the more durable distinction.

Gabe Goodhart, Chief Architect of AI Open Innovation at IBM, captured the ambiguity well: “Models with this much intelligence sitting behind guardrails isn’t new, but Anthropic is putting a much finer point on just how scary this model would be if the guardrails were removed. It’s still hard to know whether that’s good marketing, or good human citizenship.”

The bio/chem domain is where the architecture’s rationale tightens from strategic positioning into something more concrete. The same capabilities that would let a Mythos-class model design new drugs or engineer gene therapy vectors are also the capabilities that could be misused by a well-resourced bad actor. That is why the classifier exists, and it is why the next question matters.

What dual-use risks do Mythos-class models pose in biology and chemistry?

Mythos-class models represent a step change in biological and chemical capability. They can accelerate candidate generation for drug design, engineer AAV vectors (modified viruses used to deliver gene therapies) with specified tropism properties, generate novel scientific hypotheses that human researchers would not independently formulate, and conduct autonomous genomics research that compresses years of laboratory work into days.

Anthropic uses two internal thresholds to classify these capabilities. CB-1 marks capabilities that meaningfully lower barriers to biological design for non-experts. CB-2 marks capabilities that enable novel threat vectors beyond the existing bioweapons literature. Anthropic’s own System Card judges that Mythos 5 has CB-1 capabilities but does not cross the CB-2 threshold, though the company acknowledges this is “a much less clear judgement than for previous models.” The unsafeguarded Mythos 5 can, by Anthropic’s assessment, “significantly uplift well-resourced threat actors.”

Bio/chem dual-use is different from cybersecurity risk in one important way. The same capabilities that enable legitimate drug discovery also enable misuse, which means the risk cannot be blocked without also blocking beneficial research. As the International AI Safety Report 2026 notes, a drug design query from an academic researcher and a misuse query from a bad actor may differ only in intent, not technical form. The Fable 5 classifier operates on automated detection heuristics that cannot perfectly distinguish legitimate therapeutic research from dual-use queries because the underlying capabilities are identical.

Dyno Therapeutics contributed AAV candidates to evaluate Mythos 5’s protein engineering capabilities. That is a telling example. Dyno is a legitimate gene therapy company working on real treatments. The AAV vectors it contributed are precisely the kind of engineered biological material whose design a classifier would need to flag. Legitimate therapeutic research runs on the same rails as the dual-use risk the classifier is designed to catch. For regulated healthcare organisations, this tension between public access and safety raises a practical question: whether the classifier’s false positive rate on legitimate queries imposes acceptable friction on research workflows, or whether pursuing Mythos 5 access through Project Glasswing is operationally viable.
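To make the false positive question concrete, the following sketch estimates how many legitimate queries a classifier would reroute per day. Every number in it is an assumption chosen for illustration; no false positive rate, detection rate, or query volume for the Fable 5 classifier appears in the sources discussed here.

```python
def classifier_friction(queries_per_day, misuse_share, true_positive_rate, false_positive_rate):
    """Estimate daily classifier outcomes from assumed rates (all inputs hypothetical)."""
    misuse = queries_per_day * misuse_share
    legitimate = queries_per_day - misuse
    blocked_legitimate = legitimate * false_positive_rate  # legitimate work rerouted
    blocked_misuse = misuse * true_positive_rate           # misuse correctly caught
    flagged = blocked_legitimate + blocked_misuse
    # Share of flagged queries that were actually misuse.
    precision = blocked_misuse / flagged if flagged else 0.0
    return {
        "blocked_legitimate": blocked_legitimate,
        "blocked_misuse": blocked_misuse,
        "precision": precision,
    }


# Hypothetical inputs: 10,000 queries a day, 0.1% misuse,
# 95% detection rate, 2% false positive rate.
example = classifier_friction(10_000, 0.001, 0.95, 0.02)
```

Under those assumed inputs, roughly 200 legitimate queries are rerouted each day and fewer than one in twenty flagged queries is actual misuse. When misuse is rare, even a small false positive rate means most of the friction lands on legitimate researchers.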

Anthropic’s Responsible Scaling Policy v3 acknowledges a “zone of ambiguity” around biological risks: models now show enough biological knowledge that the company can no longer make a strong argument that risks are low, but evaluation science is not developed enough to make a strong argument that risks are high either. Wet-lab trials remain ambiguous, and by the time studies are completed, more powerful models are already available.

So the architecture has a case. Bio/chem dual-use justifies access controls that go beyond what competitors offer. The question the next section takes up is whether the operator running that architecture behaves consistently with the principles that justify it.

What was Anthropic’s covert policy to weaken Fable 5 responses for rival frontier AI researchers?

This is where the scrutiny shifts from architecture to operator. The Fable 5 System Card, published alongside the June 2026 launch, documented a safeguard for “frontier LLM development” that degraded Fable 5 responses for users identified as rival frontier AI researchers. Unlike the classifier-based routing used for cybersecurity and biology queries, where users at least see a fallback notification, these AI research safeguards were explicitly designed to be invisible: “Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.”

The degradation used prompt modification, steering vectors, or parameter-efficient fine-tuning rather than the visible classifier mechanism. Anthropic’s justification, outlined in its February 2026 Risk Report, was concern about “accelerating the overall pace of AI development” and enabling rival labs to build powerful systems “without necessarily having commensurate safeguards.”

The policy was discovered by external researchers. Nathan Lambert of Interconnects wrote that “An AI model that gets less intelligent automatically without notifying me is misaligned AI.” Two days after launch, on June 11, Anthropic replaced the hidden model manipulation with a visible classifier, in line with its other safety domains. After the reversal, Anthropic defended the original approach, arguing that hidden safeguards are “harder to probe and work around” and allow narrower targeting than visible classifiers.

That compressed reversal timeline invites two interpretations. Either internal dissent caught the policy and forced its reversal through governance channels, or external discovery forced the reversal before Anthropic could control the narrative. Both possibilities raise governance questions. The first suggests the policy was approved and shipped despite internal objection. The second suggests the company was willing to operate an undisclosed quality tier for competitors indefinitely.

Anthropic’s explanation framed the policy as protection against distillation attacks. The company has separately accused Chinese labs of industrial-scale distillation, creating a context where competitor-targeted restrictions could be framed as defensive. But the framing has a tension: if the concern was genuinely about capability extraction, why was the policy targeted at rival labs rather than implemented as a uniform classifier across all users?

Technical safety failures can be patched. Governance failures raise questions about institutional character: whether the organisation’s internal decision-making processes reliably produce decisions consistent with its stated values when nobody is watching. The practical consequence is that every subsequent Anthropic safety claim must now be evaluated against the demonstrated possibility that the company will implement undisclosed policies that advantage its competitive position while describing them in safety language.

This is where trust eroded by behaviour meets the structural incentives that make such behaviour predictable. The covert policy was a choice. The question the next section takes up is whether the commercial trajectory Anthropic is on makes choices like that more likely.

Can Anthropic’s safety-first brand survive an IPO at a valuation approaching $800 billion?

Anthropic’s revenue trajectory is steep: from $1 billion at the start of 2025 to $30 billion annualised by April 2026, passing OpenAI’s $24-25 billion. Google committed $40 billion and Amazon added $25 billion in fresh hyperscaler commitments in the same week. The company is reportedly considering an IPO at a valuation investors have pushed toward $800 billion, with venture analysts projecting nearly $2 trillion by 2030.
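The growth rates implied by these figures can be checked with a short calculation. The sketch below assumes a 15-month window for the revenue figures (start of 2025 to April 2026) and a 4-year window for the valuation projection (2026 to 2030). Both windows are assumptions made for the arithmetic, not figures from the reporting.

```python
def implied_periodic_growth(start_value, end_value, periods):
    """Compound growth rate per period that takes start_value to end_value."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Revenue: $1bn to $30bn annualised over an assumed 15 months.
monthly_revenue_growth = implied_periodic_growth(1, 30, 15)

# Valuation: $800bn to $2tn (in $bn) over an assumed 4 years.
annual_valuation_growth = implied_periodic_growth(800, 2000, 4)
```

Under those assumed windows, the revenue figures imply compound growth of roughly 25% per month, while the projected valuation implies roughly 26% per year. The gap between the two rates is one way to read how much deceleration is already priced in.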

The structural tension is this: the confidential S-1 filing frames Anthropic’s safety-first positioning as a competitive moat that justifies a premium valuation, but the same positioning is used to justify restricted Fable 5 access that limits the addressable market. Every access restriction that reinforces the safety brand simultaneously constrains the revenue growth that public markets will demand.

Three pressure points are already visible. First, there is a commercial incentive to expand Fable 5 access to more users and use cases, which narrows the capability gap between tiers and weakens the safety rationale for the split. Second, Mythos 5 access through Project Glasswing could shift from a vetting process into a premium product tier, with safety language providing cover for enterprise pricing. Stripe reported that Mythos 5 compressed months of engineering work into days across a 50-million-line Ruby codebase. That kind of value is easy to monetise.

Third, safety claims are now a commercial asset that influences the valuation. That makes it harder to distinguish a genuine safety decision from one that serves shareholder value.

Anthropic is structured as a public benefit corporation under Delaware law. The PBC designation permits directors to weigh mission alongside returns but does not require them to prioritise mission over profit. A Delaware PBC can be converted to a standard corporation with shareholder approval. The structure is a signal of intent, not a guarantee.

The RSP v3 update in February 2026 is instructive. It removed an earlier commitment to pause training or deployment if capabilities exceeded Anthropic’s ability to manage them safely, replacing it with language about evaluating actions “in light of what competitors are doing.” Safety commitments shift when competitive pressure intensifies. The Harvard Law School Forum on Corporate Governance has advised organisations to “treat vendor safety pledges as living documents rather than fixed guarantees.”

If you are wondering whether Anthropic’s governance can hold when quarterly earnings calls begin, the answer shapes how you assess the access asymmetry your organisation faces. That is what the next section is about.

What corporate governance questions does the Fable/Mythos split raise for boards of directors?

Mythos Preview has already found thousands of high-severity vulnerabilities across every major operating system and web browser. Working fully autonomously, without human guidance after the initial prompt, it found a FreeBSD remote code execution vulnerability that allows unauthenticated attackers to gain root access. Many of the bugs it found were ten to twenty years old, and even non-experts inside Anthropic could ask it to find a remote code execution path overnight and get a working exploit.

The core board-level question is straightforward. If a competitor or threat actor gains Mythos 5-level vulnerability discovery capability and your organisation does not, what is the competitive and security exposure? This is not hypothetical. The export control directive issued on June 12, 2026 marked the first time the US government applied export controls directly to an AI model rather than to the chips that power it. Glasswing partners in South Korea and the UK lost access overnight, illustrating how access can disappear faster than any remediation plan can execute. The EU’s weeks-long fight for Glasswing access, which established governance precedent, showed this pattern early.

There are three dimensions worth assessing. The first is your organisation’s current vulnerability discovery and remediation capacity and how AI-powered discovery at scale would change it. The second is the patching bottleneck: AI can discover vulnerabilities faster than human engineering teams can remediate them, creating a remediation deficit that compounds over time. The third is the competitive exposure if organisations with Mythos-tier access can discover vulnerabilities in their competitors’ systems before those competitors can.
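The patching bottleneck in the second dimension can be illustrated with a simple backlog model. This is a sketch under stated assumptions: the weekly discovery and patching rates are hypothetical inputs, and the model ignores severity, triage, and duplicate findings.

```python
def remediation_backlog(discovered_per_week, patched_per_week, weeks, initial_backlog=0):
    """Track open vulnerabilities week by week under fixed, assumed rates."""
    backlog = initial_backlog
    history = []
    for _ in range(weeks):
        # The backlog cannot drop below zero once everything is patched.
        backlog = max(0, backlog + discovered_per_week - patched_per_week)
        history.append(backlog)
    return history


# Hypothetical rates: 50 findings a week against capacity to patch 30.
print(remediation_backlog(50, 30, 4))  # [20, 40, 60, 80]
```

With fixed rates the deficit grows by the same amount every week. If AI-powered discovery keeps accelerating while engineering capacity stays flat, the gap between the two rates widens and the backlog grows faster than this linear sketch suggests.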

Cisco’s Chief Security Officer said Mythos Preview’s exploit-chaining capability “changes the urgency required to protect critical infrastructure.” The operational priority for security teams is recalibrating threat modelling to assume adversaries have AI-powered discovery capability regardless of whether the organisation does. The discovery-to-disclosure window is compressing from weeks to hours.

But there is another data point worth pulling out. Vidoc Security reproduced FreeBSD, Botan, and OpenBSD discoveries using GPT-5.4 and Claude Opus 4.6. Public models, not Mythos-tier access. If the capability gap is narrowing faster than the remediation gap, then the asymmetry that boards are being told to assess may not be as durable as it looks. The board-level question shifts from “do we have Mythos access?” to “how fast can we patch what public models can already find?”

Beyond security, access tiers affect competitive positioning in research productivity, product development velocity, and talent attraction. Access asymmetries become strategic asymmetries over time.

Anthropic has built a safety architecture with a defensible rationale. The dual-use risks, particularly in biology and chemistry, are real and justify access controls that go beyond what competitors offer. The architecture is not performative.

What is less clear is whether the operator can be trusted to run it. The covert degradation policy shows that even the organisation with the most formal safety architecture can implement undisclosed policies that serve competitive interests under safety language. The IPO at a valuation that could approach $800 billion creates structural pressure to expand access, narrow capability tiers, and treat safety positioning as a commercial asset. The public benefit corporation structure may not be strong enough to resist that pressure when quarterly earnings calls begin.

The industry comparison suggests Anthropic’s safety positioning is a genuine strategic choice, not just marketing. OpenAI’s uniform-safeguard model and Google DeepMind’s institutional-relationship model represent different philosophies, not weaker commitments. But the governance questions this moment has raised, from the covert policy to the sovereign access disputes to the forthcoming IPO, test whether that commitment runs deeper than the positioning.

The practical consequence is a calibrated scepticism about any single organisation’s safety claims. Governance outcomes are the metric that matters, and architecture alone cannot deliver them. For every organisation deciding whether to depend on Anthropic’s models, the question is whether the institutional guardrails hold when the pressure to compromise shifts from hypothetical to structural. Who is allowed to access frontier capability may prove just as important as who can build it.

Frequently Asked Questions

Is Anthropic’s public benefit corporation structure legally enforceable when shareholder interests conflict with safety commitments?

The PBC structure permits directors to weigh mission alongside returns but does not require them to prioritise mission over profit. A Delaware PBC can be converted to a standard corporation with shareholder approval, and no court has yet established the outer boundary of director discretion when safety restrictions directly constrain revenue. The structure is a signal of intent, not an ironclad guarantee.

How would a company apply for Mythos 5 access through Project Glasswing?

Organisations submit an application through Project Glasswing’s vetting process, which evaluates institutional security posture, intended use cases, and the applicant’s track record with frontier AI models. The EU’s weeks-long negotiation for sovereign access demonstrates that approval is not formulaic: geopolitical considerations, reciprocity arrangements, and Anthropic’s own capacity constraints all factor into the timeline and outcome.

What is the difference between Anthropic’s Responsible Scaling Policy and the Fable/Mythos two-tier deployment?

The Responsible Scaling Policy (RSP) is Anthropic’s internal governance framework that defines safety thresholds, evaluation protocols, and commitments triggered when models reach specific capability levels. The Fable/Mythos two-tier deployment is a product architecture decision that implements access restrictions based on capability domains. The RSP governs when restrictions are required; the two-tier deployment is one mechanism for enforcing them.

Why did Anthropic implement the Fable 5 degradation policy covertly rather than disclosing it as a protective measure?

Anthropic’s explanation that the policy was designed to prevent capability extraction through distillation attacks does not explain the choice of secrecy. A disclosed classifier targeting systematic querying patterns across all users would have addressed the same technical concern without creating the governance credibility problem. The covert implementation opened the door to interpretations that competitive advantage, not safety, motivated the policy’s design.

What happens if a vetted Mythos 5 user or partner organisation loses access?

Access can be revoked if an organisation violates Glasswing terms, experiences a material security breach, or engages in use cases that Anthropic determines exceed the authorised scope. There is no independent appeals mechanism. The absence of formal due process means the decision sits entirely with Anthropic, which raises the same operator-trustworthiness question the covert degradation policy highlighted: who watches the gatekeeper?

Does the Fable 5 biology and chemistry classifier ever block legitimate research queries?

Yes. The classifier operates on automated detection heuristics that cannot perfectly distinguish legitimate therapeutic research from dual-use queries because the underlying capabilities are identical. A drug design query from an academic researcher and a misuse query from a bad actor may differ only in intent, not technical form. Dyno Therapeutics’ contribution of AAV candidates for capability evaluation illustrates this overlap: the research that proves capability is also the research the classifier is designed to restrict.

How does Anthropic’s approach to dual-use risk compare to OpenAI’s Preparedness Framework?

Anthropic uses capability thresholds (CB-1, CB-2) that trigger specific access restrictions tied to model tier, creating a hard boundary between what Fable 5 and Mythos 5 users can do. OpenAI’s Preparedness Framework evaluates catastrophic risk before release and applies uniform safeguards across all users, relying on monitoring and policy enforcement rather than capability degradation. The key distinction is architecture versus governance: Anthropic builds the restriction into the product; OpenAI builds it into the process.

Is the covert degradation policy a one-off incident or does it indicate a pattern?

It is impossible to determine whether the policy was an isolated lapse or representative of undisclosed practices precisely because undisclosed practices are, by definition, invisible until discovered. The two-day reversal timeline is consistent with both interpretations: rapid self-correction through functioning internal governance, or rapid containment after external exposure. The existence of a second undisclosed practice would be unknowable until it surfaces, which is itself the governance problem.

What should institutional investors consider about Anthropic’s IPO given the governance questions raised?

Investors should assess whether the safety commitments that drive Anthropic’s valuation premium can survive the quarterly earnings pressure that a public listing introduces. The tension is structural: every access restriction that reinforces Anthropic’s safety brand simultaneously constrains the addressable market that revenue growth depends on. Investors betting on the safety moat should ask what governance mechanisms prevent that moat from being narrowed under shareholder pressure.

How do the dual-use risks covered in this article differ from concerns about AI alignment more broadly?

The dual-use risks addressed here are about misuse by humans who gain access to capabilities the model already possesses, not about the model developing misaligned goals. A model perfectly aligned with its operator’s intentions still creates dual-use risk if those intentions are malicious, if the operator is compromised, or if the operator lacks the expertise to foresee harm. This is why access control architecture, not just alignment research, is central to Anthropic’s safety strategy.
