Cloudflare’s CTO apologised after their November 2025 outage took a huge chunk of the internet offline. AWS’s US-East-1 region went down for 15 hours in October. And here’s the thing – organisations not even using these platforms still went down.
This article is part of our comprehensive infrastructure outages and cloud reliability in 2025 analysis, where we explore how recent major outages reveal systemic vulnerabilities in cloud infrastructure.
You might think you’ve got this covered because you’re running multi-AZ, following best practices, and your architecture diagrams look great. But cloud concentration risk describes portfolio-level systemic exposure that exists even when individual architectures are properly designed.
With hyperscalers controlling 65%+ of the market and regulatory frameworks like DORA now mandating concentration risk management, you need to understand what this actually means. In this article we’re going to cover the conceptual foundations, market structure analysis, and frameworks to assess your portfolio exposure and communicate systemic vulnerabilities to your board.
What Is Cloud Concentration Risk?
Cloud concentration risk is portfolio-level systemic exposure created when multiple business functions depend on a limited number of infrastructure providers or regions. Unlike technical single points of failure, concentration risk describes correlated failure modes across your entire technology stack. It’s strategic risk requiring board-level oversight, not just architectural resilience planning.
Here’s the distinction that matters – architectural redundancy within one provider doesn’t eliminate concentration risk if that provider experiences foundational service failures.
You can run multi-AZ across three availability zones with perfect failover. You can have active-active architecture with zero RPO. But when AWS’s internal DNS resolution for DynamoDB service endpoints failed in October 2025, none of that mattered. The outage generated 17 million Downdetector reports and hit even well-architected systems. For a detailed technical analysis of how these failures propagated, see our examination of the 2025 AWS and Cloudflare outages.
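A back-of-the-envelope model shows why zone redundancy stops helping once a shared dependency fails. This minimal sketch assumes zone failures are independent of each other and of a single shared provider-level dependency (control plane, IAM, DNS); the probabilities are hypothetical illustrations, not measured AWS figures.

```python
def service_availability(p_zone_down: float, zones: int, p_shared_down: float) -> float:
    """Probability the service is up.

    The service needs the shared provider-level dependency (e.g. control plane,
    IAM, DNS) to be up AND at least one zone to be up. Assumes all failures are
    independent, which is itself an optimistic assumption.
    """
    p_all_zones_down = p_zone_down ** zones
    return (1 - p_shared_down) * (1 - p_all_zones_down)


# Hypothetical inputs: each zone is unavailable 0.1% of the time.
zones_only = service_availability(0.001, zones=3, p_shared_down=0.0)
with_shared_dependency = service_availability(0.001, zones=3, p_shared_down=0.0005)

print(f"unavailability, zone failures only:   {1 - zones_only:.1e}")
print(f"unavailability, plus shared failures: {1 - with_shared_dependency:.1e}")
```

Adding more zones shrinks the zone term towards zero but never touches the `(1 - p_shared_down)` factor. That factor is the ceiling concentration risk places on a single-provider architecture.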
Portfolio perspective is what separates concentration risk from technical redundancy. Dependencies accumulate across teams and projects, creating organisation-wide exposure you don’t see in individual system architecture reviews. When concentration creates systemic failures, they cascade.
How Does Cloud Concentration Risk Differ From Vendor Lock-In?
Lock-in is about switching costs. Concentration risk is about correlated failure exposure. You can have high vendor lock-in with low concentration risk by using multiple locked-in providers. Or you might migrate away easily but still have significant concentration risk because all your workloads sit on one platform.
Vendor lock-in describes technical and economic barriers to switching providers – proprietary APIs, data egress costs, application dependencies. Cloud concentration risk describes the systemic exposure from depending on few providers, regardless of your ability to switch.
Here’s a concrete example. A healthcare organisation built a patient management system on AWS-specific services over three years. When AWS increased pricing by 40%, the organisation discovered that migrating would require $8.5 million and 18 months. That’s vendor lock-in.
Concentration risk is different. Your organisation might use only portable, standard APIs with zero proprietary services. You could switch providers in a month. But if 80% of your business functions depend on a single provider and their control plane fails, you’re still exposed to concentration risk.
The overlap between the two matters, though. Vendor lock-in reinforces concentration by making diversification expensive and time-consuming. That helps explain why 71% of surveyed businesses said vendor lock-in risks would deter them from adopting more cloud services.
Abstraction layers like Kubernetes or Terraform help with both problems. They reduce provider-specific dependencies, making workloads more portable. But deploying abstraction layers still requires actually diversifying providers to address concentration risk. For detailed guidance on implementing these approaches, see our guide on multi-cloud architecture strategies and resilience patterns.
Why Is Hyperscaler Market Concentration a Problem?
AWS, Microsoft Azure, and Google Cloud Platform control approximately 65% of the global IaaS/PaaS market. That’s an oligopolistic structure: three vendors control two-thirds of the infrastructure businesses depend on.
For CDNs, it’s worse: Cloudflare and Amazon alone host over 30% of popular domains. When alternatives are limited, customer negotiating power drops. Roughly 86% of enterprises use multi-cloud, in part to avoid this.
The economics will continue to drive businesses towards the largest platforms, but regulation has made customers more aware of the downsides of concentration.
Systemic risk compounds these negotiating power issues. When providers fail, correlated exposure means entire industries go down simultaneously. 89% of top websites depend on at least one third-party DNS, CDN, or Certificate Authority. Top-three providers in each category can impact 50-70% of all sites.
Regulators have noticed. UK financial regulators and the EU’s European Supervisory Authorities, including the European Banking Authority, now treat major cloud providers as critical third parties subject to operational resilience requirements. DORA – the EU’s Digital Operational Resilience Act, applicable from January 17, 2025 – mandates dependency assessment, contractual risk controls, and exit strategies.
What Makes US-East-1 a Single Point of Failure?
AWS’s US-East-1 region in Northern Virginia hosts an estimated 30-40% of global AWS workloads. That concentration alone creates risk.
But it gets worse. US-East-1 also hosts core services and global control planes that other AWS regions depend on. When US-East-1 fails, it affects AWS operations globally.
October 19, 2025 proved this. Internal network failures cascaded to DynamoDB, which in turn cascaded to IAM authentication and prevented teams from logging into the AWS console to apply fixes. The disruption persisted for 15 to 16 hours.
Here’s the analogy that explains why multi-AZ doesn’t help – “AZs are like rooms in a house. If one room floods, you move to another. But if the entire house floods, every room is underwater.”
Even organisations running “global” apps often anchor identity or metadata flows in US-East-1. When that region fails, impacts propagate worldwide regardless of where workloads physically run.
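The cascade described above is a reachability problem on a dependency graph: anything that directly or transitively depends on a failed component is inside the blast radius. This minimal sketch uses a hypothetical, heavily simplified graph loosely modelled on the October 2025 chain (DynamoDB to IAM to console access); the node names are illustrative and do not describe AWS’s real internal architecture.

```python
from collections import deque

# Hypothetical graph: each key depends on the components in its list.
DEPENDS_ON: dict[str, list[str]] = {
    "iam-auth": ["dynamodb-us-east-1"],
    "aws-console": ["iam-auth"],
    "orders-api-eu-west-1": ["iam-auth", "rds-eu-west-1"],
    "reports-us-west-2": ["s3-us-west-2"],
}


def blast_radius(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a failed component to its dependants.
    dependants: dict[str, list[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, []).append(component)

    impacted: set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk outward from the failure
        current = queue.popleft()
        for parent in dependants.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


print(sorted(blast_radius("dynamodb-us-east-1", DEPENDS_ON)))
```

In this toy graph the EU workload is hit even though none of its compute runs in US-East-1, because its identity path does.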
Mitigation requires multi-region deployment: active-active, active-passive, or pilot light. Netflix’s experience during AWS outages shows what matters most: it engineered reliability into its architecture from the start. Architecting for failure matters more than avoiding specific providers.
How Does the Shared Responsibility Model Relate to Concentration Risk?
The shared responsibility model defines boundaries where cloud providers manage infrastructure security and customers manage application and data security. During foundational service failures, customers “did everything right” but still experienced outages due to provider infrastructure failures. You lack control over foundational dependencies.
AWS points customers to the shared responsibility model when it comes to service availability commitments. But when foundational services like DNS fail, even well-architected applications can become unstable.
SLA limitations make this worse. Standard AWS SLAs typically offer a 10% service credit for monthly uptime below 99.99%, 25% below 99.0%, and 100% below 95.0%, although exact tiers vary by service. And providers openly acknowledge that availability is never 100%: many express their uptime targets in ‘nines’, some as high as six nines (99.9999%).
If a customer paying $50,000 a month received a 25% credit ($12,500) but suffered $150,000 in business losses, the SLA covered only about 8% of the actual damage. A mid-sized e-commerce site processing $100,000 a day would have lost approximately $62,500 in revenue during the 15-hour October 2025 AWS outage.
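The SLA arithmetic above can be reproduced in a few lines. This sketch uses the credit tiers quoted in the text (real tiers vary by service, so check the specific SLA), assumes a 30-day month and a 15-hour outage, and treats the $50,000 bill, $150,000 loss, and $100,000 daily revenue as the illustrative figures from the article.

```python
HOURS_IN_MONTH = 30 * 24  # assumption: 30-day billing month


def monthly_uptime_pct(outage_hours: float) -> float:
    """Monthly uptime percentage after a single outage of the given length."""
    return 100 * (1 - outage_hours / HOURS_IN_MONTH)


def credit_pct(uptime_pct: float) -> int:
    """Service-credit tier, using the tiers quoted in the text."""
    if uptime_pct < 95.0:
        return 100
    if uptime_pct < 99.0:
        return 25
    if uptime_pct < 99.99:
        return 10
    return 0


outage_hours = 15
uptime = monthly_uptime_pct(outage_hours)
credit = 50_000 * credit_pct(uptime) / 100       # credit on a $50,000 monthly bill
coverage = credit / 150_000                      # share of a $150,000 business loss
lost_revenue = 100_000 * outage_hours / 24       # $100,000/day site, pro rata per hour

print(f"uptime {uptime:.2f}% -> credit ${credit:,.0f}")
print(f"SLA covers {coverage:.0%} of a $150,000 loss")
print(f"e-commerce revenue lost: ${lost_revenue:,.0f}")
```

A 15-hour outage lands in the 25% tier, which is where the $12,500 credit and the roughly 8% coverage figure come from.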
Accountability gaps emerge because customers followed provider recommendations but still went down. Shared services like IAM and CloudWatch created single points of failure across regions even for multi-region deployments.
One remedy: update agreements to assign accountability during disruptions and negotiate compensation for downtime.
What Is Systemic Risk in Cloud Infrastructure?
Systemic risk describes cascading failures across interconnected systems where a single failure propagates beyond individual organisations to affect entire industries or economic sectors. Think “too big to fail” banks – systemically important institutions whose failure would cascade across the economy.
In cloud context, hyperscaler concentration creates systemic risk. One provider’s outage can impact vast portions of the internet including organisations not directly using that provider.
Cloudflare handles roughly 28% of global HTTP/HTTPS traffic. When Cloudflare fails, websites depending on it become unreachable even if their underlying hosting infrastructure remains operational. During the November 18, 2025 outage, approximately 20% of internet traffic was disrupted, with about three hours of major disruption and nearly six hours before full recovery. Forrester estimates the disruption could cost businesses $250 million to $300 million.
Resilience isn’t free. Organisations bear the cost of protecting against systemic risk.
Cloud platforms are systemic infrastructure, with a significant blast radius when a single point of logical failure emerges. SaaS applications, APIs, authentication providers and data-integration tools often sit on AWS. When one layer of that chain fails, the failure cascades quickly across dependent systems.
CyberCube characterised the AWS US-East-1 outage as a moderate cyber (re)insurance event potentially triggering contingent business interruption claims. They advise (re)insurers to utilise Single-Point-of-Failure Intelligence platforms to assess regional cloud concentration risk.
Regulatory response treats systemic infrastructure differently from ordinary vendors. DORA introduces EU-level oversight of critical ICT third-party providers.
How Do I Assess My Organisation’s Cloud Concentration Risk?
Start with dependency mapping. Document all cloud providers, regions, services, and foundational infrastructure your organisation relies on. Identify every application, data flow and third-party service that touches cloud infrastructure, directly or indirectly.
Standard vendor assessments and SLAs rarely show the picture you need. You need to map dependencies beyond first-tier vendors to identify sub-vendors and understand the dependency chain.
Classify workloads by impact: would an outage stop business operations, or is the workload important but survivable for a while? This determines where concentration mitigation investment matters most. Many CIOs focus contingency plans on hardware failure, cyberattacks, or data centre loss, yet overlook the systemic vulnerabilities introduced by single-region reliance or untested failover strategies.
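Dependency mapping and workload classification can start as plain data before any tooling is involved. The sketch below is hypothetical throughout: the business functions, vendors, and criticality labels are invented for illustration. It follows each function’s dependencies through sub-vendors down to the underlying provider, then reports what share of business-stopping functions rests on each provider.

```python
# Hypothetical inventory: edges point from a consumer to what it depends on.
# Entries prefixed "provider:" are the underlying cloud platforms.
DEPENDENCIES: dict[str, list[str]] = {
    "checkout": ["payments-saas", "orders-db"],
    "customer-login": ["identity-saas"],
    "analytics": ["warehouse"],
    "payments-saas": ["provider:aws"],   # a sub-vendor's own hosting
    "identity-saas": ["provider:aws"],
    "orders-db": ["provider:azure"],
    "warehouse": ["provider:gcp"],
}
# "stop" = an outage halts business operations; "important" = degraded but survivable.
CRITICALITY = {"checkout": "stop", "customer-login": "stop", "analytics": "important"}


def providers_for(node: str) -> set[str]:
    """Walk the graph from `node` and collect every provider it ultimately relies on."""
    found: set[str] = set()
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current.startswith("provider:"):
            found.add(current.removeprefix("provider:"))
        stack.extend(DEPENDENCIES.get(current, []))
    return found


critical = [name for name, level in CRITICALITY.items() if level == "stop"]
share: dict[str, float] = {}
for function in critical:
    for provider in providers_for(function):
        share[provider] = share.get(provider, 0.0) + 1 / len(critical)

for provider, fraction in sorted(share.items()):
    print(f"{provider}: {fraction:.0%} of business-stopping functions")
```

In this toy inventory AWS underpins every business-stopping function even though no internal team chose AWS directly: the exposure arrives through the payments and identity vendors. Shares can sum to more than 100% because one function may depend on several providers.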
Quantify exposure using a framework such as CyberCube’s approach. Estimate the probability of a provider-level outage. Assess the percentage of functions affected. Calculate hourly revenue and operational impact. Model cascading effects. This produces expected annual loss figures that boards and CFOs can compare against mitigation investment costs. For detailed methodologies on calculating these costs, see our guide on calculating the true cost of cloud outages and downtime.
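A simplified expected-annual-loss calculation following those steps might look like the sketch below. It is not CyberCube’s actual model; every input (outage probability, duration, share of functions affected, hourly impact, cascade multiplier, mitigation cost) is a hypothetical placeholder to replace with your own estimates.

```python
from dataclasses import dataclass


@dataclass
class OutageScenario:
    annual_probability: float         # chance of a provider-level outage in a year
    duration_hours: float             # expected outage length
    share_affected: float             # fraction of business functions affected (0-1)
    hourly_impact: float              # revenue + operational cost per hour at full impact
    cascade_multiplier: float = 1.0   # > 1.0 to account for knock-on effects

    def expected_annual_loss(self) -> float:
        return (self.annual_probability * self.duration_hours * self.share_affected
                * self.hourly_impact * self.cascade_multiplier)


# Hypothetical figures for illustration only.
current = OutageScenario(0.5, 12, 0.8, 20_000, cascade_multiplier=1.25)
# Assume diversification cuts the share of affected functions to 20%.
mitigated = OutageScenario(0.5, 12, 0.2, 20_000, cascade_multiplier=1.25)
annual_mitigation_cost = 60_000

risk_reduction = current.expected_annual_loss() - mitigated.expected_annual_loss()
print(f"EAL today:        ${current.expected_annual_loss():,.0f}")
print(f"EAL mitigated:    ${mitigated.expected_annual_loss():,.0f}")
print(f"net annual value: ${risk_reduction - annual_mitigation_cost:,.0f}")
```

The useful output for a board is the last line: reduction in expected loss minus the yearly cost of the mitigation. A positive number argues for the investment under these assumptions.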
Running workloads across multiple AWS regions reduces regional concentration but doesn’t eliminate provider-level systemic risk when foundational services like control plane, IAM, or DNS fail.
Communicate to boards using risk language executives understand. The C-suite, the board, and possibly investors should be made aware of the risks and costs of relying on a single cloud provider versus multiple providers.
Mandate multi-region or hybrid-cloud strategies complemented by regular failover testing.
What Are Multi-Cloud and Hybrid Cloud Strategies?
Multi-cloud distributes workloads across AWS, Azure, and GCP to reduce vendor concentration risk. Hybrid cloud combines public cloud with private infrastructure for data sovereignty or regulatory requirements.
Multi-cloud addresses vendor concentration through provider diversification. Hybrid addresses public versus private mix. Both introduce operational complexity and demand specialised skills. For a complete overview of cloud resilience strategies, see our infrastructure outages and cloud reliability in 2025 guide.
Teams within an organisation often have different requirements, workloads, and preferences, and naturally gravitate towards specific cloud platforms. A multi-cloud strategy can unify this fragmented landscape. By spreading applications and data across AWS, Azure, Google Cloud and other providers, organisations avoid letting any single provider become a single point of failure.
Multi-cloud architecture patterns include active-active, active-passive, and workload placement strategies. Active-active runs workloads simultaneously across multiple providers with real-time synchronisation. It provides the highest resilience, but at the greatest cost and complexity.
Most organisations don’t need active-active everywhere. Reserve it for mission-critical systems where RTO must be near-zero.
Active-passive keeps a primary cloud handling traffic while a secondary stands by, enabling faster recovery than a single-provider setup. Cloud bursting expands capacity during demand spikes.
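At the heart of active-passive routing is a failover rule driven by health checks. The sketch below is a hypothetical, provider-neutral controller: the endpoint names and thresholds are assumptions, and in practice this logic usually lives in DNS failover, a global load balancer, or a traffic manager rather than in application code. It fails over after several consecutive failed checks on the primary and fails back only after a run of healthy checks, to avoid flapping.

```python
class ActivePassiveRouter:
    """Route to the primary until it fails repeatedly; fail back once it is stable."""

    def __init__(self, primary: str, secondary: str,
                 fail_threshold: int = 3, recover_threshold: int = 5) -> None:
        self.primary = primary
        self.secondary = secondary
        self.fail_threshold = fail_threshold        # consecutive failures before failover
        self.recover_threshold = recover_threshold  # consecutive successes before failback
        self.active = primary
        self._failures = 0
        self._successes = 0

    def record_primary_check(self, healthy: bool) -> str:
        """Feed one health-check result for the primary; return the endpoint to use."""
        if healthy:
            self._failures = 0
            self._successes += 1
            if self.active == self.secondary and self._successes >= self.recover_threshold:
                self.active = self.primary
        else:
            self._successes = 0
            self._failures += 1
            if self.active == self.primary and self._failures >= self.fail_threshold:
                self.active = self.secondary
        return self.active


# Hypothetical endpoints on two different providers.
router = ActivePassiveRouter("app.primary-cloud.example", "app.secondary-cloud.example")
for ok in [True, False, False, False, True, True, True, True, True]:
    target = router.record_primary_check(ok)
    print(ok, "->", target)
```

The asymmetric thresholds are a deliberate design choice: failing over quickly limits downtime, while failing back slowly avoids bouncing traffic between clouds while the primary is still unstable.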
Abstraction layers matter. Kubernetes and Terraform reduce provider-specific dependencies making workloads more portable across clouds. Kubernetes orchestration with cloud-agnostic service meshes enables consistent multi-cloud control planes.
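Kubernetes and Terraform apply this idea at the infrastructure level; at the application level, the same principle means coding against an interface rather than a provider SDK. The sketch below is hypothetical: `ObjectStore`, `InMemoryObjectStore`, and `save_invoice` are illustrative names, and the in-memory class stands in for real adapters that would each wrap one cloud’s storage SDK, so the example runs without credentials.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """The only storage contract business logic is allowed to see."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryObjectStore:
    """Stand-in for a provider adapter (hypothetical); a real one would call a cloud SDK."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_invoice(store: ObjectStore, invoice_id: str, pdf: bytes) -> None:
    # Business logic depends on the interface, not on which cloud sits behind it.
    store.put(f"invoices/{invoice_id}.pdf", pdf)


for store in (InMemoryObjectStore("provider-a"), InMemoryObjectStore("provider-b")):
    save_invoice(store, "1001", b"%PDF-example")
    print(store.name, store.get("invoices/1001.pdf")[:4])
```

Swapping or adding a provider then means writing one new adapter rather than touching every call site, which is what keeps an exit strategy realistic.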
Cost considerations weigh operational complexity, data egress fees, and multi-platform tooling against the value of mitigating concentration risk. Multiple providers also multiply maintenance tasks, so it pays to automate them.
Managing multiple provider APIs requires specialised expertise. But many of the most resilient organisations have concluded that multi-cloud is the way forward.
FAQ
What is the difference between concentration risk and diversification?
Concentration risk describes the exposure created by depending on few providers. Diversification is the mitigation strategy: spreading dependencies across multiple providers or regions to reduce correlated failure modes.
Can I have concentration risk even with multiple availability zones?
Yes. Availability zones provide redundancy within a single cloud provider’s region but don’t mitigate concentration risk to that provider’s foundational services. If the provider experiences control plane, authentication, or DNS failures affecting all zones, you still have concentration risk.
Why do Cloudflare outages affect businesses not using Cloudflare?
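Many businesses that don’t use Cloudflare directly still depend on SaaS tools, APIs, and other services that do. Cloudflare handles roughly 28% of global HTTP/HTTPS traffic, and when it fails, websites depending on it become unreachable to end users even if their underlying hosting infrastructure remains operational. That knock-on effect demonstrates CDN-layer concentration risk.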
Is multi-cloud too expensive for mid-size organisations?
Multi-cloud introduces operational complexity and skill requirements, but cost depends on implementation approach. Selective multi-cloud for certain workloads, active-passive rather than active-active, and abstraction layer investments can provide concentration risk mitigation proportionate to mid-size organisation budgets.
What is DORA and why does it matter for concentration risk?
The Digital Operational Resilience Act (DORA) is an EU regulation, applicable from January 17, 2025, that requires financial sector entities to manage concentration risk from ICT providers, explicitly including cloud hyperscalers. DORA mandates dependency assessment, contractual risk controls, and exit strategies.
How does vendor lock-in make concentration risk worse?
Vendor lock-in creates switching barriers like proprietary APIs, data egress costs, and application dependencies that trap organisations in concentrated provider relationships. Lock-in makes diversification expensive and time-consuming, reducing organisations’ ability to respond to systemic risk.
Should I prioritise multi-cloud or better SLAs with my current provider?
Both address different aspects. Multi-cloud reduces concentration risk through diversification. Enhanced SLAs improve accountability and compensation for failures. Regulated industries may need both. If a single-provider outage would stop your business, diversification may matter more than SLA credits that, in the example above, covered only 8% of losses.
What is active-active architecture and do I need it?
Active-active architecture runs workloads simultaneously across multiple providers with real-time synchronisation, providing the highest resilience at the greatest cost and complexity. Most organisations don’t need active-active everywhere – reserve it for systems where RTO must be near-zero.
How do I measure concentration risk in financial terms?
Use quantitative risk frameworks like CyberCube’s approach – estimate probability of provider-level outage, assess percentage of functions affected, calculate hourly revenue and operational impact, model cascading effects. This produces expected annual loss figures boards and CFOs can compare against mitigation investment costs.
What are abstraction layers and why do they matter for concentration risk?
Abstraction layers like Kubernetes and Terraform reduce provider-specific dependencies, making workloads more portable across clouds. They mitigate vendor lock-in, ease multi-cloud implementation, and provide exit options if a concentration risk assessment calls for provider diversification. Because Kubernetes is open source and supported by all major cloud vendors, it helps protect against vendor lock-in.
Can I reduce concentration risk without adopting multi-cloud?
Partial mitigation is possible: use multiple regions within one provider, implement robust disaster recovery, and negotiate enhanced SLAs with accountability clauses. However, these measures don’t eliminate provider-level systemic risk. Foundational service failures like control plane or DNS can still affect all regions. True concentration risk mitigation requires provider diversification. For more information on all available resilience strategies, see our cloud reliability guide.
What is the “too big to fail” analogy for cloud providers?
“Too big to fail” is the financial sector’s term for systemically important institutions whose failure would cascade across the economy. Hyperscalers now receive similar regulatory treatment through DORA’s framework, because market concentration makes their outages systemically impactful beyond individual customer relationships. This justifies enhanced oversight and resilience requirements.