In 2014, Heartbleed hit OpenSSL, the encryption library behind two-thirds of active websites. The vulnerability let attackers steal passwords, encryption keys, and private data, and it had been sitting in the code for nearly three years. Within 24 hours of going public, attackers used it to breach the Canada Revenue Agency and steal taxpayer data.
The cause? Under-resourcing. When that bug was introduced in 2011, OpenSSL had exactly one overworked, full-time developer.
Your internal platforms face exactly the same problem. That CI/CD infrastructure everyone depends on? Your shared authentication service? The DevOps tooling that enables your product organisation? They’re all decaying through the same mechanism that created Heartbleed—what economists call the tragedy of the commons.
This article is part of our comprehensive guide on game theory for technical leadership, where we explore how strategic dynamics shape technology decisions. Understanding this pattern helps you spot decay before it becomes a crisis. So let’s look at how Heartbleed happened, why your internal platforms follow the same path, and what governance models actually work.
What is the tragedy of the commons and how does it apply to internal platforms?
The tragedy of the commons is what happens when shared resources decay because no individual has an incentive to maintain them. This is one of the key game theory concepts every technical leader encounters in managing engineering teams.
The classic example is public grazing land. Each herder gains by adding more animals, but the costs—depleted grass, eroded soil—get spread across everyone. So herders keep adding animals until the resource collapses.
Your internal platforms work the same way. Every product team depends on your CI/CD infrastructure. Every team benefits when it works. But the incentive for each team to contribute to maintenance? Minimal.
There’s another concept at play here: the free rider problem. Product teams consume all the platform value—reliable builds, fast deployments, working authentication—without contributing to upkeep. The platform team carries the entire burden.
The result: technical debt accumulates, because the costs of maintenance are spread across everyone whilst the benefits accrue to individual teams. Platform engineers burn out. Reliability degrades. Eventually you hit a crisis.
There Ain’t No Such Thing As A Free Lunch—TANSTAAFL. Things always have to be paid for. With internal platforms, the hidden price is maintainer burden, technical debt, and eventual platform failure.
How did the Heartbleed bug happen and what does it teach about shared services?
OpenSSL encrypted two-thirds of active websites in 2014. But the project had only one full-time developer, Stephen Henson, who was “hustling commercial work” to pay his bills whilst maintaining infrastructure that secured global internet commerce.
The coding error was introduced in 2011. It sat there undiscovered for three years. The vulnerability let attackers steal memory contents—passwords, encryption keys, private data.
The Canada Revenue Agency was breached within 24 hours. Over 91,000 vulnerable instances were still active in late 2019, years after the patch came out.
Steve Marquess, former CEO of the OpenSSL Foundation, said “there should be at least a half dozen full-time OpenSSL team members, not just one”.
After the crisis, the Linux Foundation stepped in with dedicated funding. By 2020, OpenSSL had 18 contributors.
The lesson? Reactive funding after a crisis is expensive. Proactive investment prevents disasters.
Your internal authentication service, CI/CD pipeline, or shared API gateway faces identical dynamics. Everyone depends on it. Nobody wants to maintain it. One or two people carry the burden until they burn out.
Don’t wait for your internal Heartbleed moment.
Why do shared internal services accumulate technical debt faster than product code?
Incentives are misaligned.
Product features have visible individual rewards. Ship a feature, customers use it, leadership notices. Platform maintenance has diffuse benefits. Fix a flaky test suite—credit accrues to no one.
There’s a visibility gap. Product launches get celebrated. Platform stability is invisible until it breaks. Then the platform team gets blamed.
You’ve got a prisoner’s dilemma playing out. Each product team rationally optimises locally—ship features, hit targets, get promoted. But this creates a globally suboptimal outcome where shared infrastructure degrades.
One company calculated each team’s “tax”: the engineer hours owed, proportional to the flaky tests that team was responsible for. Before each sprint, teams saw their tax. Within a month, flaky tests were almost completely eliminated.
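To make that concrete, here’s a minimal sketch of how such a tax could be computed. The hours-per-test weighting and the team names are hypothetical; they stand in for whatever data your CI system can attribute to each team.

```python
# Hypothetical sketch: charge each team a maintenance "tax" in engineer-hours,
# proportional to the flaky tests attributed to that team.

HOURS_PER_FLAKY_TEST = 2.0  # assumed cost to triage and fix one flaky test

flaky_tests_by_team = {  # placeholder data from your CI system
    "checkout": 14,
    "payments": 5,
    "search": 1,
}

def platform_tax(flaky_counts: dict[str, int], hours_per_test: float) -> dict[str, float]:
    """Return the engineer-hours each team owes in the next sprint."""
    return {team: count * hours_per_test for team, count in flaky_counts.items()}

if __name__ == "__main__":
    for team, hours in sorted(platform_tax(flaky_tests_by_team, HOURS_PER_FLAKY_TEST).items()):
        print(f"{team}: owes {hours:.0f} engineer-hours of platform work this sprint")
```

Surfacing the numbers before sprint planning is the whole trick: the cost stops being diffuse and lands on the team that caused it.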
When one team cuts corners in a shared service, all teams suffer. The technical debt compounds faster than isolated debt would.
There’s information asymmetry too. Product teams see “works fine.” Platform teams see “held together with duct tape.”
What is the free rider problem in platform engineering teams?
The free rider problem is straightforward—individuals benefit from a resource without contributing to its maintenance.
In platform engineering it looks like this: all product teams use your CI/CD infrastructure, logging, authentication services. But contributions come only from the platform team. Product teams file issues but don’t submit pull requests.
The classic signal? “We’re too busy with features to fix the platform.” Said by the same teams that continuously file platform bug reports.
Each team makes the locally optimal choice—focus on features, hit targets, advance careers. But this creates platform decay.
James M. South tweeted that ImageSharp passed 6 million downloads but only 98 collaborators contributed over 5 years, with just 23 making more than 10 commits. The issue is sustainability.
Your platform team experiences the same burden. Unrelenting demands with no reciprocal support. Eventually they burn out.
Breaking the cycle requires governance mechanisms that align incentives. You need structure, not appeals to team spirit.
How do I know if my internal platform is suffering from tragedy of the commons?
Look for these warning signs:
Platform team constantly overwhelmed despite having users across the entire company.
Technical debt in shared services growing faster than product code debt.
Someone “volunteering” to maintain infrastructure outside their official responsibilities. This is the OpenSSL pattern.
Product teams building workarounds instead of fixing root platform issues.
Platform engineer turnover higher than product team turnover. Cognitive load results in fatigue, errors, and frustration.
Escalating complaints about platform reliability but no increase in contributions.
“Works fine from outside” versus “barely holding together” from inside. Information asymmetry is a key indicator.
Shadow IT proliferation—teams building competing solutions instead of improving the shared platform.
Three or more of these warning signs mean the tragedy of the commons is in effect. Five or more mean you’re approaching crisis.
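If it helps to make the check explicit, here’s a minimal scoring sketch. The sign labels are paraphrased from the list above and the example answers are made up; the thresholds follow the rule just stated.

```python
# Minimal sketch: score the warning signs above and apply the 3/5 thresholds.
warning_signs = {  # example answers; replace with your own honest assessment
    "platform team constantly overwhelmed": True,
    "shared-service debt growing faster than product debt": True,
    "volunteer maintainer outside official responsibilities": False,
    "product teams building workarounds": True,
    "platform engineer turnover higher than product turnover": False,
    "complaints rising, contributions flat": True,
    "'works fine outside' vs 'duct tape inside' gap": False,
    "shadow IT proliferation": False,
}

score = sum(warning_signs.values())
if score >= 5:
    print(f"{score}/8 signs: approaching crisis")
elif score >= 3:
    print(f"{score}/8 signs: tragedy of the commons is in effect")
else:
    print(f"{score}/8 signs: keep monitoring")
```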
Catch this at the neglect stage—six to twelve months of deferred maintenance—before it becomes a crisis.
What governance models prevent internal platforms from decaying?
You need governance that matches your organisation’s scale. Three models work at different sizes:
Benevolent Dictator (up to 50 engineers): Single platform lead makes decisions. Fast and simple. This is your starting point.
Platform Guild (50-200 engineers): Representatives from product teams provide advisory input. Platform team retains authority. This is the SMB sweet spot.
Platform Council (200+ engineers): Formal governance with voting on major decisions. More process overhead. Rare for SMB.
The foundation for all three is credible commitment from leadership that platform work is valued and funded.
Gartner predicts 80% of engineering organisations will have platform engineering teams by 2026. Get ahead of this.
Funding mechanisms include dedicated headcount (one platform engineer per 10-20 product engineers), protected budget allocation (10-20% of engineering budget), or a consortium model where product teams co-fund the platform.
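To see what those mechanisms imply in headcount and money, here’s a rough sizing sketch. The engineer count and budget are placeholders, not recommendations; swap in your own numbers.

```python
# Rough sizing sketch using the ratios above (placeholder inputs).
product_engineers = 60
engineering_budget = 4_000_000  # annual engineering budget (placeholder)

platform_engineers = (product_engineers / 20, product_engineers / 10)  # 1 per 10-20
platform_budget = (engineering_budget * 0.10, engineering_budget * 0.20)  # 10-20%

print(f"Platform headcount: {platform_engineers[0]:.0f}-{platform_engineers[1]:.0f} engineers")
print(f"Protected platform budget: {platform_budget[0]:,.0f}-{platform_budget[1]:,.0f} per year")
```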
Start lightweight. Benevolent dictator with clear authority and protected funding. Evolve to guild as you grow.
How should platform team contributions be measured and rewarded?
Avoid feature count, lines of code, and uptime alone. Feature count encourages building unused features. Lines of code is irrelevant. Uptime can be high whilst the platform is terrible to use.
Focus on outcomes, not outputs.
Adoption metrics: Percentage of teams using the platform. Shadow IT indicators—if teams are building alternatives, your platform isn’t meeting needs.
Satisfaction metrics: Internal Net Promoter Score. Developer satisfaction surveys. If your internal users wouldn’t recommend your platform, you have a problem.
Performance metrics: DORA metrics work for platforms—deployment frequency, lead time for changes, change failure rate, mean time to recovery (a calculation sketch follows below).
Health metrics: Technical debt trend direction. Incident frequency by root cause. Security vulnerability count.
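For the satisfaction and performance metrics above, here’s a minimal calculation sketch. The survey scores and deployment records are made-up sample data; the formulas (NPS as percentage of promoters minus percentage of detractors, plus three of the DORA metrics) are standard.

```python
# --- Internal NPS: % promoters (scores 9-10) minus % detractors (scores 0-6) ---
survey_scores = [10, 9, 8, 7, 9, 4, 10, 6, 9, 8]  # made-up survey responses

promoters = sum(1 for s in survey_scores if s >= 9)
detractors = sum(1 for s in survey_scores if s <= 6)
nps = 100 * (promoters - detractors) / len(survey_scores)

# --- DORA metrics from hypothetical deployment records over a 7-day window ---
deployments = [
    {"lead_time_hours": 6.0, "failed": False},
    {"lead_time_hours": 30.0, "failed": True},
    {"lead_time_hours": 4.5, "failed": False},
]
window_days = 7

deploy_frequency = len(deployments) / window_days
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
median_lead_time = sorted(d["lead_time_hours"] for d in deployments)[len(deployments) // 2]

print(f"Internal NPS: {nps:+.0f}")
print(f"Deployment frequency: {deploy_frequency:.2f} per day")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"Median lead time for changes: {median_lead_time:.1f} hours")
```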
Make invisible platform work visible. Platform contributor of the month. Public acknowledgement. “Platform ambassador” roles.
Align rewards with platform adoption and satisfaction. OKRs tied to these outcomes. Promotion criteria that recognise infrastructure work.
The key is measuring outcomes—adoption, satisfaction, productivity impact—not outputs like features shipped.
Should internal platforms be funded like products or treated as overhead?
Treat them like products.
The overhead model creates “not my problem” dynamics. There’s a risk of arbitrary budget cuts. This leads to decay.
The platform-as-product model treats internal users as customers. Your roadmap is driven by user input. Success is tied to adoption. This creates accountability.
Alternative approaches include a consortium model where product teams co-fund the platform, or embedded contributors where teams dedicate 5-10% capacity to platform maintenance.
You need a credible, protected budget. Allocate a specific percentage of your technology budget to platform investment.
Start with a platform-as-product mindset. Typical ratio at SMB scale is one platform engineer per 10-20 product engineers.
How do successful companies align incentives for shared service maintenance?
Structural mechanisms work. Cultural appeals don’t.
Mandatory contribution: Product teams dedicate 5-10% time to platform work. Enforced through sprint planning. Not negotiable.
Rotation programs: Engineers rotate onto the platform team for three to six months. Builds empathy. Spreads knowledge.
Bounty system: Platform team offers “bounties” for specific improvements. Makes contribution opportunities visible.
Platform guild: Regular user input sessions. Creates a coalition for funding.
Technical debt transparency: Health dashboards. Debt registry. One approach involves calculating each team’s technical debt “tax” and making it visible.
Community building: Slack channels. Monthly newsletters. Demo days. Office hours. Build community, not just infrastructure.
The pattern: make implicit costs explicit. Make invisible work visible. Align individual recognition with collective contribution.
The prisoner’s dilemma helps us understand cooperation and competition. Each party can improve their position by defecting—shipping features instead of fixing infrastructure—but when everyone defects, the outcome is worse for all. For more on how this and other strategic dynamics shape technical decisions, see our comprehensive game theory for technical leadership guide.
The most common path to cooperation arises from repetitions of the game. Repeated games only reward cooperation when everyone can see who contributes and who defects, which is why transparency matters: dashboards, visible metrics, public recognition.
Punishment is easier in smaller groups. This is why platform governance works better at SMB scale. You can actually see who contributes and who free rides.
FAQ Section
What’s the difference between platform engineering and DevOps?
Platform engineering focuses on building internal developer platforms as products, treating infrastructure as user-facing capability. It’s a practice derived from DevOps principles that aims to improve developer experiences through self-service within a secure framework. DevOps is a broader cultural movement emphasising collaboration between development and operations. Platform teams often implement DevOps practices but with a product mindset.
Can small companies afford dedicated platform teams?
At 20-50 engineers, start with two to three dedicated platform engineers plus part-time contributions from product teams. The cost of not having a platform team—accumulated technical debt, productivity drag, crisis response—typically exceeds the investment in a dedicated team. Start lightweight and iterate based on developer feedback.
How long does it take for internal platforms to decay without governance?
The pattern goes like this: Healthy (active maintenance) → Neglect (six to twelve months of deferred maintenance) → Decay (twelve to twenty-four months of compounding debt) → Crisis (platform blocks productivity) → Expensive rebuild. Early intervention at the neglect stage prevents crisis.
What if product teams resist contributing to platform maintenance?
Resistance often signals misaligned incentives rather than unwillingness to contribute. Solutions include making technical debt cost visible through dashboards, tying OKRs to platform health, implementing mandatory contribution percentage, or adopting a rotation model. Structural changes prove more effective than cultural appeals alone.
Is the platform-as-product model realistic for internal tools?
Yes, and it’s recommended. Internal users are real customers with needs, constraints, and alternatives (shadow IT). Treating platforms as products creates accountability for adoption and satisfaction, prevents building unused features, and justifies investment through demonstrated value.
How do I know if I should rebuild vs fix a decayed platform?
Rebuild indicators: the core architecture is fundamentally broken, security vulnerabilities can’t be fixed without a major refactor, and the team has lost confidence in the codebase. Fix indicators: the debt is superficial, the architecture is sound, and the team understands the codebase. If a rebuild is necessary, consider a strangler fig pattern for incremental replacement.
What’s the bus factor for internal platforms and why does it matter?
Bus factor is the minimum number of people who would have to leave before a system becomes unmaintainable. Internal platforms often have a bus factor of one—a single volunteer maintainer. Risk compounds with platform criticality. The solution: dedicate a team, document extensively, and rotate knowledge. The OpenSSL pattern—one developer maintaining infrastructure for millions of users—is exactly what you want to avoid.
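As a rough illustration, here’s a sketch that estimates a bus factor from commit history using a simple heuristic (the smallest set of maintainers covering most recent commits). The heuristic, threshold, and commit counts are all assumptions for illustration, not a formal definition.

```python
# Heuristic sketch: bus factor ~= smallest set of maintainers accounting for
# 80% of recent commits. Author names and counts are made up.

commits_by_author = {
    "alice": 412,  # the lone "volunteer maintainer"
    "bob": 35,
    "carol": 18,
    "dave": 5,
}

def estimate_bus_factor(commits: dict[str, int], coverage: float = 0.8) -> int:
    total = sum(commits.values())
    covered, factor = 0, 0
    for count in sorted(commits.values(), reverse=True):
        covered += count
        factor += 1
        if covered / total >= coverage:
            break
    return factor

print(f"Estimated bus factor: {estimate_bus_factor(commits_by_author)}")
# With one author at ~88% of commits, the estimate is 1: the OpenSSL pattern.
```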
How does tragedy of commons in internal platforms differ from open source?
The free riding, under-investment, and maintainer burnout dynamics are identical. Internal platforms have a closed user base, direct funding is possible, and governance is enforceable. Open source has global users, funding is harder, and governance is voluntary. Internal platforms offer more levers for solutions due to direct control over incentive structures.
What metrics indicate platform team health vs product team health?
Platform teams need adoption rate, user satisfaction, incident frequency, technical debt trends, and support response time. Product teams need feature velocity, revenue or usage metrics, and customer satisfaction. The metrics differ because platforms are shared infrastructure, not customer-facing features. Evaluating platform teams using product team metrics creates misaligned expectations and incentives.
Can platform teams be too large relative to product teams?
Yes. If your platform team exceeds 20-30% of engineering headcount, you’re likely over-engineering or building unused features. Healthy ratio is one platform engineer per 10-20 product engineers at SMB scale. Monitor adoption metrics. If features are unused, reduce platform team size or refocus their efforts.
How do I justify platform investment to non-technical leadership?
Frame it as risk mitigation and a productivity multiplier. Quantify incident costs, developer productivity drag, opportunity cost of workarounds, and recruitment and retention impact. Use Heartbleed as an external example of deferred maintenance cost. Present total cost of ownership, not just team budget. Use comparisons, not raw numbers.
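Here’s a minimal sketch of the kind of comparison that lands: platform investment versus the annual cost of doing nothing. Every number is a placeholder to be replaced with your own incident, salary, and productivity data.

```python
# Placeholder cost model: compare platform investment with the cost of inaction.
engineers = 50
loaded_cost_per_engineer = 150_000   # annual, fully loaded (placeholder)
productivity_drag = 0.08             # fraction of time lost to workarounds (placeholder)
incidents_per_year = 12              # platform-related incidents (placeholder)
cost_per_incident = 25_000           # response plus downtime (placeholder)
platform_team_size = 3

cost_of_inaction = (
    engineers * loaded_cost_per_engineer * productivity_drag
    + incidents_per_year * cost_per_incident
)
platform_investment = platform_team_size * loaded_cost_per_engineer

print(f"Annual cost of inaction: {cost_of_inaction:,.0f}")
print(f"Annual platform investment: {platform_investment:,.0f}")
```

The comparison, not the raw numbers, is what makes the case.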
Does what happened after Heartbleed show that platform investment works?
Post-Heartbleed, OpenSSL received dedicated funding from the Linux Foundation Core Infrastructure Initiative. The project grew from one full-time developer to 18 contributors by 2020. Technical debt got paid down. Security improved substantially. This shows that credible commitment and proper resourcing break the tragedy of the commons cycle.