Business | SaaS | Technology
Oct 29, 2025

The Interview Paradox – Why Hiring Is a Prisoner’s Dilemma and How Work Samples Help

AUTHOR

James A. Wondrasek
Why do candidates spend hundreds of hours practising algorithmic problems they’ll never use on the job? Why does everyone prep for LeetCode when actual development work looks nothing like inverting binary trees on a whiteboard?

The answer lies in game theory. Both sides optimise for interview performance instead of job performance because that’s the rational move given the structural incentives. This article is part of our comprehensive guide on game theory for technical leadership, where we explore how strategic frameworks help CTOs navigate complex decisions.

This creates a prisoner’s dilemma where individually smart behaviour produces collectively poor outcomes. Candidates oversell their abilities. Companies oversell their culture. Both know the other is polishing their pitch, but neither can afford to stop.

Understanding the dynamics—information asymmetry, signalling games, costly signals—reveals why hiring is structurally hard. More importantly, it shows what actually works: work samples that replace indirect signals with direct observation.

Why Do Traditional Technical Interviews Fail to Predict Job Performance?

Traditional interviews optimise for talking about work, not doing work. That’s the core problem.

Schmidt & Hunter found that experience and education have correlation coefficients of only 0.18 and 0.10 with job performance respectively—correlations they describe as “unlikely to be useful.” Unstructured interviews perform worse at predicting performance than simply giving interviewers background information alone.

Interviews measure verbal and social skills under artificial conditions. Candidates who excel at articulating experience may struggle with actual work. Strong performers might freeze in high-pressure settings.

This applies to all interview formats—behavioural, technical discussions, whiteboard coding. Both parties act rationally given the constraints they face. The problem is information asymmetry: neither side can directly observe what they need to know.

Here’s a telling statistic: 73% of companies now prioritise practical coding assessments over traditional formats. Why? Because they’ve learned that interview skills and job skills are different things.

Scott Highhouse, an industrial-organisational psychologist, calls this the “myth of expertise—the belief that experienced professionals can make subjective hiring decisions that are more effective than those guided by impersonal selection methods”. We think we’re good at reading people. We’re not.

What Is the Prisoner’s Dilemma in Hiring and How Does It Affect Both Parties?

The prisoner’s dilemma is when two rational actors pursuing self-interest produce a suboptimal outcome for both. As explored in our strategic dynamics framework, this game theory concept applies directly to hiring: candidates oversell their abilities, companies oversell the role and culture.

If you’re honest about weaknesses while the company oversells the role, you lose the opportunity. If the company is honest about challenges while you oversell skills, they lose talent. The equilibrium: both parties polish their pitch instead of showing reality.

The result is poor matches, disappointed candidates, frustrated companies, high turnover. This represents a rational response to structural incentives rather than deliberate dishonesty.

Think about pricing between Coca-Cola and Pepsi. Both companies would be better off charging high prices—that’s cooperation—but the low-price strategy is each firm’s dominant strategy. They’d prefer mutual cooperation but can’t trust the other to cooperate, so they defect.

The same logic applies to hiring. People are forced into defection by the logic of the setup, not by bad intentions: a misjudged attempt at cooperation will cost them more.
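To make the structure concrete, here is a minimal sketch of the hiring game as a payoff matrix, in Python. The payoff numbers are illustrative assumptions, not figures from any study; the point is only that polishing the pitch is each side's best response whatever the other side does, even though mutual candour pays both sides more.

```python
# Minimal sketch of the hiring prisoner's dilemma.
# Payoff numbers are illustrative assumptions only; higher is better.
# Strategies: "candid" (cooperate) vs "polish" (defect).

PAYOFFS = {
    # (candidate_strategy, company_strategy): (candidate_payoff, company_payoff)
    ("candid", "candid"): (3, 3),   # good match, realistic expectations
    ("candid", "polish"): (0, 4),   # candidate misled, company lands the hire
    ("polish", "candid"): (4, 0),   # candidate lands the offer, company misled
    ("polish", "polish"): (1, 1),   # mutual overselling, poor match
}

STRATEGIES = ("candid", "polish")

def best_response(player, other_strategy):
    """Return the strategy that maximises this player's payoff
    given the other side's fixed strategy."""
    if player == "candidate":
        return max(STRATEGIES, key=lambda s: PAYOFFS[(s, other_strategy)][0])
    return max(STRATEGIES, key=lambda s: PAYOFFS[(other_strategy, s)][1])

# "polish" is the best response no matter what the other side does,
# so (polish, polish) is the equilibrium despite (candid, candid) paying more.
for other in STRATEGIES:
    print("candidate vs", other, "->", best_response("candidate", other))
    print("company   vs", other, "->", best_response("company", other))
```

Because defection dominates for both sides, goodwill from one party can't fix it; you have to change the game itself, which is what work samples do.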

You can’t solve this by hiring people with “hero mentality” or more “skill.” It’s structural, not about individual capability.

How Does Information Asymmetry Create Problems in Technical Hiring?

Information asymmetry exists when the signaller has better information regarding their own abilities or productivity than the receiver. The receiver must infer quality from observable signals rather than direct evidence.

The company doesn’t know your true technical ability, work habits, or reliability. You don’t know real job responsibilities, team dynamics, or actual culture.

Both gaps create space for misrepresentation. Companies rely on proxies like degrees, credentials, and interview performance because they can’t observe actual work. You rely on company reputation and marketing because you can’t observe daily reality.

The larger the information gap, the more both parties depend on unreliable signals.

Michael Spence’s work in the 1970s provided the foundation for analysing how these signals affect hiring practices and wage determination. His insight was that when direct observation is impossible, markets develop signalling mechanisms—but those mechanisms are imperfect substitutes for the real thing.

Pair programming interviews reduce information asymmetry by evaluating skills closest to actual jobs. That’s why they work better than talking about code in a conference room.

What Is Signalling Theory and Why Does It Matter for Technical Hiring?

When direct observation is impossible, parties use costly signals to convey information, and signal credibility depends on cost to fake: the harder a signal is to produce, the more credible it is. Signalling games, a core concept in understanding strategic dynamics in technical decisions, help explain why certain hiring practices persist despite poor outcomes.

A university degree is hard to fake and signals persistence. A GitHub portfolio demonstrates actual code. Certifications signal specific knowledge but are easier to obtain without deep expertise.

The problem: signal cost doesn’t always correlate with job relevance. A PhD may be costly but irrelevant for web development. This creates a signalling arms race where candidates invest in credentials regardless of whether they relate to actual work.

For a signal to be credible, it should satisfy a differential cost structure where high-productivity workers find it less costly than low-productivity workers.
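In Spence’s model, that differential cost condition can be written as a pair of inequalities. The notation below is the standard textbook form, used here purely for illustration: w is the wage premium the signal earns, while c_H and c_L are what acquiring the signal costs a high-productivity and a low-productivity worker respectively.

```latex
% Separating equilibrium condition (standard Spence setup, for illustration):
% the signal is worth acquiring for high-productivity workers (cost below the
% wage premium) but not for low-productivity workers (cost above it).
c_H \;<\; w \;<\; c_L
```

When that holds, only high-productivity workers rationally acquire the signal, so observing it genuinely tells the employer something. When the two costs converge, as with credentials anyone can grind out, the signal stops separating the types and the arms race begins.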

This is why education acts as an initial filtering mechanism—it screens out a portion of applicants and allows focus on a smaller pool with verified capabilities. But verification of capability and actual capability are different things.

The sheepskin effect illustrates this: the twelfth year of school pays more than years nine, ten, and eleven combined; the final year of college pays more than twice as much as the first three years combined. Graduation signals conformity and that you take norms seriously—students know this matters, which is why they tend not to quit after year eleven or the second last year of college. The credential matters more than the learning.

Resume claims are cheap talk. Anyone can write “expert in Python.” That’s why nobody trusts resumes without corroboration.

Why Do Candidates Optimise for Interviews Instead of Job Performance?

Candidates spend hundreds of hours practising algorithmic problems that rarely appear in actual work, and a burgeoning industry of AlgoExpert, Interview Kickstart, Coderbyte, HackerRank, and LeetCode has sprung up around interview prep.

This is rational. If algorithmic tests are the gate, studying algorithms makes sense. LeetCode-style interviews are distinct enough from actual engineering work that seasoned practitioners study for them every time they change jobs.

Individual rationality creates collective irrationality. Someone who has spent days working through every problem on LeetCode might have seen the precise exercise you’re giving them, or at least recently refreshed on DFS, BFS, merge sort, and heaps. Favouring these people only tells you they’re motivated, not whether they’ll perform.

The disconnect is that interview performance measures preparation and pattern recognition, not problem-solving on real work. Skills that actually matter—reading code, collaborating, iterating—get ignored.

Signalling arms races are socially inefficient: if the order remains the same but everyone invests more in the signal, we’ve burnt resources for no gain. That’s exactly what’s happening with interview prep.

There’s even a gatekeeping element: “LeetCode-style interviews don’t prove you can do the work of the job, they prove you’re part of the club. We all had to study this rubbish so you do too”.

Both candidates and companies recognise the inefficiency but can’t escape individually. That’s the prisoner’s dilemma in action.

Which Better Predicts Job Success: LeetCode Performance or Work Samples?

Work sample tests, cognitive ability tests, and structured interviews are the best predictors of overall job performance according to Schmidt and Hunter’s meta-analysis. In the 2022 Sackett meta-analysis, structured interviews were found most predictive of overall job performance, followed by job knowledge tests.

Correlation with job performance: work samples high, structured interviews medium, algorithmic tests low-medium, unstructured interviews low.

Why work samples work: they reduce information asymmetry by simulating actual work. Instead of inferring capability from signals, you observe it directly.

LeetCode persists despite weak prediction because it’s cheap to administer at scale for initial screening. For screening 500 candidates to 20, LeetCode makes sense. For choosing between three finalists, work samples make sense.

Schmidt & Hunter summarised 85 years of research in personnel selection, studying the validity of 19 selection procedures for predicting job performance. Their conclusion: “No other characteristic of personnel measure is as important as predictive validity—economic gains from increasing validity can amount to literally millions of dollars over time”.

Work samples reveal what interviews don’t: debugging skills, code reading, handling ambiguity, collaboration.

What Makes a Signal “Costly” and Why Does That Matter for Hiring?

A costly signal requires time, effort, or resources to produce, making it expensive to fake—this matters because cost-to-fake determines credibility.

Resume claims: anyone can write “expert in Python.” Minimal cost means minimal credibility.

LeetCode: hundreds of hours demonstrates dedication but for a narrow skill. It signals persistence, not job competence.

GitHub portfolios: years of public work are hard to fake. But not all development work can be open source.

Advanced degrees: very high cost signals persistence. But the wage premium associated with degrees persists even after controlling for factors such as innate ability, suggesting it’s partly about the signal rather than purely the learning.

Paid work samples: highest cost for both parties, nearly impossible to fake. They directly demonstrate job performance.

Ask “How much time to fake this?” and “How relevant to actual work?” High cost alone isn’t sufficient—it must also be relevant to the job.
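As a rough illustration of applying those two questions, here is a small sketch that scores a handful of signals on both dimensions. The example signals, the 1-5 scores, and the idea of multiplying them are all illustrative assumptions, not a validated rubric.

```python
# Illustrative sketch only: scoring hiring signals on the two questions
# above ("how hard is this to fake?" and "how relevant is it to the work?").
# The signals and 1-5 scores are hypothetical, not data.

from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    cost_to_fake: int   # 1 = trivially faked, 5 = nearly impossible
    job_relevance: int  # 1 = unrelated to the role, 5 = is the work itself

SIGNALS = [
    Signal("Resume claim: 'expert in Python'", cost_to_fake=1, job_relevance=3),
    Signal("LeetCode grind", cost_to_fake=3, job_relevance=2),
    Signal("Sustained GitHub portfolio", cost_to_fake=4, job_relevance=4),
    Signal("Advanced degree", cost_to_fake=5, job_relevance=2),
    Signal("Paid work sample", cost_to_fake=5, job_relevance=5),
]

def usefulness(s: Signal) -> int:
    # Both dimensions must be high: a costly but irrelevant signal
    # (e.g. a PhD for web development) still scores poorly.
    return s.cost_to_fake * s.job_relevance

for s in sorted(SIGNALS, key=usefulness, reverse=True):
    print(f"{usefulness(s):>2}  {s.name}")
```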

When markets are saturated with signals—everyone obtains an advanced degree—the value of that particular signal diminishes. This drives the arms race higher.

How Do Work Samples Reduce the Prisoner’s Dilemma in Hiring?

Work samples replace indirect signals with direct observation. This dramatically reduces information asymmetry: both parties see a realistic preview.

You experience real tasks, team dynamics, code quality. The company observes work output, collaboration style, debugging approach. It’s hard to misrepresent when performing actual work.

During collaborative pair programming exercises, recruiters and hiring managers can focus on the concrete contribution a candidate will bring to the team. This breaks the prisoner’s dilemma because the incentive shifts from signalling to demonstrating.

Work samples provide a rewarding experience where candidates showcase skills in action and get feedback in real time. Candidates self-select based on actual experience, not polished pitch.

Work sample formats sit on a cost-fidelity spectrum. Pair programming: a few hours of collaborative work. Take-home projects: 4-8 hours on realistic tasks. Paid trial periods: 1-4 weeks of paid contract work.

Problem-solving-with-code formats emphasise behavioural correctness, code organisation, language fluency, speed of execution, and testing, not algorithmic complexity. Feature implementation interviews give candidates access to an existing codebase and a spec for a new feature, evaluating code comprehension, ability to get oriented, and product sense.

Why work samples reduce post-hire disappointment: both parties decided with better information.

What Are the Most Credible Signals for Evaluating Technical Candidates?

Evaluate signals on two dimensions: cost-to-fake for credibility and job-relevance for usefulness.

High credibility, high relevance: sustained GitHub contributions, portfolio of shipped projects, paid trial performance.

High credibility, variable relevance: advanced degrees, certifications. A PhD might be credible but irrelevant for most development.

Medium credibility: structured interviews, LeetCode, reference checks. Some signal but not primary.

Low credibility: resume claims, unstructured interviews, “culture fit” assessments, LinkedIn endorsements. Cheap to fake.

Most credible: work samples—take-homes, trials, pair programming. Verification is needed because portfolio claims face the “who actually did this?” problem.

Absence of signal doesn’t equal negative signal: not all good developers have public GitHub profiles or write blog posts—many excellent developers work on proprietary code. Don’t penalise people for not having portfolios.

“Most credible” doesn’t mean “most accessible.” Paid trials are best but expensive. Use multiple medium-credibility signals when you can’t invest in high-fidelity work samples.

Culture Fit vs Values Alignment: What’s the Difference and Why Does It Matter?

Cultural fit is a vague term, often based on gut instinct. The biggest problem is that it is far more commonly used to reject candidates than to hire them.

Culture fit often involves euphemisms for justifying prejudice or bias—usually a sense that the person doesn’t seem “like us,” like they won’t party well or play well.

When interviewers said they “clicked” or “had chemistry” with a candidate, they often meant they shared similar backgrounds—played the same sports, went to the same graduate school, vacationed in the same spot. That’s homophily: preference for people like ourselves.

What you’re going to get is a copy of your existing employees—in many instances it is a form of discrimination. Research shows culture fit assessments correlate with demographic similarity, not job performance.

Values alignment is different. It’s explicit assessment against documented values like bias for action, customer focus, craftsmanship. Values can be defined, measured, defended. Culture fit is subjective.

Values alignment shows in work samples. How did the candidate handle trade-offs? What did they prioritise? These reveal values through action.

Culture fit increases information asymmetry because “fit” is undefined. Values alignment reduces it by establishing clear criteria.

Lauren Rivera, associate professor at Kellogg School of Management, notes that “in many organisations, fit has gone rogue”. The solution: “The only way culture in the workplace is effective is if there are sets of values that help the company achieve its strategy—when there is thoughtfulness around what values are and you tie that to hiring, then you have best hiring practices”.

Define 3-5 core values explicitly. Test candidates to see whether they demonstrate those values: if you want employees to demonstrate fun, give candidates a scenario with a disgruntled customer and ask what they’d do.

FAQ Section

Can I completely eliminate hiring risk with work samples?

No, work samples dramatically reduce information asymmetry but can’t eliminate uncertainty entirely. Even paid trials only show performance in specific contexts over limited time. Work samples are the best available method for predicting job performance, but hiring always involves some irreducible uncertainty about long-term fit, growth trajectory, and how people respond to changing conditions.

How long should a take-home project be?

4-8 hours of candidate time is the sweet spot. Shorter projects don’t provide enough signal; longer projects create unfair burden. Candidates with family responsibilities can’t invest 20 hours. Always compensate candidates for time if the project exceeds 4 hours. Ensure the project is realistic—not an algorithmic puzzle—and relevant to actual job responsibilities.

Is GitHub contribution history a reliable signal?

Presence of sustained, relevant GitHub contributions is a strong positive signal because it’s high cost to fake. However, absence is not a negative signal. Many excellent developers work on proprietary code, contribute to internal tools, or have privacy concerns. Treat GitHub as “nice to have, not required” and verify the candidate actually authored the claimed work.

Should I stop using LeetCode screening entirely?

LeetCode serves a purpose for initial screening at scale—filtering 500 applicants to 20. The problem is over-relying on it for final decisions. Use algorithmic tests as a cheap initial filter, then invest in work samples for finalist candidates. Don’t make LeetCode performance the primary decision factor for senior roles where it has minimal job relevance.

What if candidates refuse work samples because they’re too time-intensive?

This is valuable information about candidate interest level and constraints. Offer compensation for time, especially for take-homes exceeding 4 hours. Consider shorter work samples like 2-hour pair programming versus an 8-hour project. If candidates still decline, they may have time constraints—which is valid—or limited genuine interest in the role, which is also valuable to know. Top candidates are often willing to invest in work samples for roles they’re excited about.

How do I assess values alignment without enabling bias?

Define values explicitly with concrete behavioural indicators. Example: “Bias for action” means “ships working version and iterates versus perfecting before release.” Ask structured questions with rubrics: “Tell me about a time you had to choose between shipping something imperfect or delaying to improve quality. What did you decide and why?” Look for evidence in work samples showing values in action. Ensure multiple evaluators score independently against the same rubric.

Can small companies afford paid trial periods?

Paid trials are expensive—4 weeks of contractor wages—but cheaper than a bad hire: salary plus opportunity cost plus team disruption. Start with shorter trials of 1-2 weeks. Frame them as consulting contracts. Even unsuccessful trials provide value through contractor work completed. For resource-constrained companies, use paid trials for senior roles where the cost of wrong hire is highest; use shorter work samples like pair programming or take-homes for other roles.

What about reference checks – are they useful signals?

Reference checks are weak signals due to selection bias—candidates choose references unlikely to be critical—and liability concerns that make references reluctant to be candid for legal reasons. Treat them as verification mechanisms: did the candidate actually work there? What was their role? They’re better than nothing but much worse than work samples. Consider back-channel references through mutual connections for more honest assessment, if done ethically.

How do I handle candidates who perform well in interviews but poorly in work samples?

This is the work sample working as intended, revealing that interview performance doesn’t predict job performance. Trust the work sample. Interview skills—verbal fluency, confidence, storytelling—are different from job skills like code quality, debugging, and collaboration. Work samples provide far better prediction. Consider that strong interviewees may excel at talking about work rather than doing it.

Does this game theory framework apply to non-technical hiring?

Yes, the core dynamics—prisoner’s dilemma, information asymmetry, signalling—apply to all hiring contexts. Work samples look different: writing samples for content roles, design tasks for designers, sales simulations for sales roles. But the principle is the same: reduce information asymmetry through direct observation rather than indirect signals. The specifics vary by role, but the game theory structure is universal.
