Mar 5, 2026

Building an AI Voice Fraud Defence Stack Without a Dedicated Security Team

AUTHOR

James A. Wondrasek

Voice cloning now requires as little as 3–30 seconds of audio to produce an indistinguishable replica of any executive’s voice. The FBI reported $2.77 billion in Business Email Compromise losses in 2024. In documented cases from 2024 and 2025, finance employees authorised millions in wire transfers after what appeared to be legitimate video conferences — every face and every voice AI-generated.

Security awareness training cannot protect staff against a voice that sounds exactly like their CEO. As covered in why training alone cannot protect your finance team, NCC Group’s researchers concluded it “would just not be reasonable to expect the victims to detect the subterfuge” during live vishing exercises using AI voice cloning against real organisations.

What works — for organisations of any size — is process. This article gives you a concrete defence stack you can deploy without a dedicated security team. It’s part of the broader picture of AI-enabled social engineering threats facing your organisation. These are process controls that break the attacker’s ability to exploit urgency and authority impersonation, regardless of how convincing the voice clone is. And every single one of them can be set up by a generalist.

Why do process controls matter more than detection when voices can be perfectly cloned?

When AI can produce a voice indistinguishable from the real person, the defence has to shift. You stop asking “can I tell this is fake?” and start asking “does this request follow our verified process?”

Voice cloning defeats the human ear. NCC Group trained a working model using minutes of publicly available voice samples on consumer hardware, and that clone deceived victims in practical security assessments. Research puts deepfake voice detection accuracy as low as 38.2% — under test conditions, not during a time-pressured call from someone who sounds exactly like the CEO.

Enterprise-grade detection tools do exist. But they require dedicated security staff, significant budget, and enterprise infrastructure. That’s not realistic for a company of 50–500 employees without a security team.

The attacker’s real weapons are urgency and authority impersonation. A cloned CEO voice saying “I need this wire done now, we will lose the deal” creates exactly the right conditions for fraud. Attackers script calls to make hesitation feel like insubordination.

Process controls neutralise both levers. Out-of-band verification, dual authorisation, and code word protocols don’t try to identify whether a voice is synthetic — they verify the request through a channel or process the attacker cannot control. None require specialist tooling.

Attackers don’t restrict targeting by company size. The average BEC wire transfer request reached $128,980 in Q4 2024 — organisations of all sizes are viable targets if their executives have public audio available, and most do. The complete AI-enabled social engineering threat landscape covers why commodity attack infrastructure makes company size irrelevant to attacker targeting decisions.

What is out-of-band verification and how do you implement it for wire transfers?

Out-of-band verification means confirming any phone or video request involving financial actions or credential resets through a pre-established, independently verified secondary channel. Callback verification — hanging up and calling back on a known number — is one type of this broader principle. It also covers any separately established channel: a corporate directory number, a secure portal, or in-person confirmation.

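The channel must be separate from the channel on which the request arrived. Never call back the number shown on the incoming call, or a number the caller gives you: caller ID can be spoofed. Take the number from your verified directory and dial it yourself.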

Why does this defeat voice cloning? Because you’re verifying the channel, not the voice. A perfect voice clone cannot simultaneously control the requester’s pre-registered mobile number or appear in person.

Implementation steps:

  1. Establish a verified contact directory. Mobile numbers and secondary contact methods for everyone authorised to request financial actions. Verified in person — not by email.

  2. Distribute via a secure internal channel. A locked intranet page, laminated copy, or secured messaging channel. Not email.

  3. Define the policy trigger. Any phone or video request involving wire transfers, payment changes, vendor banking detail modifications, or credential resets requires out-of-band verification before action.

  4. Hang up and verify independently. Contact the requester via a different channel from the verified directory. Confirm the specific request details. Proceed only after independent confirmation.

  5. Define the exception procedure. If the requester cannot be reached within 30 minutes, escalate to a second authoriser. Never proceed based solely on the original call.
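
The steps above can be sketched as a small policy check. This is a minimal illustration in Python, not a production tool: the action names in `TRIGGER_ACTIONS`, the placeholder entries in `VERIFIED_DIRECTORY`, and the function names are all hypothetical and would be replaced by whatever your own policy defines.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names mirroring the policy trigger in step 3.
TRIGGER_ACTIONS = {
    "wire_transfer",
    "payment_change",
    "vendor_bank_detail_change",
    "credential_reset",
}

# Hypothetical verified directory (step 1). The numbers are placeholders;
# real entries would be confirmed in person, not by email.
VERIFIED_DIRECTORY = {
    "ceo": "+1 202 555 0101",
    "cfo": "+1 202 555 0102",
}


@dataclass
class Request:
    requester: str       # directory key of the person the caller claims to be
    action: str          # what the caller is asking for
    inbound_number: str  # number the request arrived from (may be spoofed)


def needs_out_of_band(request: Request) -> bool:
    """Step 3: does this request trigger out-of-band verification?"""
    return request.action in TRIGGER_ACTIONS


def callback_number(request: Request) -> Optional[str]:
    """Step 4: return the number to dial, taken only from the directory.

    The inbound number is deliberately never read. A result of None means
    the claimed requester is not in the directory, so the request is
    escalated rather than actioned (step 5).
    """
    return VERIFIED_DIRECTORY.get(request.requester)
```

The design choice worth noting is that `callback_number` has no code path that uses `inbound_number`, which mirrors the rule that the verification channel must not come from the call itself.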

A legitimate requester will accept a callback without objection. An attacker will resist it — and that resistance is itself a signal.

How does a dual authorisation protocol work for high-value transactions?

Dual authorisation means two separate individuals independently verify and approve high-value or unusual transactions before anything gets executed. Neither person can rely on the other’s verification — the second approver has to independently confirm the request through their own out-of-band channel.

The principle is straightforward: a single fraudulent call can compromise one person. Dual authorisation means the attacker has to simultaneously deceive two people through two separate verification processes. This is the four-eyes principle, and it’s standard in financial controls for exactly this reason.

Implementation steps:

  1. Define the threshold and triggers. A reasonable SMB starting point: any transfer above $10,000, any new payee, any change to existing vendor banking details, any deviation from standing payment orders.

  2. Assign approval pairs. No single person can authorise a payment above the threshold alone. Document the pairs and their pre-authorised alternates.

  3. Require independent verification from each approver. The second approver must confirm independently — not by asking the first approver “did you verify this?”

  4. Document every triggered transaction. Timestamp, both approvers’ names, verification method, and exceptions. This is your audit trail for any subsequent dispute or investigation.

  5. Define exception handling. Pre-authorised alternates step in when primary approvers are unavailable. Never reduce to single-person approval under time pressure.
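
As a rough sketch of how these rules fit together, the Python below encodes the triggers from step 1 and the two-approver requirement from steps 2 and 3. The class names, field names, and the `THRESHOLD` constant are hypothetical; the $10,000 figure is only the starting point suggested above.

```python
from dataclasses import dataclass, field
from typing import List

THRESHOLD = 10_000  # suggested SMB starting point; adjust to your own policy


@dataclass
class Approval:
    approver: str
    verification_method: str  # e.g. "callback to directory number"


@dataclass
class Transfer:
    amount: float
    new_payee: bool = False
    bank_details_changed: bool = False
    deviates_from_standing_order: bool = False
    approvals: List[Approval] = field(default_factory=list)


def requires_dual_authorisation(t: Transfer) -> bool:
    """Step 1: any single trigger is enough."""
    return (
        t.amount > THRESHOLD
        or t.new_payee
        or t.bank_details_changed
        or t.deviates_from_standing_order
    )


def dual_authorisation_satisfied(t: Transfer) -> bool:
    """Steps 2 and 3: two distinct approvers, each with a recorded check.

    Transfers that do not trigger the policy return True here; the normal
    single-approver process for routine payments is not modelled.
    """
    if not requires_dual_authorisation(t):
        return True
    # An approval with no recorded verification method does not count,
    # and the set removes duplicate approvals from the same person.
    verified = {a.approver for a in t.approvals if a.verification_method.strip()}
    return len(verified) >= 2
```

The `approvals` list doubles as the raw material for the audit trail in step 4, although timestamps and exception notes are left out of this sketch.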

Dual authorisation adds 15–30 minutes to a transaction approval. That’s deliberate. FACC Aerospace lost $58 million to CEO impersonation fraud, and the board fired both the CEO and CFO for inadequate controls. A 30-minute process delay is not comparable to that outcome.

What is a code word protocol and why can a voice clone not replicate it?

A code word protocol is a pre-established, rotating challenge phrase shared privately between specific individuals — CEO and CFO, executive and finance lead — that has to be provided before any unusual financial action gets authorised.

A voice clone cannot defeat it. A clone replicates the sound of a voice, not the content of a private agreement. The attacker doesn’t know the code word because they weren’t part of the private process that created it. No external party can access knowledge established in a private conversation.

This concept is familiar in personal safety contexts — Kaspersky recommends code words for family emergency verification. Formalising it as a business control is less common, and that gap matters.

Implementation steps:

  1. Establish code words in person or via a verified secure channel. Never by email or phone. If it can be intercepted, it doesn’t qualify.

  2. Assign code word pairs per critical relationship. CEO–CFO, CEO–Finance Lead, CTO–Help Desk Lead. Each pair has its own code word known only to those two individuals.

  3. Rotate quarterly. Rotate immediately if any party suspects compromise, or if a staff member who knew the code word leaves.

  4. Define the challenge procedure. When a financial request arrives via phone or video, the recipient asks for the current code word. No code word means no action.

  5. Integrate into onboarding. New executives and finance staff establish their code word pairs in person during role setup.
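
The rotation rule in step 3 can be tracked without ever recording the code words themselves. The sketch below, in Python, only answers "is this pair due for rotation?". The function name and parameters are hypothetical, and "quarterly" is approximated as 90 days, which is an assumption rather than part of the protocol.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is treated as a fixed 90-day interval.
ROTATION_INTERVAL = timedelta(days=90)


def rotation_due(
    last_rotated: date,
    today: date,
    suspected_compromise: bool = False,
    holder_departed: bool = False,
) -> bool:
    """Return True when a code word pair should be re-established.

    Suspected compromise or the departure of anyone who knew the code
    word forces immediate rotation, regardless of the date.
    """
    if suspected_compromise or holder_departed:
        return True
    return today - last_rotated >= ROTATION_INTERVAL
```

Only the date of the last rotation is stored, so a spreadsheet or calendar reminder would serve the same purpose; the code word stays in the heads of the two people who agreed it.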

The protocol requires no software and no vendor — just a private conversation that an attacker with every public recording of the CEO’s voice still cannot replicate.

What should an IT help desk verification checklist include?

Help desks are targets for voice fraud because they hold the keys to user accounts. An MFA reset or credential change triggered by a cloned voice can compromise email, file shares, and every application linked to single sign-on. NCC Group’s vishing assessments successfully performed password resets and email address changes using real-time AI voice cloning against real organisations. This is not theoretical.

The checklist below is concrete enough to print, post at workstations, and adopt as internal policy without modification.

IT Help Desk Verification Checklist

The hardware complement: Phishing-resistant MFA (FIDO2/passkey) eliminates an entire class of vishing attack. FIDO2 keys bind cryptographically to specific domains — they cannot be verbally shared or socially engineered. CISA designates FIDO/WebAuthn as the gold standard. Deploy to finance staff, IT administrators, and C-suite first. FIDO2 keys such as YubiKey cost $25–$50. Passkeys are increasingly built into existing devices — Windows Hello, Apple Touch ID, and Face ID qualify where supported.

How does a mandatory cooling-off period stop urgency-driven fraud?

A mandatory cooling-off period imposes a minimum waiting period before executing unusual high-value transfers — even after apparent verification. The recommended duration: 30 minutes minimum for transfers above your defined threshold, two hours for new payees or changed banking details.

Urgency is the attacker’s lever. A mandatory delay removes the ability to exploit time pressure regardless of how convincing the impersonation is. The Arup case makes this concrete: in that 2024 incident, a finance employee at the engineering firm approved roughly $25 million in transfers after a deepfake video conference. The employee initially suspected fraud but was convinced by what appeared to be confirmation on the video call. A mandatory hold would have created a window for that suspicion to become action, and for out-of-band re-verification to happen. If re-confirmation cannot be obtained within the window, the transaction does not proceed.

Implementation steps:

  1. Define “unusual” in your payment approval policy. Remove judgement calls. An unusual transaction is: any transfer above threshold, any new payee, any change to vendor banking details, any request accompanied by urgency language.

  2. Set the cooling-off duration. Thirty minutes minimum above threshold. Two hours for new payees or changed banking details.

  3. Document and communicate the requirement. Written policy — so it can be enforced and compliance demonstrated in an investigation.

  4. Conduct out-of-band re-verification during the cooling-off period. Not optional even if the original verification appeared successful.

  5. No exceptions for urgency claims. Urgency language triggers the longer cooling-off period, not a shorter one. Document this explicitly so staff can enforce it.
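
The durations and the "urgency lengthens the hold" rule above can be expressed in a few lines. This Python sketch uses hypothetical function and parameter names, and assumes your payment approval policy has already classified the transaction.

```python
from datetime import datetime, timedelta

STANDARD_HOLD = timedelta(minutes=30)  # transfers above threshold
EXTENDED_HOLD = timedelta(hours=2)     # new payee, changed details, urgency


def cooling_off(
    above_threshold: bool,
    new_payee: bool,
    bank_details_changed: bool,
    urgency_language: bool,
) -> timedelta:
    """Steps 1, 2 and 5: pick the hold; urgency never shortens it."""
    if new_payee or bank_details_changed or urgency_language:
        return EXTENDED_HOLD
    if above_threshold:
        return STANDARD_HOLD
    return timedelta(0)  # routine payment, no hold


def may_release(
    requested_at: datetime,
    now: datetime,
    hold: timedelta,
    reverified_out_of_band: bool,
) -> bool:
    """Step 4: both the hold and the re-verification must be complete."""
    return reverified_out_of_band and now >= requested_at + hold
```

For example, a request received at 09:00 with urgency language gets `EXTENDED_HOLD`, so `may_release` stays false until 11:00 and stays false after that if re-verification never happened.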

How do you audit your organisation’s audio and video exposure before an attacker does?

An OSINT (Open Source Intelligence) attack surface audit inventories which executives have public audio and video that could serve as voice clone training data. Think of it as the defensive version of the attacker’s reconnaissance.

Attackers collect voice training data from public sources: conference talks, podcast recordings, LinkedIn videos, earnings calls. NCC Group documented obtaining samples “often low quality or including background noise” that were still sufficient after standard audio processing. An executive with two or three public recordings has provided far more training material than an attacker needs.

Implementation steps:

  1. List all executives and senior staff with financial or operational authority. CEO, CFO, CTO, Finance Director, IT Director.

  2. Search publicly available audio and video for each person. Their name plus: “podcast,” “interview,” “conference,” “YouTube,” “LinkedIn,” “earnings call,” “webinar.”

  3. Categorise exposure level. High: multiple long-form audio sources. Medium: one or two appearances. Low: minimal or no public audio.

  4. Assign higher-verification protocols to high-exposure individuals. Require code word verification for all financial requests, not just unusual ones.

  5. Consider reducing future exposure where practical. Not all content needs to remain publicly indexed indefinitely.

  6. Repeat quarterly, or when executives take on new public-facing roles. OSINT exposure mapping tools such as Brightside AI can assist, though a manual search is sufficient for most organisations without a dedicated security team.
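
Once the manual search is done, the categorisation in step 3 can be made repeatable with a small script. The Python sketch below is illustrative only. The names are hypothetical, and the cut-offs are assumptions that the steps above do not specify: "long-form" is taken to mean ten minutes or more, and three or more recordings of any length are treated as high exposure.

```python
from dataclasses import dataclass
from typing import List

LONG_FORM_MINUTES = 10  # assumption: what counts as "long-form"


@dataclass
class Recording:
    source: str     # e.g. "podcast", "conference", "earnings call"
    minutes: float  # approximate length of the person's speech


def exposure_level(recordings: List[Recording]) -> str:
    """Step 3: map one person's public recordings to high/medium/low."""
    long_form = sum(1 for r in recordings if r.minutes >= LONG_FORM_MINUTES)
    # Assumption: two long-form sources, or three recordings of any
    # length, count as high exposure.
    if long_form >= 2 or len(recordings) >= 3:
        return "high"
    if recordings:
        return "medium"
    return "low"


def code_word_for_all_requests(level: str) -> bool:
    """Step 4: high-exposure people get the stricter protocol."""
    return level == "high"
```

Running this each quarter against the same spreadsheet of recordings gives a consistent result from one audit to the next, which matters more than the exact thresholds chosen.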

What should you do immediately if a deepfake fraud attempt has already succeeded?

The detailed legal, insurance, and regulatory response is covered in what you are liable for without a verification protocol. Here are the immediate actions in the first 60 minutes.

Financial recovery first. Contact the receiving bank immediately and request a wire recall. Notify your own bank’s fraud department at the same time. The FBI emphasises contacting financial institutions within 48 hours for the best chance of recovery.

Internal notification. Alert senior leadership, IT (the attacker may still have access if voice fraud compromised credentials), and legal counsel. Don’t communicate about the incident via the compromised channel — use out-of-band communication for all incident discussion.

Evidence preservation. Preserve everything: call recordings, emails, chat logs, transaction records, access logs. Document the timeline — when the request arrived, what verification steps were followed or skipped, when the fraud was discovered.

Regulatory obligations. GDPR, HIPAA, and PCI DSS may impose notification timelines from the point of discovery — engage legal counsel in the first 60 minutes. The legal and insurance stakes if these controls are absent are covered in the next article. The controls above form a process-first defence against AI-enabled social engineering threats facing your organisation.

Frequently Asked Questions

How much audio does a scammer need to clone someone’s voice?

As little as 3–30 seconds of clear audio is sufficient for current tools. NCC Group trained a usable model with minutes of publicly available samples, and quality improves with more material. Any executive with public video or podcast appearances has already provided more than enough.

Can AI really clone a voice well enough to fool a trained finance employee?

Yes. In practical assessments using real-time AI voice cloning, NCC Group found that detection “would just not be reasonable to expect.” Research shows deepfake voice detection accuracy as low as 38.2% under test conditions — in a live, time-pressured call, performance is likely worse. Process controls are more reliable than human detection.

What is the difference between out-of-band verification and callback verification?

Out-of-band verification is the broader principle: confirming a request through any pre-established, independently verified secondary channel. Callback verification — hanging up and calling back on a known number — is one specific type of out-of-band verification.

How often should we rotate code words?

Quarterly is the recommended baseline. Rotate immediately if any party suspects compromise or if a staff member who knew the code word leaves.

What is the recommended monetary threshold for dual authorisation?

A common SMB starting point: any transfer above $10,000, any new payee, or any change to existing vendor banking details. The average BEC wire transfer request reached $128,980 in Q4 2024, so a threshold set well below that figure catches the typical fraudulent request.

Does a cooling-off period apply to regular recurring payments?

No. It applies to unusual transactions: above threshold, involving new payees, changed banking details, or accompanied by urgency language. Regular standing orders to established payees are exempt.

What hardware MFA options exist for SMBs on a limited budget?

FIDO2 keys such as YubiKey typically cost $25–$50. Passkeys are increasingly built into existing devices — Windows Hello, Apple Touch ID, and Face ID qualify where supported. Deploy to finance staff, IT administrators, and C-suite first.

How do I convince senior leadership to adopt these controls?

Use concrete losses: $2.77 billion in BEC losses in 2024, the FACC Aerospace $58 million loss that led to both the CEO and CFO being fired. The primary investment is process change, not budget. Compare the cost of a 30-minute delay against an unrecoverable wire transfer.

What if the attacker uses a deepfake video call instead of just voice?

The same controls apply. Out-of-band verification, dual authorisation, and code words verify the request through a channel the attacker cannot control — audio or video. For video calls, add a liveness challenge: ask the caller to perform a spontaneous physical action.

Is a phone callback to the CEO’s mobile number out-of-band verification?

Yes, provided the number was verified independently, in person or via the corporate directory, and you dial it yourself rather than using a number shown on or supplied during the suspicious call.

What role does cyber insurance play in AI voice fraud defence?

Some policies exclude losses where verification protocols were not followed. Implementing these controls strengthens both your defence and your insurance position. See the legal and insurance stakes if these controls are absent for detail.

Should we ban executives from appearing on podcasts or public video?

A blanket ban is impractical. Conduct the OSINT audit, assign higher-verification protocols to high-exposure executives, and ensure all staff follow the checklist regardless of who calls.
