Business | SaaS | Technology
Mar 5, 2026

The Business Functions Most at Risk From AI Voice Phishing Attacks

AUTHOR

James A. Wondrasek

AI voice cloning has turned vishing from a low-quality phone scam into an attack vector that neither voice recognition nor caller ID checks can reliably stop. Finance teams, IT help desks, and executive functions each face their own attack chains, their own mechanics, and their own outcomes.

CrowdStrike documented a 442% surge in vishing attacks in H2 2024. The FBI’s IC3 recorded $2.77 billion in Business Email Compromise losses across 21,442 complaints in 2024. The three highest-value documented AI fraud cases — Arup in Hong Kong ($25.6M), a Singapore multinational ($499K), and a UK energy company ($243K) — all targeted finance functions using voice or video impersonation.

This is part of the broader threat landscape of AI-enabled social engineering that’s reshaping enterprise security. If you want to understand how voice cloning actually works, read that first. This article maps each high-risk business function to its specific attack chain and what attackers need to pull it off.

What is vishing and why is a phone call more dangerous than a phishing email?

Vishing (voice phishing) is a social engineering attack conducted over the phone, in which an attacker impersonates someone trusted to extract information, authorise transactions, or gain access to systems.

A phone call is more dangerous than an email because it happens in real time. Email training teaches people to pause, check, and verify. A voice call removes that buffer entirely. The attacker adapts as the conversation goes, escalates urgency, and exploits the reflex to comply when someone in authority is on the line.

Two traditional verification heuristics have both been invalidated at the same time:

  1. Voice recognition — AI voice cloning requires as little as 3–30 seconds of clean audio. That audio is publicly available for most executives: earnings calls, conference presentations, LinkedIn audio posts. No hacking required.
  2. Caller ID verification — Caller ID spoofing uses VoIP infrastructure to display a trusted internal number or the executive’s registered mobile. The number looks right. The voice sounds right. Both fail simultaneously.

Vishing is growing faster than smishing because there’s a live attacker adapting in real time, not a static message. Cisco Talos reported vishing accounted for over 60% of phishing-related IR engagements in Q1 2025. The FBI issued a specific warning in May 2025 about AI voice messages combined with smishing as a multi-stage attack vector.

Vishing also operates outside the channels most security tools monitor. Email and SMS phishing leave digital artefacts. A vishing call leaves a phone record — and whatever the human on the other end decides to do next.

Why are finance teams the primary target for AI voice phishing?

Finance teams are the primary target because they have direct authority to execute wire transfers. One successful vishing call can route millions into a criminal account — no malware, no network intrusion required.

The finance function attack chain:

  1. Reconnaissance — attacker identifies CFO or finance staff via LinkedIn, harvests voice samples from earnings calls or investor presentations
  2. Target identification — upcoming payment event located: vendor invoice, M&A transaction, payroll cycle
  3. Caller ID spoofing — call appears to come from an internal number or trusted executive’s mobile
  4. Voice-cloned call — CEO or CFO’s cloned voice creates urgency via pretext: confidential acquisition, regulatory deadline, fraud alert
  5. Wire transfer instruction — finance employee directed to execute transfer to attacker-controlled account
  6. Fund movement — transfer completes before fraud is discovered
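Procedural controls can interrupt this chain at step 5, before any money moves. A minimal sketch of a dual-control wire approval gate is below — the function and parameter names (`approve_wire`, `verified_out_of_band`, the $10,000 threshold) are illustrative assumptions, not a real treasury system's API:

```python
# Illustrative sketch only: a dual-control gate for outbound wire transfers.
# All names and the threshold value are hypothetical.

THRESHOLD = 10_000  # transfers above this require a second approver


def approve_wire(amount, requester, second_approver, verified_out_of_band):
    """Return True only if the transfer passes dual control.

    - Every transfer requires out-of-band confirmation (a callback on a
      directory number, never the number that phoned in the request).
    - Amounts above THRESHOLD also require a second approver who is not
      the requester, so no single voice call can move large sums.
    """
    if not verified_out_of_band:
        return False
    if amount > THRESHOLD:
        return second_approver is not None and second_approver != requester
    return True
```

The point of the sketch is that "the CFO said so on the phone" never satisfies either condition: a cloned voice cannot supply the out-of-band confirmation or the independent second approver.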

Urgency culture creates a structural vulnerability. Finance operations run on deadlines — month-end close, deal closings, regulatory reporting. A call about a confidential acquisition requiring an immediate transfer doesn’t sound unusual in that environment. Pretexting — fabricating a scenario to manipulate a target — accounts for 27% of all social engineering breaches per the Verizon DBIR. Voice cloning makes the fabricated scenario audibly indistinguishable from a legitimate request.

Finance staff are also reluctant to push back on the CFO or CEO. Attackers design pretexts that make hesitation feel like insubordination. Wire-transfer BEC scams increased 33% in Q2 2025 versus Q1.

What are the biggest documented losses from AI voice fraud?

Three cases establish the scale. Each targeted finance. Each used voice or video impersonation. Each succeeded.

Arup, Hong Kong, 2024 — $25.6M–$39M

A finance employee was invited to a video conference that appeared to include the CFO and several colleagues. Every participant was synthetic. The employee received instructions to execute wire transfers across 15 transactions totalling HK$200 million. Some reports put total losses at $39 million. It’s the largest documented AI voice and video fraud case on record.

Singapore, March 2025 — $499,000

A finance director joined a Zoom call with what appeared to be senior leadership. Every face and voice was AI-generated from publicly available media. The finance director authorised a $499,000 transfer.

UK energy company, 2019 — $243,000

The earliest major documented AI voice fraud case. The UK subsidiary CEO received a call matching the German parent company CEO’s voice — correct accent, speech patterns, mannerisms — and transferred $243,000 to a Hungarian supplier. The voice had been cloned from publicly available conference recordings. This case established that AI voice fraud was operationally viable as early as 2019.

The pattern across all three is the same: finance targeted, urgency or confidentiality as pretext, senior executive impersonated, wire transfer executed before verification could happen. Traditional methods failed — including video calls. Research explains why: humans correctly identify high-quality deepfake video only 24.5% of the time, and deepfake voice detection accuracy is as low as 38.2%.

What happens when an IT help desk gets tricked by a cloned voice?

The IT help desk is the most underappreciated attack surface in enterprise security. A single successful MFA reset call gives the attacker access to every SSO-connected application in the organisation: email, CRM, file storage, HR systems, ERP. The lot.

The MFA reset attack chain:

  1. Attacker identifies help desk contact via company directory or LinkedIn
  2. Calls using a cloned executive voice: “I’m travelling and my phone was stolen, I need my MFA reset immediately”
  3. Help desk employee performs verbal identity check — the voice matches — and complies
  4. MFA credentials are reset; attacker accesses the target account
  5. Attacker pivots through SSO to email, CRM, file storage
  6. Lateral movement: compromised email enables BEC against finance; CRM data enables further impersonation
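The chain breaks at step 3 if the help desk's policy never treats a matching voice as identity evidence. A hedged sketch of such a policy check follows — `can_reset_mfa` and its parameters are hypothetical names, not a real identity product's API:

```python
# Hypothetical help-desk policy check for MFA resets. A verbal identity
# match is never one of the controls, by design.


def can_reset_mfa(ticket_exists, callback_confirmed_on_registered_number,
                  manager_approved, voice_matched=False):
    """All three procedural controls must pass before a reset.

    voice_matched is deliberately never consulted: a convincing voice
    match is exactly what a cloned-voice attacker provides for free.
    """
    return bool(ticket_exists
                and callback_confirmed_on_registered_number
                and manager_approved)
```

Encoding the policy this way makes the "I'm travelling and my phone was stolen" pretext inert: urgency cannot substitute for the ticket, the callback, or the approval.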

This is not theoretical. Scattered Spider and ShinyHunters ran professionalised help desk vishing campaigns targeting 760+ organisations, charging $500–$1,000 per call through a vishing-as-a-service model. Confirmed victims include Google, Cisco, Wynn Resorts (800,000+ employee records), CarGurus (12.5 million records), and Harvard University.

What the attacker needs: a voice sample (publicly available), the help desk number (on the website), and knowledge of which account to target. SMBs at 50–500 employees are particularly exposed — formal phone-based verification protocols are rare at this scale, and voice recognition is often the only control.

The threat actors driving these campaigns represent a professionalised criminal supply chain purpose-built for help desk exploitation.

How does a BEC payment diversion attack unfold step by step?

BEC payment diversion differs from direct-instruction CEO fraud. Instead of fabricating a transaction from scratch, the attacker intercepts a real one already in progress.

The BEC payment diversion attack chain:

  1. Attacker compromises an email account via prior MFA reset or spear phishing
  2. Monitors email traffic for upcoming legitimate payments — vendor invoices, payroll, inter-company transfers
  3. At the moment payment is being processed, contacts finance via cloned voice call to redirect it, impersonating the executive or vendor
  4. Provides “updated” bank account details
  5. The legitimate payment routes to the attacker-controlled account
  6. Discovery occurs when the real vendor follows up on non-payment — funds have already moved multiple times
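Step 4 is where this chain is most detectable: the "updated" bank details will not match the vendor master record. A minimal sketch of that check is below, assuming a vendor master lookup — `VENDOR_MASTER`, `payment_requires_verification`, and the sample IBAN are all illustrative:

```python
# Sketch of a beneficiary-mismatch check against a vendor master file.
# Names and data are hypothetical.

VENDOR_MASTER = {
    "acme-supplies": "GB29NWBK60161331926819",  # account on file
}


def payment_requires_verification(vendor_id, instructed_account):
    """Block the payment pending out-of-band verification whenever the
    instructed account differs from (or has no match in) the master file.
    A mid-flight "updated details" call always trips this check."""
    on_file = VENDOR_MASTER.get(vendor_id)
    return on_file is None or instructed_account != on_file
```

The design choice matters: the check keys off the stored record, not the caller's say-so, so controlling both the email thread and the voice channel still isn't enough to pass it.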

Voice cloning upgrades email-only BEC by adding vocal authentication at the critical moment. The attacker controls both channels — compromised email and cloned voice — creating a multi-channel attack that standard controls struggle to detect.

The Verizon DBIR found a median BEC loss of $50,000 per incident. Pretexting incidents have nearly doubled to over 50% of all social engineering incidents. BEC and social engineering fraud represent roughly half of all cyber insurance claims over the past five years.

What is pig butchering fraud and how does AI change its scale?

Pig butchering (sha zhu pan) is a long-duration investment fraud where an attacker builds a relationship with a target over days or weeks, convinces them to invest in a fraudulent platform, then disappears with the funds.

Traditional pig butchering is labour-intensive — one scammer, a handful of targets, weeks of personalised contact. AI changes that entirely. Synthetic identity kits — voice profile, AI-generated photographs, fabricated social media history — can be assembled for as little as $5 per persona. Bots sustain relationships around the clock without a human operator anywhere in the picture.

The FTC opened more than 65,000 romance scam cases last year with $3 billion in reported losses. AI-enabled fraud losses are projected to reach $40 billion by 2027 (Deloitte). Pig butchering has historically targeted individuals, but executives are increasingly targeted via investment pretexts from apparently known contacts — and deepfake technology means a video call is no longer proof of identity.

How do I know if a phone call from my executive team is real?

Honestly? You can’t reliably tell from the call itself. Current-generation voice cloning is indistinguishable in most conditions, and caller ID spoofing makes the number appear legitimate.

The answer is out-of-band verification — a communication channel the attacker cannot control. Hang up and call back on a number you look up independently in the corporate directory, not one provided during the call. For wire transfers and MFA resets, the verification protocols required go beyond a quick callback. Defensive controls for each of these attack vectors and how to build a verification protocol without a dedicated security team are covered separately.
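The callback rule above can be stated as a few lines of code. This is a toy sketch, not a product — `DIRECTORY` and `callback_number` are invented names — but it captures the one rule that matters: the number offered during the call is never used.

```python
# Illustrative out-of-band callback lookup. DIRECTORY stands in for an
# independently maintained corporate directory; all values are hypothetical.

DIRECTORY = {
    "cfo": "+61 2 5550 0100",
}


def callback_number(role, number_given_on_call=None):
    """Return the independently looked-up number for the role, or None
    if there is no directory entry (escalate rather than guess).

    number_given_on_call is deliberately ignored: dialling a number the
    caller supplied only reconnects you to the attacker, even if it
    happens to match what you expect.
    """
    return DIRECTORY.get(role)
```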

No amount of training eliminates this risk entirely: 33% of trained employees still disclose information during simulated vishing tests. Procedural controls are the next layer down.

The landscape of AI-enabled social engineering is accelerating across every attack surface. Mapping each business function to its specific attack chain is the starting point for defences that hold.

Frequently Asked Questions

What is the difference between vishing and phishing?

Phishing uses email to deliver fraudulent messages; vishing uses voice calls; smishing uses SMS. Vishing is more dangerous per incident because real-time conversation lets the attacker adapt to responses, create immediate pressure, and exploit voice recognition trust. AI voice cloning makes vishing nearly undetectable.

How much audio does an attacker need to clone someone’s voice?

Current voice cloning tools require as little as 3–30 seconds of clean audio from earnings calls, conference presentations, YouTube videos, podcasts, or LinkedIn audio posts. No hacking required.

What is the most expensive AI voice fraud case on record?

The Arup case in Hong Kong (2024) — $25.6M–$39M in losses from a deepfake video conference where a finance employee interacted with synthetic recreations of the CFO and multiple colleagues. Every participant was AI-generated.

Can AI clone a voice in real time during a phone call?

Yes. Current-generation tools support real-time synthesis — the attacker speaks naturally, AI converts their voice to the target’s with minimal latency. NCC Group demonstrated this using a consumer-grade GPU and minutes of publicly available audio.

What is vishing-as-a-service?

Vishing-as-a-service (VaaS) is a commercialised attack model where groups like Scattered Spider sell vishing call execution for $500–$1,000 per call using pre-written scripts targeting IT help desks. It’s the ransomware-as-a-service model applied to voice fraud — it lowers the skill barrier and increases volume.

Why are SMBs at risk if the high-profile cases involve large enterprises?

SMBs (50–500 employees) often lack formal identity verification protocols for phone-based requests. The average cost of a single successful voice attack is $5,000–$25,000, with 20% of organisations experiencing $25,000–$100,000 per incident (Modulate survey). SMBs are targeted nearly four times more often than large enterprises per Verizon DBIR 2025.

What should I do immediately if I suspect an AI voice call scam?

Contact your IT security and finance teams immediately. Freeze any transfers in progress. Document the call details — time, displayed number, what was requested. Report to FBI IC3 (ic3.gov). Do not call back the number that contacted you.

What is pretexting and how does voice cloning make it more effective?

Pretexting is fabricating a scenario to manipulate a target — “I’m travelling and lost my phone.” It accounts for 27% of social engineering breaches per Verizon DBIR, and has nearly doubled to over 50% of all social engineering incidents. Voice cloning makes the fabricated scenario audibly indistinguishable from a legitimate request.

Does MITRE ATT&CK classify vishing as a technique?

Yes. Vishing is classified as T1566.004 (Phishing: Voice) within the Initial Access tactic. T1598.004 covers reconnaissance-phase vishing. Both are used in enterprise threat modelling frameworks.

How does caller ID spoofing work with voice cloning?

Caller ID spoofing uses VoIP infrastructure to display a trusted internal number or an executive’s registered mobile. Combined with voice cloning, both the number and the voice match what the target expects — eliminating both primary verification heuristics at the same time.

Are deepfake video calls more dangerous than voice-only vishing?

Deepfake video calls are more resource-intensive but achieve higher-value outcomes — the Arup case ($25.6M) and Singapore case ($499K) both used video. Voice-only vishing is far more scalable and accounts for the majority of attacks by volume. More than $200 million was lost to deepfake scams in Q1 2025 alone (SecurityWeek).
