Attack Channels
How credential harvesting, caller ID spoofing, and OTP bots work across email, voice, and SMS
The SaaS Takeover
Saturday, 10:42 AM. Rita Gomez runs a small accounting firm in Los Angeles. Her phone buzzes with an email notification from LedgerLite, the cloud accounting software her team uses daily.
Subject: Payment Declined - Action Required
Your automatic payment failed. Update your billing information to keep your data.
The email looks exactly like every other LedgerLite notification she's received for the past two years. Same logo, same blue buttons, same footer with the unsubscribe link. Rita clicks through to update her card.
The login page asks her to sign in with Google, just like the real LedgerLite does. She clicks, enters her Google credentials, and lands on a billing update form. She enters her credit card number, expiry, and CVV. A loading spinner appears. "Billing updated successfully."
But Rita isn't on LedgerLite's website. She's on ledger-secure.com, a clone registered 48 hours ago. The fake site captured her Google credentials in real time, passed them to the real Google, received her authentication token, and now controls both her LedgerLite session and her entire Google account.
Within fifteen minutes, the attacker uses Rita's real LedgerLite account to invite three new "employees," generate API keys, and export 18 months of invoice data. Client names, email addresses, payment details, everything. They also have her Gmail: years of client correspondence, password reset emails, and every other service she's ever signed into with Google.
This story is fictional, but the patterns are real.
Why This Matters
In Social Engineering Fundamentals, we covered the psychology behind social engineering: authority, urgency, social proof, and the other principles attackers exploit. We looked at how multi-channel attacks coordinate email, voice, and SMS to reinforce credibility.
This article goes deeper into the technical mechanics. How does a phishing site capture credentials without triggering MFA? How does caller ID spoofing actually work? What makes an OTP bot effective against 3-D Secure?
Understanding these mechanics matters because attack techniques evolve faster than the psychology behind them. Cialdini's principles haven't changed since 1984, but the tools attackers use to exploit them have transformed completely. Knowing how credential harvesting, caller ID spoofing, and OTP interception work helps you understand why certain attacks succeed and recognize when familiar patterns appear in new forms.
Email Attacks
Email remains the most common channel for social engineering. It scales infinitely, costs almost nothing, and reaches targets wherever they are. But modern email attacks look nothing like the Nigerian prince scams of the 2000s.
Targeting terminology: Phishing casts a wide net. Spear phishing targets specific individuals using personalized details. Whale phishing (or whaling) targets executives and high-value individuals. The mechanics are identical; only the targeting changes.
The Credential Harvesting Kit
A phishing kit is a pre-built package for creating fake login pages. Underground markets sell kits for every major platform: Microsoft 365, Google Workspace, Okta, Salesforce, banking portals, and SaaS applications like Rita's LedgerLite.
A typical kit includes:
- HTML templates that mirror the target site's appearance
- JavaScript to capture keystrokes and form submissions
- Backend code to relay captured credentials to the attacker
- Evasion features like geoblocking and bot detection
The attacker registers a lookalike domain (ledger-secure.com instead of ledgerlite.com), uploads the kit, and sends emails directing victims to the fake site. When someone enters credentials, the kit captures them instantly.
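Much of this screening can be automated. Below is a minimal sketch of the kind of lookalike-domain check a mail gateway or brand-protection service might run; the protected-brand list, keywords, and similarity threshold are illustrative assumptions, not a production detector.

```python
from difflib import SequenceMatcher

# Illustrative protected brands and keywords; real deployments use curated
# brand lists plus domain-age, registrar, and reputation signals.
PROTECTED = {
    "ledgerlite.com": ["ledgerlite", "ledger"],
    "google.com": ["google"],
}

def flag_lookalike(domain: str, threshold: float = 0.7) -> bool:
    """Flag domains that resemble a protected brand without being it."""
    domain = domain.lower().rstrip(".")
    for legit, keywords in PROTECTED.items():
        if domain == legit or domain.endswith("." + legit):
            return False  # the genuine domain or one of its subdomains
        if any(keyword in domain for keyword in keywords):
            return True   # brand keyword embedded in an unrelated domain
        if SequenceMatcher(None, domain, legit).ratio() >= threshold:
            return True   # near-identical spelling (typosquatting)
    return False

print(flag_lookalike("ledger-secure.com"))   # True  (contains the keyword "ledger")
print(flag_lookalike("app.ledgerlite.com"))  # False (genuine subdomain)
```

Real-world detectors also weigh domain age and registration details, which is part of why a clone registered 48 hours before the campaign is itself a red flag.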
Defeating MFA
Basic credential harvesting fails when the target uses multi-factor authentication. The attacker gets a username and password but can't complete the login without the second factor.
Attackers solve this in two ways:
Manual relay. The fake login page prompts "enter your verification code" after capturing the password. The victim enters their SMS or authenticator code. The attacker, watching in real time, types it into the real site before the code expires. This is low-tech but effective. It requires the attacker to be actively monitoring and fast enough to use time-limited codes.
Reverse-proxy phishing. Tools like Evilginx sit between the victim and the real website, automating the relay:
- Victim clicks the phishing link and sees what appears to be a login page
- The phishing server forwards everything to the real site in real time
- Victim enters username and password; the server relays them instantly
- Real site triggers MFA; victim enters the code on the phishing page
- Server relays the code; real site creates an authenticated session
- Phishing server captures the session cookie and delivers it to the attacker
The attacker now has a valid session. They don't need the password again because the session cookie proves the authentication already happened. The cookie works until it expires, which might be hours or days depending on the application.
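To see why the cookie alone is enough, consider this minimal sketch; the host, cookie name, and endpoint are hypothetical.

```python
import requests

# Hypothetical application host, cookie name, and endpoint for illustration.
session = requests.Session()
session.cookies.set(
    "session_id", "captured-cookie-value", domain="app.ledgerlite.example"
)

# No username, password, or MFA code is sent. The cookie alone identifies an
# already-authenticated session and keeps working until it expires or the
# server revokes it.
response = session.get("https://app.ledgerlite.example/api/me", timeout=10)
print(response.status_code)
```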
Both techniques defeat SMS codes, authenticator apps, and push notifications. The victim completes their normal authentication flow without realizing anything is wrong.
What actually stops this: Hardware security keys using FIDO2/WebAuthn bind authentication to the legitimate domain. When the victim tries to authenticate on ledger-secure.com, the security key refuses because it only recognizes ledgerlite.com. The attacker can't complete the proxy handshake. This is why high-value accounts increasingly require hardware keys rather than phone-based MFA.
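The protection comes from a cryptographic origin check rather than user vigilance. Here is a minimal sketch of the server-side part of WebAuthn verification, with an illustrative relying party; real deployments should use a maintained WebAuthn library, and the authenticator additionally refuses to use a credential on any domain other than the one it was registered for.

```python
import base64
import json

EXPECTED_ORIGIN = "https://ledgerlite.com"  # the legitimate relying party

def origin_is_valid(client_data_json_b64: str) -> bool:
    """Reject assertions produced on any other site, e.g. ledger-secure.com."""
    padded = client_data_json_b64 + "=" * (-len(client_data_json_b64) % 4)
    client_data = json.loads(base64.urlsafe_b64decode(padded))
    # The browser, not the user, records the origin the key was used on, and
    # that value is covered by the authenticator's signature, so a reverse
    # proxy on a lookalike domain cannot forge it.
    return client_data.get("origin") == EXPECTED_ORIGIN
```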
OAuth Consent Attacks
OAuth lets applications access your data without knowing your password. When you click "Sign in with Google," you're using OAuth. The application redirects you to Google, you approve access, and Google gives the application a token to act on your behalf.
Attackers abuse this by creating malicious applications that request excessive permissions. The flow looks legitimate because you really are on Google's real login page. The attack happens in the consent step, where you're asked to approve permissions for an application you think is legitimate.
A malicious app might request:
- Read your email
- Manage your contacts
- Access your Google Drive files
- Send email on your behalf
Once granted, these permissions persist until explicitly revoked. The attacker doesn't need your password or MFA code. They have a token that lets them access your account directly through Google's API.
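Cleaning up after a consent attack means finding and revoking the grant, not changing the password. The sketch below uses Google's token introspection and revocation endpoints; the token value is a placeholder, and other identity providers expose similar APIs.

```python
import requests

TOKEN = "ya29.example-access-token"  # placeholder access token

# What scopes did the user actually consent to?
info = requests.get(
    "https://oauth2.googleapis.com/tokeninfo",
    params={"access_token": TOKEN},
    timeout=10,
)
print(info.json().get("scope"))  # e.g. Gmail, Contacts, Drive scopes

# Revoking the grant invalidates the token. Until then, it works without a
# password or MFA code.
requests.post(
    "https://oauth2.googleapis.com/revoke",
    params={"token": TOKEN},
    timeout=10,
)
```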
Quishing (QR Code Phishing)
QR codes can arrive in emails, but they're often physical: stickers on parking meters, posters in office lobbies, flyers on windshields, fake restaurant menus. Scanning leads to credential harvesting sites or malware downloads.
The physical element adds credibility. A QR code on a parking meter feels official. A poster in the break room advertising a company event looks like HR posted it. Scan to RSVP, enter your corporate credentials, and the attacker has your login.
Common pretexts include event registration, parking payments, Wi-Fi access, MFA setup instructions, and benefits enrollment.
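Because the URL is hidden inside the code, the practical check is to see where it points before typing anything. Below is a minimal sketch that decodes a QR image and compares the destination host against an expected allowlist; the file name and allowlist are illustrative, and the example assumes the pyzbar and Pillow packages (plus the zbar shared library) are installed.

```python
from urllib.parse import urlparse

from PIL import Image              # Pillow
from pyzbar.pyzbar import decode   # requires the zbar shared library

EXPECTED_HOSTS = {"ledgerlite.com", "www.ledgerlite.com"}  # illustrative allowlist

for result in decode(Image.open("lobby_poster_qr.png")):
    url = result.data.decode("utf-8")
    host = (urlparse(url).hostname or "").lower()
    verdict = "OK" if host in EXPECTED_HOSTS else "SUSPICIOUS"
    print(f"{verdict}: QR code points to {url}")
```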
Callback Phishing (TOAD)
Telephone-Oriented Attack Delivery reverses the typical flow. Instead of clicking a link, the email instructs victims to call a phone number to resolve an urgent issue: a suspicious charge, an expiring subscription, a security alert.
When victims call, they reach an attacker-controlled call center. The "support agent" walks them through installing remote access software, reading out MFA codes, or providing payment details. The victim initiated the call, so it feels safer than an inbound cold call.
Voice Attacks
Phone calls add pressure that email can't match. You're on the line. Someone's waiting. The conversation moves in real time, leaving no opportunity to step back and verify.
Caller ID Spoofing
Caller ID was designed for convenience, not security. The protocol trusts whatever information the caller provides. Spoofing requires no special equipment. VoIP services let anyone set any number as their outbound caller ID.
An attacker can display:
- Your company's main number
- A bank's fraud department line
- A government agency number
- Any number they want
The number displayed to the recipient (and the CNAM name lookup behind it) is completely independent of the actual originating number (ANI). Technical solutions like STIR/SHAKEN authenticate caller identity on participating networks, but coverage is incomplete and verification happens at the carrier level, not the phone level. Most recipients never see whether a call passed authentication.
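When verification does happen, the result travels in SIP signaling between carriers, not in anything your phone displays. Here is a minimal sketch of reading the verification status ("verstat") a terminating carrier may attach to the asserted caller identity; the header value is illustrative.

```python
import re

# Illustrative P-Asserted-Identity header as a terminating carrier might
# deliver it after STIR/SHAKEN verification.
pai_header = (
    "P-Asserted-Identity: "
    "<sip:+12135550142;verstat=TN-Validation-Passed@carrier.example>"
)

match = re.search(r"verstat=([A-Za-z-]+)", pai_header)
status = match.group(1) if match else "No-TN-Validation"

# Even "TN-Validation-Passed" only attests the calling number; it says nothing
# about who is actually speaking or whether the call is benign.
print(status)
```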
The Helpdesk Play
Helpdesk staff have the authority to reset passwords, disable MFA, and grant elevated access. That makes them high-value targets.
A typical attack unfolds like this:
Tuesday, 8:30 AM. Robert Kim answers a call at the IT service desk. Caller ID shows the company headquarters number. A calm, professional voice identifies herself as Emily Sanders, VP of Product.
"I'm locked out and standing in front of press demo guests. I need my Okta MFA reset immediately."
Robert follows his reset procedure. He asks for Emily's employee ID (available on LinkedIn). He asks her start date (mentioned in a press release). Satisfied, he resets her MFA factor and delivers an 8-digit temporary code over the phone.
The caller isn't Emily. Within minutes, "Emily" logs in, escalates to Super-Admin, and downloads product roadmap files. The real Emily calls two hours later about a disabled badge. By then, the attacker is gone.
The attack exploited three things: caller ID trust, time pressure ("press demo guests"), and a reset process that accepted verbal verification.
For more on how these attacks lead to account compromise, see Infrastructure and Social Engineering Attacks in the Account Takeover module.
Push Notification Fatigue
Push-based MFA asks users to approve logins on their phone. Tap "approve" and you're authenticated. This seems more secure than SMS codes, but it creates a new vulnerability: fatigue.
If an attacker has stolen credentials, they can trigger push notifications repeatedly. The victim's phone buzzes at 2 AM. Then again. And again. Some people, exhausted and confused, eventually tap "approve" just to make it stop.
More sophisticated attackers call the victim simultaneously:
"Hi, this is IT security. We're seeing someone trying to break into your account. You might be getting authentication requests on your phone. Please approve the next one so we can trace where the attack is coming from."
The victim approves what they think is a diagnostic request. The attacker logs in.
Family Emergency Scams
"Grandparent scams" and "Hi Mom" texts exploit family bonds. An attacker calls claiming to be a grandchild in jail needing bail money, or texts from an unknown number: "Hi mom, I broke my phone. Can you send money to this account?"
These attacks target consumers rather than businesses, but the mechanics are identical to corporate vishing: caller ID spoofing, emotional urgency, and requests for immediate payment via gift cards, wire transfers, or cryptocurrency.
SMS Attacks
Text messages have a unique psychological profile. People read almost every text they receive. The short format doesn't leave room for the red flags that longer emails might contain. And texts feel personal in a way email doesn't.
The Delivery Scam
Sunday, 3:12 PM. Lena Davis, a college student in Seattle, receives a text:
USPS: Parcel #9045881 on hold. Pay $1.95 redelivery fee → usps-parcel-verify.com
She's expecting a package. The small fee seems reasonable. She clicks through, enters her card details and phone number on a site that looks exactly like the USPS tracking portal. The form even asks for her billing ZIP code, just like a real payment page.
Lena has just given an attacker her card number, expiry, CVV, phone number, and address. But the attack isn't over.
OTP Bots
Armed with Lena's card details, the attacker heads to an electronics retailer and fills a cart with $2,400 worth of gear. At checkout, the retailer's payment processor triggers 3-D Secure, a verification step that sends a one-time code to the cardholder's phone.
Lena's phone buzzes with a text from her bank: "Your verification code is 834921."
Seconds later, her phone rings. An automated voice says: "This is your bank's fraud prevention department. We've detected suspicious activity on your account. To verify your identity and block unauthorized transactions, please enter the six-digit code you just received."
Lena, worried about fraud on her card, punches in 834921.
The OTP bot, running on a Telegram server, receives the code and submits it to the retailer's checkout page. Transaction approved. The electronics ship to a reshipper address.
This attack chain, smishing to harvest the card details followed by OTP interception, defeats 3-D Secure entirely. The bank's code went to the right phone. The customer entered it. Everything looks legitimate.
A2P Sender Spoofing
Legitimate businesses send texts through Application-to-Person (A2P) gateways, which display company names instead of phone numbers. You see "USPS" or "Chase" as the sender, not a random number.
Attackers rent access to the same gateways or exploit weak verification requirements to send texts that appear to come from trusted brands. The recipient sees a message from "USPS" that actually originated from a criminal operation.
Some gateways verify sender identity poorly or not at all. Others are compromised by insiders selling access. Either way, the sender name provides false credibility.
Multi-Channel Coordination
The most effective attacks combine channels. Each reinforcement makes the story more believable.
Consider the attack on Carla Lopez from Social Engineering Fundamentals. Email arrived first, priming concern about a security incident. SMS followed seconds later, reinforcing urgency. Then a phone call provided the human authority to guide her through the credential harvest.
Why does this work?
Channel diversity creates false verification. When the same urgent story arrives through email, text, and voice, each channel seems to confirm the others. The victim thinks: "If this were fake, would I really be getting calls about it too?"
Speed prevents independent verification. Three contacts in four minutes doesn't leave time to call IT, check with a colleague, or think carefully. The attack creates a crisis and offers an immediate solution.
Different channels reach different people. Some people ignore emails. Others never answer unknown calls. Multi-channel attacks maximize the odds of getting through.
Each channel plays to its strengths. Email delivers the official-looking notice. SMS creates urgency with its notification buzz. Voice provides real-time pressure and guidance.
AI-Enhanced Attacks
The channel mechanics described above assume the attacker sounds and looks like themselves. That assumption is now outdated.
Voice Cloning
Modern voice cloning creates synthetic speech that sounds like a specific person. Current systems need as little as 15 seconds of sample audio. Sources include earnings calls, conference talks, YouTube videos, and podcast appearances. Any executive who speaks publicly provides training material.
The process works in real time. The attacker speaks into their microphone; AI converts their words into the target's voice with sub-second latency. Natural conversation is possible. The attacker's own speech patterns, hesitations, and emphasis flow through in the cloned voice.
This transforms voice attacks. The helpdesk play described earlier becomes more convincing when "Emily Sanders" actually sounds like Emily Sanders. The caller isn't just claiming to be the VP of Product. They have her voice.
Video Deepfakes
Video deepfakes replace faces in real time. Consumer-grade graphics cards can now run face-swapping at 30 frames per second, which is standard video call quality. Open-source software handles the technical complexity.
In early 2024, a finance employee at Arup, the British firm behind the structural engineering of the Sydney Opera House, received an email that appeared to come from the company's CFO requesting confidential transactions. The employee was suspicious. It looked like a phishing attempt.
Then came an invitation to a video call.
On the call, the CFO appeared on camera alongside several colleagues the employee recognized. They discussed the transfers. The CFO explained the urgency. Reassured by seeing familiar faces, the employee made 15 transfers totaling $25 million to five Hong Kong bank accounts.
Every person on that call was a deepfake. Hong Kong police later determined the attackers had used publicly available video and audio of Arup executives from online conferences and company meetings to train the AI models. The employee only discovered the fraud when he checked with the UK head office days later.
Arup's Chief Information Officer described it as "technology-enhanced social engineering." No systems were breached. No credentials were stolen. The attack worked because a finance employee trusted what he saw on a video call.[1]
What This Changes
Traditional advice for high-stakes requests was "verify by video call." If you're unsure whether the voice on the phone is really your CEO, get them on video. If you can see their face, you know it's them.
That advice no longer holds. Both voice and video can be faked simultaneously in real time. The Arup employee saw multiple executives on camera. Visual confirmation overrode his initial suspicion.
Hardware security keys still work because they verify domains, not faces. But for human-to-human verification, the channels themselves can no longer be trusted.
Key Takeaways
- Reverse-proxy phishing defeats MFA in real time. These attacks relay credentials and MFA codes between the victim and the real site, capturing session cookies that work until they expire. Understanding this technique explains why "just use MFA" isn't a complete answer.
- Caller ID provides zero security. Spoofing a phone number requires no special access or equipment. Displayed caller information should never be trusted for authentication or high-stakes decisions.
- OTP bots turn victims into accomplices. By calling immediately after triggering a verification code, attackers get victims to read back the exact code designed to stop them. The bank sees a successful verification from the right phone.
- Multi-channel attacks manufacture credibility. When the same story arrives through email, SMS, and voice within minutes, each channel reinforces the others. This coordination is deliberate, not coincidental.
- Technical and psychological components work together. Lookalike domains, session hijacking, and caller ID spoofing are tools. Psychology determines whether victims engage with them.
- AI has made voice and video unreliable for verification. Voice cloning needs 15 seconds of audio. Video deepfakes run on consumer hardware. "Verify by video call" no longer provides security for high-stakes decisions.
What's next: The Pretexting article explores how attackers construct believable personas and scenarios across any channel.
Key Terms
- Phishing kit: Pre-built package for creating fake login pages that capture credentials.
- Reverse-proxy phishing: Attack that relays authentication in real time between victim and legitimate site, capturing session cookies even when MFA is enabled.
- Session cookie: Data stored by your browser that proves you already authenticated. Works until it expires.
- OAuth token: Credential that lets an application access your account through an API without knowing your password.
- Caller ID spoofing: Falsifying the phone number or name displayed on incoming calls.
- STIR/SHAKEN: US caller ID authentication framework. Validates caller identity at the carrier level but doesn't guarantee display to recipients.
- OTP bot: Automated system that calls or texts victims to steal one-time passwords for real-time use.
- 3-D Secure: Additional verification step for online card payments that sends a code to the cardholder.
- A2P (Application-to-Person): SMS gateway that lets businesses send texts displaying company names instead of phone numbers.
- Vishing: Voice phishing. Social engineering conducted over phone calls.
- Smishing: SMS phishing. Social engineering delivered through text messages.
- Voice cloning: Using AI to synthesize speech that sounds like a specific person, trained on samples of their real voice.
- Deepfake: AI-generated media (video or audio) that realistically depicts someone saying or doing something they never did.
- Spear phishing: Phishing targeting specific individuals using personalized details.
- Whale phishing: Phishing targeting executives and high-value individuals.
- Quishing: QR code phishing. Malicious QR codes in emails or physical locations (parking meters, posters, menus) leading to credential harvesting sites.
- TOAD (Telephone-Oriented Attack Delivery): Callback phishing where emails instruct victims to call attacker-controlled numbers.
For additional terms, see the Account Takeover Glossary.
References
1. CNN: Finance worker pays out $25 million after video call with deepfake 'chief financial officer'
Generated with AI assistance. Reviewed by humans for accuracy.