Skip to main content
Learning Center
Fraud BasicsSQL Crash Course for Fraud Analysts

SQL Crash Course for Fraud Analysts

Essential SQL skills for investigating fraud cases - learn to query transaction data, analyze patterns, and gather evidence

SQL Crash Course for Fraud Analysts

Learn to ask questions of databases - explained like you're in high school


What Is SQL? (In Plain English)

SQL stands for "Structured Query Language" - but don't let that scare you! Think of it as a way to ask questions about data stored in a computer.

Imagine you have a giant filing cabinet with millions of customer records, transaction receipts, and account information. SQL is like having a super-smart assistant who can instantly find exactly what you're looking for.

We'll focus on SQLite in this course, which is what our fraud investigation exercises use. The good news? Once you learn SQLite, other databases like PostgreSQL and MySQL work almost exactly the same way!


Database Basics: Think of It Like Excel (But Way Bigger)

What's a Database?

A database is like a collection of Excel spreadsheets, but designed to handle millions of rows without crashing your computer.

What's a Table?

A table is just like one Excel spreadsheet. It has:

  • Rows = individual records (like one customer or one transaction)
  • Columns = different pieces of information (like name, amount, date)

Example: A "transactions" table might look like this:

transaction_id | customer_id | amount | merchant  | date      
1001          | 12345       | 50.00  | Amazon    | 2024-01-15
1002          | 12345       | 500.00 | Best Buy  | 2024-01-15  
1003          | 67890       | 25.00  | Starbucks | 2024-01-15

Think of each row as a receipt - it tells you who spent money, how much, where, and when.


Your First SQL Query: Asking Simple Questions

The SELECT Statement (Like Asking "Show Me...")

In regular English, you might ask: "Show me all the information about transactions"

In SQL, you write:

SELECT * FROM transactions;

Let's break this down step by step:

Step 1: SELECT - What information do we want to see?

  • * = "everything" (all columns)
  • FROM transactions = "from the transactions table"
  • ; = "I'm done with this question" (like a period)

More specific example: "Show me just the amounts and merchants from transactions"

SELECT amount, merchant FROM transactions;

The WHERE Clause (Like Adding "But Only...")

In regular English: "Show me transactions, but only the ones over $100"

In SQL:

SELECT * FROM transactions WHERE amount > 100;

Another example: "Show me transactions, but only from Amazon"

SELECT * FROM transactions WHERE merchant = 'Amazon';

Key point: Notice the single quotes around 'Amazon' - that's how you tell SQL it's text, not a number.


Comparison Operators (Ways to Compare Things)

Think of these like math class, but for data:

Equals and Not Equals:

  • = means "equals exactly" → amount = 100 (finds transactions of exactly $100)
  • != means "not equal to" → merchant != 'Amazon' (finds everything except Amazon)

Greater Than and Less Than:

  • > means "greater than" → amount > 100 (finds transactions over $100)
  • < means "less than" → amount < 100 (finds transactions under $100)
  • >= means "greater than or equal" → amount >= 100 (finds $100 and above)
  • <= means "less than or equal" → amount <= 100 (finds $100 and below)

Real examples:

-- Find expensive transactions SELECT * FROM transactions WHERE amount > 1000; -- Find cheap transactions SELECT * FROM transactions WHERE amount < 10; -- Find everything except Amazon SELECT * FROM transactions WHERE merchant != 'Amazon';

Combining Conditions (AND & OR)

AND (Both Things Must Be True)

In regular English: "Show me transactions over $500 AND from Best Buy"

In SQL:

SELECT * FROM transactions WHERE amount > 500 AND merchant = 'Best Buy';

Think of AND like this: Both conditions must be true, like needing both a driver's license AND insurance to drive.

OR (Either Thing Can Be True)

In regular English: "Show me transactions from Amazon OR Best Buy"

In SQL:

SELECT * FROM transactions WHERE merchant = 'Amazon' OR merchant = 'Best Buy';

Think of OR like this: Either condition works, like accepting either cash OR credit card.

Using Parentheses for Complex Logic

In regular English: "Show me transactions from customer 12345 where they spent over $1000 OR bought something from a casino"

In SQL:

SELECT * FROM transactions WHERE customer_id = 12345 AND (amount > 1000 OR merchant LIKE '%Casino%');

Why parentheses matter:

-- Wrong (without parentheses) - this finds ALL casino transactions from anyone! SELECT * FROM transactions WHERE customer_id = 12345 AND amount > 1000 OR merchant LIKE '%Casino%'; -- Right (with parentheses) - this finds only customer 12345's big purchases OR casino visits SELECT * FROM transactions WHERE customer_id = 12345 AND (amount > 1000 OR merchant LIKE '%Casino%');

Think of parentheses like math - they control what gets grouped together first.


Pattern Matching with LIKE (Finding Similar Things)

What Is LIKE?

LIKE is for finding things that are similar but not exactly the same.

Wildcards (Special Characters)

  • % = "any letters/numbers here" (like a joker in cards)
  • _ = "exactly one letter/number here"

Examples:

-- Find anything with "Shell" in the name SELECT * FROM transactions WHERE merchant LIKE '%Shell%'; -- This finds: "Shell Gas", "Shell Station", "Bob's Shell Stop" -- Find websites (things ending in .com) SELECT * FROM transactions WHERE merchant LIKE '%.com%'; -- This finds: "Amazon.com", "PayPal.com", etc. -- Find anything starting with "ATM" SELECT * FROM transactions WHERE merchant LIKE 'ATM%'; -- This finds: "ATM Withdrawal", "ATM Cash", etc.

Working with Dates in SQLite

SQLite Date Functions

SQLite has special functions for working with dates:

Get today's date:

SELECT date('now');

Find transactions from today:

SELECT * FROM transactions WHERE date(transaction_date) = date('now');

Find transactions from the last 7 days:

SELECT * FROM transactions WHERE transaction_date >= date('now', '-7 days');

Find transactions from a specific date:

SELECT * FROM transactions WHERE date(transaction_date) = '2024-01-15';

Time-Based Fraud Patterns

Weekend transactions (often suspicious):

SELECT * FROM transactions WHERE strftime('%w', transaction_date) IN ('0', '6'); -- 0 = Sunday, 6 = Saturday

Late-night transactions:

SELECT * FROM transactions WHERE strftime('%H', transaction_date) >= '22' OR strftime('%H', transaction_date) <= '06'; -- After 10 PM or before 6 AM

Counting and Adding Things Up

COUNT (How Many?)

In regular English: "How many transactions did customer 12345 make?"

In SQL:

SELECT COUNT(*) FROM transactions WHERE customer_id = 12345;

SUM (Add Them Up)

In regular English: "What's the total amount customer 12345 spent?"

In SQL:

SELECT SUM(amount) FROM transactions WHERE customer_id = 12345;

AVG (Average)

In regular English: "What's the average transaction amount for customer 12345?"

In SQL:

SELECT AVG(amount) FROM transactions WHERE customer_id = 12345;

MIN and MAX (Smallest and Largest)

-- Smallest transaction SELECT MIN(amount) FROM transactions WHERE customer_id = 12345; -- Largest transaction SELECT MAX(amount) FROM transactions WHERE customer_id = 12345;

GROUP BY (Organizing Your Results)

What Does GROUP BY Do?

GROUP BY is like sorting your clothes into piles - all the shirts together, all the pants together, etc.

Example: "How many transactions did each customer make?"

SELECT customer_id, COUNT(*) FROM transactions GROUP BY customer_id;

This creates one row per customer showing their ID and how many transactions they made.

Another example: "How much did each customer spend in total?"

SELECT customer_id, SUM(amount) as total_spent FROM transactions GROUP BY customer_id;

HAVING (Filtering Groups)

HAVING is like WHERE, but for groups.

Example: "Show me customers who made more than 10 transactions"

SELECT customer_id, COUNT(*) as transaction_count FROM transactions GROUP BY customer_id HAVING COUNT(*) > 10;

Think of it this way:

  • WHERE filters individual rows (before grouping)
  • HAVING filters groups (after grouping)

JOINs (Combining Information from Different Tables)

Why Do We Need JOINs?

Think of it like having information in different notebooks:

  • Notebook 1: Customer names and addresses
  • Notebook 2: Their purchases and spending

To answer "Who bought what?" you need info from both notebooks. That's what JOIN does!

JOINs Explained with Friends and Phone Numbers

You're planning a party and have two pieces of paper:

Paper 1: Friends and Their Addresses (Table A)

Name    | Address
--------|------------------
Alice   | 123 Oak Street
Bob     | 456 Pine Avenue  
Charlie | 789 Elm Drive

Paper 2: Phone Numbers I Have (Table B)

Name    | Phone Number
--------|-------------
Alice   | 555-1111
Charlie | 555-3333
David   | 555-4444

The problem: You want to call your friends to give them directions to the party, but you need BOTH their address (so you know who to invite) AND their phone number (so you can call them).

INNER JOIN = "Only friends I can actually call"

Question: Which friends can you call to invite?

INNER JOIN Result:

Name    | Address          | Phone Number
--------|------------------|-------------
Alice   | 123 Oak Street   | 555-1111  ✓ 
Charlie | 789 Elm Drive    | 555-3333  ✓ 

Missing:

  • Bob (in Table A but not in Table B - no phone number)
  • David (in Table B but not in Table A - not a friend)

LEFT JOIN = "All my friends, whether I can call them or not"

Question: Show me all my friends and their numbers (if I have them)

LEFT JOIN Result:

Name    | Address          | Phone Number
--------|------------------|-------------
Alice   | 123 Oak Street   | 555-1111  ✓ 
Bob     | 456 Pine Avenue  | NULL       ✓ 
Charlie | 789 Elm Drive    | 555-3333  ✓ 

The difference: LEFT JOIN includes everyone from your friends table (Table A), even Bob who has no phone number.

The Super Simple Rule

  • INNER JOIN = Only matches that exist in BOTH lists
  • LEFT JOIN = Everything from the LEFT list + matches from the right list

Now with Real Data

CUSTOMERS Table:

customer_id | Name
------------|------
12345       | John
67890       | Mary
11111       | Bob

TRANSACTIONS Table:

transaction_id | customer_id | Amount
---------------|-------------|--------
1001          | 12345       | $50
1002          | 67890       | $100
1003          | 99999       | $25

INNER JOIN = customers who actually bought something:

INNER JOIN Result:

Name | Amount
-----|-------
John | $50
Mary | $100

LEFT JOIN = all customers, with purchases if they made any:

LEFT JOIN Result:

Name | Amount
-----|-------
John | $50
Mary | $100  
Bob  | NULL

The SQL Code

INNER JOIN (only customers who bought something):

SELECT name, amount FROM customers INNER JOIN transactions ON customers.customer_id = transactions.customer_id;

LEFT JOIN (all customers, with purchases if they made any):

SELECT name, amount FROM customers LEFT JOIN transactions ON customers.customer_id = transactions.customer_id;

That's it! JOINs are just matching lists together.

Visual Summary

INNER JOIN = Only the matches
Friends:     [Alice, Bob, Charlie]
Phone #s:    [Alice, Charlie, David]
Result:      [Alice, Charlie]  ← Only people in BOTH lists

LEFT JOIN = All from left list + matches
Friends:     [Alice, Bob, Charlie]  ← Keep ALL of these
Phone #s:    [Alice, Charlie, David]
Result:      [Alice+phone, Bob+no phone, Charlie+phone]

Key insight: INNER JOIN is stricter (must be in both), LEFT JOIN is more inclusive (keep everything from the left).


Practical Fraud Investigation Examples


Example 1: Customer Spending Patterns

Goal: Show me all customers who spent more than $5,000 today with their contact info

SELECT c.first_name, c.last_name, c.email, SUM(t.amount) as total_spent FROM customers c INNER JOIN transactions t ON c.customer_id = t.customer_id WHERE date(t.transaction_date) = date('now') GROUP BY c.customer_id, c.first_name, c.last_name, c.email HAVING SUM(t.amount) > 5000;

Step-by-Step Breakdown

Think of this like giving instructions to a detective:

Step 1: Find the right filing cabinets

  • FROM customers c - Go to the customers filing cabinet
  • INNER JOIN transactions t ON c.customer_id = t.customer_id - Connect it to the transactions cabinet using customer ID as the link

Step 2: Filter to today's cases only

  • WHERE date(t.transaction_date) = date('now') - Only look at transactions that happened today

Step 3: Group related evidence together

  • GROUP BY c.customer_id, c.first_name, c.last_name, c.email - Put all of each customer's transactions in separate piles

Step 4: Calculate totals for each pile

  • SUM(t.amount) as total_spent - Add up all the transaction amounts in each customer's pile

Step 5: Only show the big spenders

  • HAVING SUM(t.amount) > 5000 - Only keep the piles where the total is over $5,000

Step 6: Decide what information to show

  • SELECT c.first_name, c.last_name, c.email, SUM(t.amount) as total_spent - Show the customer's name, email, and their total spending

The story: Go find all customers and their transactions, but only look at today's transactions. Group each customer's transactions together, add up how much each customer spent, then only show me customers who spent more than $5,000 today, along with their contact info.

Why This Matters for Fraud

High spending in one day could indicate a stolen card being used quickly.


Example 2: Velocity Fraud Detection

Goal: Find customers who made more than 5 transactions in the last hour

SELECT customer_id, COUNT(*) as transaction_count, MIN(transaction_date) as first_transaction, MAX(transaction_date) as last_transaction FROM transactions WHERE transaction_date >= datetime('now', '-1 hour') GROUP BY customer_id HAVING COUNT(*) > 5;

Step-by-Step Breakdown

Think of this like tracking suspicious activity:

Step 1: Find the transaction records

  • FROM transactions - Go to the transactions table

Step 2: Filter to recent activity

  • WHERE transaction_date >= datetime('now', '-1 hour') - Only look at transactions from the last hour
  • datetime('now', '-1 hour') means "1 hour ago from right now"

Step 3: Group by customer

  • GROUP BY customer_id - Put all transactions for each customer in separate piles

Step 4: Count transactions and find time range

  • COUNT(*) as transaction_count - Count how many transactions in each pile
  • MIN(transaction_date) - Find the earliest transaction time
  • MAX(transaction_date) - Find the latest transaction time

Step 5: Only show rapid-fire customers

  • HAVING COUNT(*) > 5 - Only keep customers who made more than 5 transactions

Step 6: Show the results

  • SELECT customer_id, COUNT(*), MIN(transaction_date), MAX(transaction_date) - Display customer ID, count, and time range

The story: Look at the last hour of transactions, group them by customer, count how many each customer made, and show me only customers who made more than 5 transactions.

Why This Matters for Fraud

Multiple rapid transactions could indicate card testing or account takeover.


Example 3: Geographic Impossibility

Goal: Find transactions that happened at different merchants within 2 hours (potentially impossible travel)

SELECT t1.customer_id, t1.merchant as first_location, t1.transaction_date as first_time, t2.merchant as second_location, t2.transaction_date as second_time FROM transactions t1 INNER JOIN transactions t2 ON t1.customer_id = t2.customer_id WHERE t1.merchant != t2.merchant AND t2.transaction_date > t1.transaction_date AND (julianday(t2.transaction_date) - julianday(t1.transaction_date)) * 24 < 2;

Step-by-Step Breakdown

Think of this like comparing locations and times:

Step 1: Set up the comparison

  • FROM transactions t1 - Start with the transactions table (call it "t1" for "first transaction")
  • INNER JOIN transactions t2 ON t1.customer_id = t2.customer_id - Join it to itself (call it "t2" for "second transaction")
  • This lets us compare each customer's transactions to their other transactions

Step 2: Filter for suspicious patterns

  • WHERE t1.merchant != t2.merchant - The two transactions must be at different places
  • AND t2.transaction_date > t1.transaction_date - The second transaction happened after the first
  • AND (julianday(t2.transaction_date) - julianday(t1.transaction_date)) * 24 < 2 - They happened less than 2 hours apart

Step 3: Show the evidence

  • SELECT t1.customer_id, t1.merchant, t1.transaction_date, t2.merchant, t2.transaction_date - Show customer ID, first location and time, second location and time

The story: Compare every transaction to every other transaction for the same customer. Show me cases where someone was at two different places within 2 hours.

The Date Math Explained

  • julianday() converts dates to numbers (like "day 12345 since year 0")
  • Subtract them to get the difference in days
  • Multiply by 24 to convert days to hours
  • < 2 means less than 2 hours

Real-world example: If someone bought gas in Miami at 2:00 PM and coffee in Seattle at 3:30 PM, that's physically impossible!

Why This Matters for Fraud

If someone bought coffee in New York at 2 PM and gas in California at 3 PM, that's physically impossible!

Note: This query is quite advanced - don't worry if it seems complicated. The important thing is understanding the concept.


Finding Round Dollar Amounts (Often Fraud)

SELECT * FROM transactions WHERE amount IN (100, 200, 500, 1000, 2000, 5000);

Why this works:

  • Fraudsters often test cards with round numbers like $100, $500
  • Real purchases are usually messy amounts like $47.83, $123.45
  • IN (100, 200, 500...) means "show me transactions with ANY of these exact amounts"

What to look for: Multiple customers with the same round amounts on the same day.

Finding Card Testing Patterns

-- Small test transactions followed by large ones SELECT customer_id, COUNT(*) as small_transactions, MAX(amount) as largest_amount FROM transactions WHERE date(transaction_date) = date('now') GROUP BY customer_id HAVING COUNT(CASE WHEN amount < 10 THEN 1 END) > 0 AND MAX(amount) > 500;

What this does: Finds customers who made both small AND large purchases today

Step 1: COUNT(CASE WHEN amount < 10 THEN 1 END) > 0 - They made at least one purchase under $10

  • CASE WHEN amount < 10 THEN 1 END counts only transactions under $10
  • > 0 means "at least one small transaction"

Step 2: MAX(amount) > 500 - Their biggest purchase was over $500

Why this is suspicious: Fraudsters often test stolen cards with small amounts first, then make big purchases if the card works.

Real-world example: Someone makes three $1 purchases, then immediately buys a $2,000 laptop.

Finding Money Transfer Services

SELECT * FROM transactions WHERE merchant LIKE '%Western Union%' OR merchant LIKE '%MoneyGram%' OR merchant LIKE '%Money Transfer%';

What this does:

  • LIKE '%Western Union%' finds any merchant name containing "Western Union"
  • The % symbols mean "any text before or after"
  • OR means it will find transactions at ANY of these types of places

Why money transfers are risky:

  • Hard to trace once money is sent
  • Common way for fraudsters to move stolen money
  • Often used in romance scams and business email compromise

Real-world example: "Western Union #1234", "Local MoneyGram Station", "Quick Money Transfer LLC"


Common Beginner Mistakes (And How to Avoid Them)

1. Forgetting Quotes Around Text

Wrong:

SELECT * FROM transactions WHERE merchant = Amazon;

Right:

SELECT * FROM transactions WHERE merchant = 'Amazon';

Remember: Numbers don't need quotes, text does!

2. Mixing Up WHERE and HAVING

Use WHERE for filtering individual rows:

SELECT * FROM transactions WHERE amount > 100;

Use HAVING for filtering groups:

SELECT customer_id, COUNT(*) FROM transactions GROUP BY customer_id HAVING COUNT(*) > 5;

3. Forgetting the Semicolon

Every SQL statement ends with a semicolon (;)

SELECT * FROM transactions;

4. Not Understanding JOIN Requirements

Remember: You need an ON clause to tell SQL how to match tables:

-- Wrong (missing ON clause) SELECT * FROM customers INNER JOIN transactions; -- Right SELECT * FROM customers INNER JOIN transactions ON customers.customer_id = transactions.customer_id;

Practice Exercises (Start Simple!)

Exercise 1: Basic Filtering

Find all transactions over $1,000.

<details> <summary>Click to see solution</summary>
SELECT * FROM transactions WHERE amount > 1000;
</details>

Exercise 2: Pattern Matching

Find all transactions at gas stations (merchants containing "Shell", "Exxon", or "BP").

<details> <summary>Click to see solution</summary>
SELECT * FROM transactions WHERE merchant LIKE '%Shell%' OR merchant LIKE '%Exxon%' OR merchant LIKE '%BP%';
</details>

Exercise 3: Counting

How many transactions happened today?

<details> <summary>Click to see solution</summary>
SELECT COUNT(*) FROM transactions WHERE date(transaction_date) = date('now');
</details>

Exercise 4: Grouping

Show how much each customer spent in total, but only customers who spent more than $1,000.

<details> <summary>Click to see solution</summary>
SELECT customer_id, SUM(amount) as total_spent FROM transactions GROUP BY customer_id HAVING SUM(amount) > 1000;
</details>

Building Your Confidence

Start Small

  1. Practice SELECT and WHERE first - get comfortable with basic filtering
  2. Add one new concept at a time - don't try to learn JOINs and GROUP BY simultaneously
  3. Use real examples - practice with actual fraud scenarios from your work
  4. Make mistakes - they're how you learn! SQL won't break if you write bad queries

Common Workflow for Fraud Investigation

Step-by-step process:

  1. Start with basic SELECT to see what data you have

Test Your Knowledge

Ready to test what you've learned? Take the quiz to reinforce your understanding.