Beginner's Guide: SQL Index: Why and How?

Last updated: February 3rd 2025

Introduction

Imagine a stack of hay, where each piece of hay has a uniquely colored string attached to it. If you want to retrieve a specific piece of hay, all you need to do is pull the string connected to it.

The alternative would be going over the entire stack of hay, one by one, checking every piece—definitely not an efficient approach.

In this analogy: - The haystack represents a heap (unsorted data). - The string represents an index.

Creating a Users Table

Here’s an example of creating a simple users table:

CREATE TABLE users (
    id INT,
    username VARCHAR(50),
    age INT
);

Now let’s query for a user with the username superuser:

SELECT 
    username, id, age 
FROM users 
WHERE username = 'superuser';

⚠️ Why This Is Bad

Without an index, this query performs a full table scan (O(N) time complexity).

That’s quite possibly the worst thing you could do to your database. If you repeatedly run O(N) queries for every read operation, you might as well write your data on parchment and send it to the user via homing pigeon.

So… What Should You Do?

Tie every hay piece to a string—in other words, create a non-clustered index:

CREATE INDEX idx_users_table_by_username ON users(username);

Alternatively, you can define a PRIMARY KEY while creating the table:

CREATE TABLE users (
    id INT,
    username VARCHAR(50) PRIMARY KEY,
    age INT
);

Why Is Indexing Powerful?

Imagine you have a table with 1 billion records. Here’s how an index changes the game:

Scenario	Complexity	Operations
Without an Index	O(N)	Up to 1 billion operations
With an Index	O(log N)	Approximately 30 operations

Be Careful with Indexes

1. Size

Indexes can consume up to 20% of the table’s size, depending on the database engine and type of index. While the performance benefits usually outweigh the storage cost, be thoughtful about which columns to index.

2. Slower Writes

Indexes make write operations slower (e.g., INSERT, UPDATE, DELETE) because the database has to update the index every time data changes. It’s like adding more strings to your haystack—eventually, it becomes harder to manage.

3. Not Every Column Deserves an Index

Indexing low-selectivity columns (e.g., gender with only two values) is like alphabetizing two books on a shelf—pointless.
Focus on columns with high selectivity, like unique usernames or email addresses.