Analyze SQL queries with the Query Planner for PostgreSQL

Last updated: February 5th 2025

Introduction

“Victorious warriors win first in the execution plan, then seek battle.”
— Sun Tzu

In the battlefield of database administration, poor performance is the silent siege engine breaking down your walls. Just as the legendary general studied terrain and troop movements, we must master two strategic domains: the landscape of our data structures and the flow of query execution plans.

Now let’s learn how to analyze queries!

Optimizing Database Performance: Indexes and Execution Plans

Create a Test Environment

To identify performance bottlenecks, simulate real-world conditions. Small datasets mask issues—use 50 million rows for meaningful analysis:

CREATE TABLE users(id INT, t INT);

Generate 50 million test rows:

INSERT INTO users(id, t)
SELECT 
    random()*10000, 
    random()*10000 
FROM generate_series(1, 50000000);

The equivalent logic in JavaScript (insertIntoUsers is a stand-in for a per-row INSERT):

// Conceptually the same loop: insert 50 million rows with random values
for (let i = 0; i < 50000000; i++) {
    insertIntoUsers(Math.random() * 10000, Math.random() * 10000);
}

Key Objectives

This large dataset helps:
1. Identify missing indexes
2. Analyze query optimizer behavior under load

PostgreSQL’s INSERT 0 50000000 confirmation indicates the 50 million rows were inserted successfully.
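
You can also verify the row count directly (a plain count over 50 million rows will take a moment):

SELECT count(*) FROM users;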

Performance Comparison

Without Index (Full Table Scan):

EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1234;

QUERY PLAN
-----------------------------  
[...]  
Planning Time: 0.047 ms  
Execution Time: 4003.314 ms

With Index (Indexed Lookup):

CREATE INDEX users_id_idx ON users(id);
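
Re-run the same query so the planner can use the new index; the plan below comes from that run:

EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1234;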

QUERY PLAN  
-------------------------------------------  
[...]  
Planning Time: 0.641 ms  
Execution Time: 6.718 ms


Results:

With index: 6.718 ms
Without index: 4003.314 ms

Improvement: 596x faster query execution.

Understanding Query Costs

For this query:

EXPLAIN SELECT id FROM users;

QUERY PLAN                           
----------------------------------------------------------------
 Seq Scan on users  (cost=0.00..865487.00 rows=50000000 width=4)


PostgreSQL’s cost estimate format:
1. Startup cost (0.00): estimated cost incurred before the first row can be returned
2. Total cost (865487.00): estimated cost to run the operation to completion and return all rows

Note: Costs are relative units, not milliseconds. Use EXPLAIN ANALYZE for actual timing.
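
For a sequential scan, the total cost roughly follows the formula described in the PostgreSQL documentation, assuming the default settings seq_page_cost = 1.0 and cpu_tuple_cost = 0.01 (the ~365,487-page figure below is inferred from the numbers above, not measured):

total_cost ≈ (pages read × seq_page_cost) + (rows scanned × cpu_tuple_cost)
865487.00 ≈ 365487 × 1.0 + 50000000 × 0.01

You can check the planner’s page and row estimates for the table yourself:

SELECT relpages, reltuples FROM pg_class WHERE relname = 'users';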

Sorting Analysis

Test sorting on unindexed column t:

EXPLAIN SELECT t FROM users ORDER BY t;

QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Gather Merge  (cost=4272014.12..10105754.74 rows=50000000 width=4)
   Workers Planned: 2
   ->  Sort  (cost=4271014.09..4333514.09 rows=25000000 width=4)
         Sort Key: t
         ->  Parallel Seq Scan on users  (cost=0.00..515487.00 rows=25000000 width=4)

Execution Plan Breakdown:
1. Parallel Seq Scan: Full table scan split across 2 worker processes, so each worker reads roughly half of the table.
2. Sort: Each worker sorts its ~25 million rows (in memory if work_mem allows, otherwise spilling to disk).
3. Gather Merge: Combines the sorted results from the workers into one ordered output.

Key Insight: An index on t would let PostgreSQL return rows in already-sorted order by walking the index, avoiding the O(n log n) sort entirely, and a b-tree index reduces point lookups to O(log n). Without an index, sorting requires a full scan plus a memory-intensive sort.
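
As a rough sketch (the index name users_t_idx is just an illustrative choice), adding an index on t lets the planner read rows in sorted order directly from the index, typically replacing the Sort + Parallel Seq Scan above with an index scan:

CREATE INDEX users_t_idx ON users(t);
EXPLAIN SELECT t FROM users ORDER BY t;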

Conclusion

This article outlined how to build a test environment, compare indexed and unindexed query performance, and read the query planner’s cost estimates and execution plans.

This article was written by Ahmad Adel. Ahmad is a freelance writer and backend developer.