Data Integrity in Postgres

Last updated: February 27th 2025

Introduction

What is data integrity? In simple terms, it means your data is always accurate, reliable, and fit for purpose. Like a stream of clear water, unpolluted, always able to quench thirst. If data becomes corrupted, erroneous, or lost, everything crumbles. Decisions become flawed, processes halt, and trust is shattered.

PostgreSQL, a robust and mature database system, provides a multi-layered defense system to safeguard data integrity. Like a fortress, each wall and each guard is prepared to protect your data. Let us delve into the depths of this defense and see how PostgreSQL ensures data integrity.

Pillars of Postgres

The first and foremost pillar is the ACID properties. Atomicity, Consistency, Isolation, Durability. These four words are like the fundamental mantra of data integrity.

Atomicity: A transaction, a single unit of work. It must either be completed fully or not at all. No room for halfway measures. Like a switch, either on or off. If a transaction fails for any reason, PostgreSQL will roll everything back to the previous state. Data always remains in a consistent state.

For example, imagine transferring money from one bank account to another. This transaction must occur in two steps: first, money is deducted from your account, and second, that money is added to the recipient's account. If the second step fails (e.g., network issue), the first step must also be reversed, and no money will be deducted from your account. Atomicity ensures the transaction either fully succeeds or fully fails, with no inconsistencies in the data.
```
BEGIN; -- Start transaction

UPDATE accounts SET balance = balance - 100 WHERE account_id = 123; -- Deduct money from account
UPDATE accounts SET balance = balance + 100 WHERE account_id = 456; -- Add money to account

COMMIT; -- Commit transaction
```
If an error occurs before COMMIT, the transaction can be rolled back using the ROLLBACK command, restoring data to its previous consistent state.
Consistency: The database will always remain in a valid state. Every transaction will move the database from one valid state to another. PostgreSQL ensures that database rules, such as constraints and triggers, are always followed, and invalid data can never enter the database.

For example, suppose you create a CHECK constraint for a table, ensuring that age must always be greater than or equal to 18 years. If someone tries to enter an age less than 18 into the database, PostgreSQL will prevent this data entry and display an error message. Consistency ensures that the database always adheres to defined rules, and invalid data cannot corrupt the database’s integrity.
```
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    age INTEGER CHECK (age >= 18) -- Age constraint
);

INSERT INTO users (name, age) VALUES ('John Doe', 17); -- Error! Violates age constraint
```
Isolation: Even if multiple transactions are running concurrently, they will not interfere with each other’s work. Each transaction appears to run in isolation, viewing a consistent snapshot of the database. Isolation ensures concurrent transactions do not corrupt data integrity.

For example, imagine two users viewing the balance of the same account simultaneously. If while the first user is viewing the balance, the second user initiates a transaction to withdraw money from that account, Isolation ensures the first user always sees the consistent balance as it was at the start of their transaction, not the changed balance during the transaction. This prevents data conflicts and data corruption. PostgreSQL provides various isolation levels, such as Read Committed, Repeatable Read, and Serializable, which can be used as needed.
Durability: Once a transaction is committed, the data is permanently saved in the database. Even if the system crashes or power fails, the data will never be lost. PostgreSQL uses Write-Ahead Logging (WAL), which keeps a log of every change, ensuring data is always reliably stored.

For example, when you COMMIT a transaction, PostgreSQL first logs the transaction in the WAL file and then writes the changes to the data file. If the system crashes, PostgreSQL can recover the last successful transaction from the WAL file, and restore the database to a consistent state. Durability ensures your valuable data is never lost, even in unforeseen circumstances.

Beyond ACID properties, PostgreSQL provides many more powerful features to safeguard data integrity. Constraints are among the most significant.

Constraints are rules imposed on data in tables. They ensure that data always adheres to specific rules. PostgreSQL supports various types of constraints:

NOT NULL: Ensures a column cannot contain null values. For example, a user's name should never be null. This prevents data incompleteness.

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL, -- Name column cannot accept null values
    price DECIMAL
);

INSERT INTO products (name, price) VALUES (NULL, 10.99); -- Error! Name cannot be null

UNIQUE: Ensures values in a column are always unique. For example, a user’s email address should always be unique. This prevents data duplication.

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE, -- Email column must have unique values
    name VARCHAR(255)
);

INSERT INTO users (email, name) VALUES ('john.doe@example.com', 'John Doe');
INSERT INTO users (email, name) VALUES ('john.doe@example.com', 'Jane Doe'); -- Error! Email is not unique

PRIMARY KEY: Uniquely identifies each row in a table. It is a combination of NOT NULL and UNIQUE constraints. Each table should have a primary key. For example, the product_id column can be the primary key for the products table.
```
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY, -- product_id is the primary key
    name VARCHAR(255),
    price DECIMAL
);
```

FOREIGN KEY: Establishes relationships between two tables. It ensures values in a column of one table must match values in a column of another table. For example, the customer_id column in the orders table can form a foreign key relationship with the customer_id column in the customers table. This ensures referential integrity and prevents orphaned records.

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id), -- Foreign key constraint
    order_date DATE
);

INSERT INTO orders (customer_id, order_date) VALUES (999, '2023-10-27'); -- Error! customer_id 999 does not exist in customers table

CHECK: Verifies if data in a column meets a specific condition. For example, age should always be greater than or equal to 18, or product price should always be greater than zero. This ensures domain integrity of data.

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL CHECK (price > 0) -- Price must be greater than zero
);

INSERT INTO products (name, price) VALUES ('Laptop', -100); -- Error! Price cannot be negative

Wrapping up

Beyond constraints, data types play a crucial role in maintaining data integrity. PostgreSQL supports various data types, such as INTEGER, VARCHAR, DATE, TIMESTAMP, BOOLEAN, JSON, ARRAY, and many more. By using the correct data type, you ensure data is stored accurately and prevent data corruption caused by using the wrong data type. For example, using the TEXT data type in a column meant only for storing numbers can compromise data integrity. PostgreSQL’s robust type system simplifies data validation and data manipulation.
Transactions are a cornerstone of data integrity in PostgreSQL. We already discussed atomicity within ACID properties, a key part of transactions. Transactions group multiple SQL statements into a logical unit. Either all statements succeed or none do. This helps maintain the database in a consistent state, especially when multiple users are working with the same data. By using transactions, you ensure data updates happen atomically, reducing the risk of data corruption or data loss.
Write-Ahead Logging (WAL) ensures PostgreSQL's durability and recoverability. WAL logs every data change before writing data to disk. If a system crash occurs, PostgreSQL can recover the last successful transactions from the WAL file and restore the database to a consistent state. WAL ensures committed transactions are always permanently saved, minimizing the potential for data loss. WAL is an indispensable part of data integrity, particularly for critical applications where data loss is unacceptable.

This article was written by Ahmad Adel. Ahmad is a freelance writer and also a backend developer.