Data Integrity in Postgres
Last updated: February 27th 2025
Introduction
What is data integrity? In simple terms, it means your data is always accurate, reliable, and fit for purpose. Like a stream of clear water, unpolluted, always able to quench thirst. If data becomes corrupted, erroneous, or lost, everything crumbles. Decisions become flawed, processes halt, and trust is shattered.
PostgreSQL, a robust and mature database system, provides a multi-layered defense system to safeguard data integrity. Like a fortress, each wall and each guard is prepared to protect your data. Let us delve into the depths of this defense and see how PostgreSQL ensures data integrity.
Pillars of Postgres
The first and foremost pillar is the ACID properties. Atomicity, Consistency, Isolation, Durability. These four words are like the fundamental mantra of data integrity.
-
Atomicity: A transaction, a single unit of work. It must either be completed fully or not at all. No room for halfway measures. Like a switch, either on or off. If a transaction fails for any reason, PostgreSQL will roll everything back to the previous state. Data always remains in a consistent state.
For example, imagine transferring money from one bank account to another. This transaction must occur in two steps: first, money is deducted from your account, and second, that money is added to the recipient's account. If the second step fails (e.g., network issue), the first step must also be reversed, and no money will be deducted from your account. Atomicity ensures the transaction either fully succeeds or fully fails, with no inconsistencies in the data.
BEGIN; -- Start transaction UPDATE accounts SET balance = balance - 100 WHERE account_id = 123; -- Deduct money from account UPDATE accounts SET balance = balance + 100 WHERE account_id = 456; -- Add money to account COMMIT; -- Commit transaction
If an error occurs before COMMIT, the transaction can be rolled back using the ROLLBACK command, restoring data to its previous consistent state.
-
Consistency: The database will always remain in a valid state. Every transaction will move the database from one valid state to another. PostgreSQL ensures that database rules, such as constraints and triggers, are always followed, and invalid data can never enter the database.
For example, suppose you create a
CHECK
constraint for a table, ensuring that age must always be greater than or equal to 18 years. If someone tries to enter an age less than 18 into the database, PostgreSQL will prevent this data entry and display an error message. Consistency ensures that the database always adheres to defined rules, and invalid data cannot corrupt the database’s integrity.CREATE TABLE users ( user_id SERIAL PRIMARY KEY, name VARCHAR(255), age INTEGER CHECK (age >= 18) -- Age constraint ); INSERT INTO users (name, age) VALUES ('John Doe', 17); -- Error! Violates age constraint
-
Isolation: Even if multiple transactions are running concurrently, they will not interfere with each other’s work. Each transaction appears to run in isolation, viewing a consistent snapshot of the database. Isolation ensures concurrent transactions do not corrupt data integrity.
For example, imagine two users viewing the balance of the same account simultaneously. If while the first user is viewing the balance, the second user initiates a transaction to withdraw money from that account, Isolation ensures the first user always sees the consistent balance as it was at the start of their transaction, not the changed balance during the transaction. This prevents data conflicts and data corruption. PostgreSQL provides various isolation levels, such as Read Committed, Repeatable Read, and Serializable, which can be used as needed.
-
Durability: Once a transaction is committed, the data is permanently saved in the database. Even if the system crashes or power fails, the data will never be lost. PostgreSQL uses Write-Ahead Logging (WAL), which keeps a log of every change, ensuring data is always reliably stored.
For example, when you COMMIT a transaction, PostgreSQL first logs the transaction in the WAL file and then writes the changes to the data file. If the system crashes, PostgreSQL can recover the last successful transaction from the WAL file, and restore the database to a consistent state. Durability ensures your valuable data is never lost, even in unforeseen circumstances.
Beyond ACID properties, PostgreSQL provides many more powerful features to safeguard data integrity. Constraints are among the most significant.
Constraints are rules imposed on data in tables. They ensure that data always adheres to specific rules. PostgreSQL supports various types of constraints:
-
NOT NULL: Ensures a column cannot contain null values. For example, a user's name should never be null. This prevents data incompleteness.
CREATE TABLE products ( product_id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, -- Name column cannot accept null values price DECIMAL ); INSERT INTO products (name, price) VALUES (NULL, 10.99); -- Error! Name cannot be null
-
UNIQUE: Ensures values in a column are always unique. For example, a user’s email address should always be unique. This prevents data duplication.
CREATE TABLE users ( user_id SERIAL PRIMARY KEY, email VARCHAR(255) UNIQUE, -- Email column must have unique values name VARCHAR(255) ); INSERT INTO users (email, name) VALUES ('john.doe@example.com', 'John Doe'); INSERT INTO users (email, name) VALUES ('john.doe@example.com', 'Jane Doe'); -- Error! Email is not unique
-
PRIMARY KEY: Uniquely identifies each row in a table. It is a combination of NOT NULL and UNIQUE constraints. Each table should have a primary key. For example, the
product_id
column can be the primary key for theproducts
table.CREATE TABLE products ( product_id SERIAL PRIMARY KEY, -- product_id is the primary key name VARCHAR(255), price DECIMAL );
-
FOREIGN KEY: Establishes relationships between two tables. It ensures values in a column of one table must match values in a column of another table. For example, the
customer_id
column in theorders
table can form a foreign key relationship with thecustomer_id
column in thecustomers
table. This ensures referential integrity and prevents orphaned records.CREATE TABLE customers ( customer_id SERIAL PRIMARY KEY, name VARCHAR(255) ); CREATE TABLE orders ( order_id SERIAL PRIMARY KEY, customer_id INTEGER REFERENCES customers(customer_id), -- Foreign key constraint order_date DATE ); INSERT INTO orders (customer_id, order_date) VALUES (999, '2023-10-27'); -- Error! customer_id 999 does not exist in customers table
-
CHECK: Verifies if data in a column meets a specific condition. For example, age should always be greater than or equal to 18, or product price should always be greater than zero. This ensures domain integrity of data.
CREATE TABLE products ( product_id SERIAL PRIMARY KEY, name VARCHAR(255), price DECIMAL CHECK (price > 0) -- Price must be greater than zero ); INSERT INTO products (name, price) VALUES ('Laptop', -100); -- Error! Price cannot be negative
Wrapping up
-
Beyond constraints, data types play a crucial role in maintaining data integrity. PostgreSQL supports various data types, such as INTEGER, VARCHAR, DATE, TIMESTAMP, BOOLEAN, JSON, ARRAY, and many more. By using the correct data type, you ensure data is stored accurately and prevent data corruption caused by using the wrong data type. For example, using the TEXT data type in a column meant only for storing numbers can compromise data integrity. PostgreSQL’s robust type system simplifies data validation and data manipulation.
-
Transactions are a cornerstone of data integrity in PostgreSQL. We already discussed atomicity within ACID properties, a key part of transactions. Transactions group multiple SQL statements into a logical unit. Either all statements succeed or none do. This helps maintain the database in a consistent state, especially when multiple users are working with the same data. By using transactions, you ensure data updates happen atomically, reducing the risk of data corruption or data loss.
-
Write-Ahead Logging (WAL) ensures PostgreSQL's durability and recoverability. WAL logs every data change before writing data to disk. If a system crash occurs, PostgreSQL can recover the last successful transactions from the WAL file and restore the database to a consistent state. WAL ensures committed transactions are always permanently saved, minimizing the potential for data loss. WAL is an indispensable part of data integrity, particularly for critical applications where data loss is unacceptable.
This article was written by Ahmad Adel. Ahmad is a freelance writer and also a backend developer.
Related articles
-
Database Partitioning in PostgreSQL
Instructions on creating partitions in PostgreSQL
Last updated: February 5th 2025
-
A Step-by-Step Guide to Installing PostgreSQL on Ubuntu
A detailed guide on how to install PostgreSQL database server on Ubuntu
Last updated: March 6th 2024
-
Analyze SQL queries with the Query Planner for PostgreSQL
An article on analyzing SQL queries with the query planner
Last updated: February 5th 2025