Build Your Own In-memory Database From Scratch
Last updated: February 10th 2025
Note: Read the comments in the code, as they are very important.
I advise you to read about these topics to get more understanding, hashMaps
, hashing
, hashmap Collision
.
Alright, buckle up buttercup, because today we're diving headfirst into the exhilarating world of in-memory databases! Forget just using Redis – we're going to build our own, from the ground up.
Now, why in the world would you want to build your own Redis? Isn't the real deal already out there, battle-tested and ready to use?
Absolutely. Redis is a titan in the industry for a reason. But here's the thing: understanding how things work under the hood is very important. Building your own version, even a simplified one, is like taking apart a clock to see all the gears spin. It's a great learning experience that will sharpen your skills, and deepen your understanding of data structures, and honestly, it's just plain fun.
Think of this as your coding playground. We're going to strip things down to the essentials, focus on the core principles, and build something truly cool and functional. We won't be tackling every single feature of Redis (that's a monumental task for another day, or maybe never), but we'll nail the fundamentals: blazing-fast key-value storage, retrieval, and manipulation.
Why In-Memory Databases are the Speed Demons of the Data World
Before we start the coding, let's quickly recap why in-memory databases are such a big deal. In the world of data, speed is king. Traditional disk-based databases are fantastic for persistence and handling massive datasets, but they have a bottleneck: disk I/O. Reading and writing to disk is significantly slower than RAM access.
In-memory databases, like our target Redis, ditch the disk for primary operations and store everything in RAM (Random Access Memory). RAM is incredibly fast and more expensive, allowing for read and write operations in nanoseconds. This makes in-memory databases ideal for scenarios where latency is critical:
- Caching: Slap an in-memory DB in front of your slower disk-based database, and suddenly your applications are responding at warp speed. Frequently accessed data lives in RAM, served up in a blink.
- Session Management: For web applications, storing user session data in memory means lightning-fast access and seamless user experiences.
- Real-time Analytics: Need to process and analyze data as it streams in? In-memory databases can handle the velocity, providing insights in real-time.
- Leaderboards and Gaming: High-score leaderboards, real-time game states – anything that demands rapid updates and retrievals thrives on in-memory speed.
Our Tech Stack: TypeScript and the Power of Types
We're going full TypeScript for this adventure. Why TypeScript? Because in a project like this, where we're building complex data structures and operations, strong typing is our best friend. TypeScript catches errors early, makes our code more readable, and provides excellent auto-complete. It's like having a coding co-pilot constantly watching your back, making sure you don't accidentally mix apples and oranges.
We'll leverage TypeScript's type system to define our data structures precisely and ensure data integrity. We'll use interfaces, types, and generics to create robust and maintainable code. No more guessing about data types – TypeScript will enforce them, making our lives as developers much smoother.
The Heart of Our Redis: The Key-Value Store
At its core, Redis is a key-value store. You give it a key, and it associates it with a value. Think of it like a giant hash map or dictionary. The magic is in how efficiently it performs these operations.
To build our Redis, we'll need a data structure that can handle key-value pairs with blazing speed. The perfect candidate? You guessed it – a Hash Table (or Hash Map).
Building Our Hash Table in TypeScript
Let's start by crafting our own Hash Table class in TypeScript. This will be the foundational building block of our in-memory database.
interface HashTableEntry<ValueType> {
key: string;
value: ValueType;
next: HashTableEntry<ValueType> | null; // For collision handling (separate chaining)
// don't know what collision is ? don't worry we will talk about it in a later article.
}
export class HashTable<ValueType> {
private table: (HashTableEntry<ValueType> | null)[];
private size: number;
private capacity: number;
constructor(capacity: number = 16) {
this.capacity = capacity;
this.table = new Array(capacity).fill(null); // Initialize table with nulls
this.size = 0;
}
private hash(key: string): number {
// Initializes a variable to store the hash value.
let hashValue = 0;
// Iterates through each character of the input key.
for (let i = 0; i < key.length; i++) {
// Calculates the hash value using a polynomial rolling hash function.
// It multiplies the current hash value by 31 (a prime number) and adds the ASCII code of the current character.
// The result is then taken modulo the capacity of the hash table to keep it within the table's bounds.
hashValue = (hashValue * 31 + key.charCodeAt(i)) % this.capacity; // Simple polynomial rolling hash
}
// Returns the calculated hash value.
return hashValue;
}
public set(key: string, value: ValueType): void {
// Calculate the index in the hash table using the hash function.
const index = this.hash(key);
// Create a new hash table entry with the given key and value.
const entry: HashTableEntry<ValueType> = { key, value, next: null };
// Check if the slot at the calculated index is empty.
if (!this.table[index]) {
// No collision, the slot is empty. Insert the new entry directly.
this.table[index] = entry;
} else {
// Collision! Handle it using separate chaining (linked list).
let currentEntry = this.table[index];
// Traverse the linked list at the given index.
while (currentEntry) {
// Check if the key already exists in the linked list.
if (currentEntry.key === key) {
// Key already exists, update the value and return.
currentEntry.value = value;
return;
}
// Check if we've reached the end of the linked list.
if (!currentEntry.next) {
// We're at the end, append the new entry to the chain.
currentEntry.next = entry;
break;
}
// Move to the next entry in the linked list.
currentEntry = currentEntry.next;
}
}
// Increment the size of the hash table.
this.size++;
}
public get(key: string): ValueType | undefined {
// Calculate the index in the hash table using the hash function.
const index = this.hash(key);
// Get the first entry at the calculated index (head of the linked list).
let currentEntry = this.table[index];
// Traverse the linked list (if it exists) at the given index.
while (currentEntry) {
// Check if the current entry's key matches the search key.
if (currentEntry.key === key) {
// Key found! Return the associated value.
return currentEntry.value; // Key found!
}
// Move to the next entry in the linked list.
currentEntry = currentEntry.next;
}
// Key not found in the hash table.
return undefined; // Key not found
}
public delete(key: string): boolean {
// Calculate the index in the hash table using the hash function.
const index = this.hash(key);
// Get the first entry at the calculated index (head of the linked list).
let currentEntry = this.table[index];
// Initialize a variable to track the previous entry in the linked list (for deletion).
let previousEntry: HashTableEntry<ValueType> | null = null;
// Traverse the linked list (if it exists) at the given index.
while (currentEntry) {
// Check if the current entry's key matches the key to be deleted.
if (currentEntry.key === key) {
// Key found! Now handle the deletion.
// If there's a previous entry, we're not deleting the head of the list.
if (previousEntry) {
// Bypass the current entry by linking the previous entry to the next entry.
previousEntry.next = currentEntry.next; // Remove from chain
} else {
// We're deleting the head of the linked list.
this.table[index] = currentEntry.next; // Remove head of chain
}
// Decrement the size of the hash table.
this.size--;
// Key successfully deleted.
return true; // Key deleted
}
// Move to the next entry in the linked list, updating the 'previousEntry' as we go.
previousEntry = currentEntry;
currentEntry = currentEntry.next;
}
// Key not found in the hash table.
return false; // Key not found
}
public has(key: string): boolean {
return this.get(key) !== undefined;
}
public getSize(): number {
return this.size;
}
public getCapacity(): number {
return this.capacity;
}
public clear(): void {
this.table = new Array(this.capacity).fill(null);
this.size = 0;
}
}
Let's break down this HashTable
class:
HashTableEntry<ValueType>
Interface: Defines the structure of each entry in our hash table. It holds thekey
,value
, and anext
pointer. Thenext
pointer is crucial for handling collisions – when two keys hash to the same index. We're using separate chaining to resolve collisions. Each slot in the table can be the head of a linked list of entries.table: (HashTableEntry<ValueType> | null)[]
: This is the actual hash table itself. It's an array where each element is either aHashTableEntry
ornull
. The index in this array is determined by the hash function.size: number
andcapacity: number
:size
tracks the number of entries in the hash table, andcapacity
defines the initial size of the underlying array.hash(key: string): number
: This is our hash function. It takes a key (string) and converts it into an index within the bounds of ourtable
array. We're using a simple polynomial rolling hash here. A good hash function is crucial for distributing keys evenly across the table and minimizing collisions.set(key: string, value: ValueType): void
: Inserts a key-value pair into the hash table.- It calculates the hash index.
- If the slot at that index is empty, it creates a new entry and places it there.
- If there's a collision (the slot is occupied), it traverses the linked list at that slot. If the key already exists, it updates the value. Otherwise, it adds the new entry to the end of the linked list.
get(key: string): ValueType | undefined
: Retrieves a value associated with a key.- Calculates the hash index.
- Traverses the linked list at that index.
- If the key is found, it returns the corresponding value.
- If the key is not found, it returns
undefined
.
delete(key: string): boolean
: Removes a key-value pair from the hash table.- Calculates the hash index.
- Traverses the linked list.
- If the key is found, it removes the entry from the linked list and returns
true
. - If the key is not found, it returns
false
.
has(key: string): boolean
: Checks if a key exists in the hash table.getSize(): number
: Returns the number of entries in the table.getCapacity(): number
: Returns the current capacity of the table.clear(): void
: Empties the hash table.
Crafting Our In-Memory Database Class: InMemoryDB
Now that we have our HashTable
foundation, we can build our InMemoryDB
class, which will mimic the basic operations of Redis.
import { HashTable } from "./hash-table"; // Assuming HashTable is in a separate file
export class InMemoryDB {
private data: HashTable<string>; // Store string values for simplicity initially
constructor() {
this.data = new HashTable<string>();
}
public set(key: string, value: string): string {
this.data.set(key, value);
return "OK"; // Mimic Redis "OK" response
}
public get(key: string): string | null {
const value = this.data.get(key);
return value !== undefined ? value : null; // Mimic Redis null response
}
public del(key: string): number {
return this.data.delete(key) ? 1 : 0; // Mimic Redis 1 or 0 response
}
public exists(key: string): number {
return this.data.has(key) ? 1 : 0; // Mimic Redis 1 or 0 response
}
public ping(): string {
return "PONG"; // Because every good database needs a ping
}
public keys(): string[] {
// Not the most efficient for large datasets, but illustrative
const allKeys: string[] = [];
for (let i = 0; i < this.data.getCapacity(); i++) {
let currentEntry = this.data["table"][i]; // Direct access for simplicity (not ideal in production)
while (currentEntry) {
allKeys.push(currentEntry.key);
currentEntry = currentEntry.next;
}
}
return allKeys;
}
public clear(): void {
this.data.clear();
}
public getSize(): number {
return this.data.getSize();
}
}
Let's dissect the InMemoryDB
class:
data: HashTable<string>
: This is where we instantiate ourHashTable
class. We're initially storing string values (HashTable<string>
) for simplicity. Later, we can explore handling more complex data types.constructor()
: Initializes theInMemoryDB
by creating a newHashTable
instance.set(key: string, value: string): string
: This is our equivalent of the RedisSET
command. It takes a key and a value (both strings in this version) and uses ourHashTable
'sset
method to store the key-value pair. It returns"OK"
to mimic Redis's successful response.get(key: string): string | null
: OurGET
command. It retrieves the value associated with a key using theHashTable
'sget
method. If the key exists, it returns the string value; otherwise, it returnsnull
(again, mimicking Redis behavior).del(key: string): number
: OurDEL
command. It removes a key-value pair using theHashTable
'sdelete
method. It returns1
if the key was deleted, and0
if the key was not found (Redis-style response).exists(key: string): number
: OurEXISTS
command. Checks if a key exists in the database usingHashTable
'shas
method and returns1
or0
accordingly.ping(): string
: The classicPING
command. Returns"PONG"
– a simple way to check if the database is alive and kicking.keys(): string[]
: A basic implementation of aKEYS
command (though in real Redis,KEYS
can be inefficient on large datasets and is often discouraged in production). This iterates through the hash table to collect all keys. Note: This direct access tothis.data["table"]
is for illustrative purposes in this example. In a more robust implementation, you'd likely add a dedicated method toHashTable
to iterate over keys.clear(): void
: Clears all data in the database usingHashTable
'sclear
method.getSize(): number
: Returns the number of keys currently stored usingHashTable
'sgetSize
.
Using Our In-Memory Database
Let's see our InMemoryDB
in action!
import { InMemoryDB } from "./in-memory-db"; // Assuming InMemoryDB is in in-memory-db.ts
const db = new InMemoryDB();
db.set("name", "Alice");
db.set("age", "30");
db.set("city", "Wonderland");
console.log("Get name:", db.get("name")); // Output: Get name: Alice
console.log("Get age:", db.get("age")); // Output: Get age: 30
console.log("Get city:", db.get("city")); // Output: Get city: Wonderland
console.log("Get occupation:", db.get("occupation")); // Output: Get occupation: null (key not found)
console.log("Exists name:", db.exists("name")); // Output: Exists name: 1
console.log("Exists occupation:", db.exists("occupation")); // Output: Exists occupation: 0
console.log("Keys:", db.keys()); // Output: Keys: [ 'city', 'age', 'name' ] (order may vary)
console.log("Size:", db.getSize()); // Output: Size: 3
db.del("age");
console.log("Deleted age. Size now:", db.getSize()); // Output: Deleted age. Size now: 2
console.log("Get age after delete:", db.get("age")); // Output: Get age after delete: null
console.log("Ping:", db.ping()); // Output: Ping: PONG
db.clear();
console.log("Cleared DB. Size now:", db.getSize()); // Output: Cleared DB. Size now: 0
Run this code (assuming you've saved hash-table.ts
and in-memory-db.ts
in the same directory), and you'll see our basic Redis-like database functioning! You can set keys, get values, delete keys, check for existence, and even get a "PONG" response.
Expanding Our Redis: Beyond Basic Key-Value Pairs
Our current InMemoryDB
is a great starting point, but real Redis is much more versatile. It supports various data structures beyond simple key-value strings, such as:
- Lists: Ordered collections of strings. Think of them like arrays that you can push to, pop from, and manipulate efficiently. Redis lists are often used for queues and job processing.
- Sets: Unordered collections of unique strings. Sets are fantastic for membership testing, finding intersections and unions of collections.
- Hashes: Key-value maps within a key. Hashes are useful for representing objects with multiple fields.
- Sorted Sets: Sets where each member is associated with a score. Sorted sets are perfect for leaderboards, priority queues, and range-based queries.
To extend our InMemoryDB
, we could create separate classes to handle each of these data structures and integrate them into our main InMemoryDB
class. For example, we might add methods like lpush
, rpop
for lists, sadd
, smembers
for sets, hset
, hgetall
for hashes, and zadd
, zrange
for sorted sets.
Persistence: Making Our Data Last (Optional for In-Memory, but Important to Consider)
Currently, our InMemoryDB
is purely in-memory. If you shut down your application, all the data vanishes into the ether. Real Redis offers persistence mechanisms to save data to disk, ensuring durability:
- RDB (Redis DataBase): Point-in-time snapshots of the entire dataset are periodically saved to disk in a binary file. RDB is compact and fast for recovery, but you might lose data in case of a sudden crash between snapshots.
- AOF (Append Only File): Every write operation is appended to a log file. AOF provides better durability as you can recover data up to the last write operation. AOF can be configured for different levels of durability (e.g., fsync every write or every second).
Adding persistence to our InMemoryDB
would involve implementing one or both of these mechanisms. For RDB, we'd need to serialize our HashTable
to a file periodically. For AOF, we'd need to log every set
, del
, etc., operation to a file and replay the log on startup. Persistence adds complexity but is crucial for many real-world applications.
Concurrency and Thread Safety (Important for Real-World Redis)
Our current InMemoryDB
implementation is not inherently thread-safe. If multiple threads or processes try to access and modify the HashTable
concurrently, you could run into data corruption and race conditions. Real Redis is designed to be single-threaded for core operations, using an event loop to handle many client connections efficiently. However, it does offer features like Lua scripting and modules that can introduce concurrency.
For a production-ready in-memory database, you would need to address concurrency. Strategies include:
-
Single-threaded with Event Loop: Like Redis, use a single thread and an event loop to handle all client requests in a non-blocking manner. This simplifies concurrency management but limits CPU core utilization for single operations.
-
Multi-threading with Locking/Synchronization: Implement locking mechanisms (mutexes, semaphores) to protect shared data structures (like our
HashTable
) when accessed by multiple threads. This adds complexity but can leverage multi-core CPUs better for certain workloads.
Error Handling and Robustness (Beyond Basic Returns)
Our current error handling is very basic (returning null
or 0
). In a production database, you'd need much more robust error handling:
- Specific Error Types: Define custom error types to represent different failure scenarios (e.g.,
KeyNotFoundError
,InvalidCommandError
,OutOfMemoryError
). - Error Responses: Return informative error messages to clients when things go wrong, helping them understand and handle errors gracefully.
- Input Validation: Thoroughly validate all input from clients to prevent injection attacks and unexpected behavior.
- Resource Management: Handle resource limits (memory usage, connection limits) gracefully and prevent resource exhaustion.
Performance Considerations and Optimizations (Always Thinking About Speed)
Performance is very important for in-memory databases. We've used a hash table, which offers average O(1) complexity for set
, get
, and delete
operations, which is excellent. However, there are always ways to optimize further:
- Hash Function Choice: Experiment with different hash functions to find one that provides better key distribution for your typical data and minimizes collisions.
- Load Factor and Resizing: Implement dynamic resizing of the hash table. When the load factor (ratio of entries to capacity) exceeds a certain threshold, resize the table to maintain good performance.
- Memory Management: Be mindful of memory usage. For very large datasets, consider memory-efficient data structures and potentially off-heap storage (though that blurs the line with "in-memory").
The Journey Continues: From Basic to Brilliant
Congratulations! You've built a functional in-memory key-value database from scratch using TypeScript. It's a simplified version, but it embodies the core principles of Redis-like systems: speed, efficiency, and in-memory data handling.
This is just the beginning. You can expand this project in countless directions:
- Implement more Redis commands: Add commands for lists, sets, hashes, sorted sets, pub/sub, transactions, and more.
- Add persistence: Implement RDB and/or AOF persistence to make your database durable.
- Improve concurrency: Explore single-threaded event loop or multi-threading approaches to handle concurrent client requests safely and efficiently.
- Enhance error handling: Implement robust error handling, input validation, and resource management.
- Optimize performance: Benchmark, profile, and optimize your code for speed and memory efficiency.
- Build a client-server architecture: Turn your in-memory database into a server that clients can connect to over a network protocol (like the Redis protocol).
This article was written by Ahmad Adel. Ahmad is a freelance writer and also a backend developer
Related articles
-
How to enable Remote access to your MariaDB/MySQL database on Ubuntu Focal / MariaDB > v10.6
In this article we show how you can easily enable remote access to a new database in Webdock..
Last updated: November 10th 2022
-
How to enable Remote access to your MariaDB/MySQL database on Ubuntu Bionic or MariaDB < v10.6
In this article we show how you can easily enable remote access to a new database in Webdock or alternatively how to perform the process manually if you need fine-grained access control
Last updated: December 28th 2022
-
How to back up your MariaDB/MySQL database
An article on backing up your database and enabling binary log.
Last updated: December 6th 2022
-
A Step-by-Step Guide to Installing PostgreSQL on Ubuntu
A detailed guide on how to install PostgreSQL database server on Ubuntu
Last updated: March 6th 2024
-
Beginner's Guide: SQL Index: Why and How?
An intro for beginners on what SQL index is, and why use it.
Last updated: February 3rd 2025
-
Beginner's Guide: Clustered vs Non-clustered Index
Short article on Clustered and Non-clustered indexes
Last updated: February 3rd 2025
-
Analyze SQL queries with the Query Planner for PostgreSQL
An article on analyzing SQL queries with the query planner
Last updated: February 5th 2025
-
Database Partitioning in PostgreSQL
Instructions on creating partitions in PostgreSQL
Last updated: February 5th 2025
-
Rate Limiting with Redis and Node.js: Under the Hood
Rate-limiting with Redis and NodeJS
Last updated: February 10th 2025