Debug disk performance with iostat and iotop

Last updated: August 27th 2025

Introduction

Your application just ground to a halt. Pages load like molasses, but CPU and memory look normal. That's the moment to check whether your disks are the problem.

When I/O trouble hits, most people reach for top or htop. But those tools are limited and won't show you where the bottleneck is. What you need instead are iostat and iotop. Both are available on virtually every Linux distribution.

I've used these tools to track down everything from runaway backup scripts to misbehaving databases. They're not fancy, but they work. You'll have concrete answers about your I/O problems in minutes, not hours.

So, in this article, I'll walk through these tools, how they differ from each other, and advanced ways you can use them to make sysadmin work a whole lot easier.

I/O Performance Basics

Your server's storage subsystem works differently than you might expect. While CPU and memory problems often announce themselves loudly, disk bottlenecks lurk in the shadows until performance drops off a cliff.

When applications request data from disk, they trigger a complex chain of operations through the Linux storage stack. The request travels from your application down through the file system layer, into the general block layer, and finally reaches your physical storage device. Each layer adds its own overhead and potential delays.

You'll notice I/O problems first as response time degradation. Applications start feeling sluggish even when CPU usage looks normal. Database queries that normally complete in milliseconds suddenly take seconds. Web pages load like they're coming through a dial-up connection.

The tricky part about I/O bottlenecks is how they hide behind other symptoms. Your system might show high load averages while CPU utilization stays low. Processes spend their time waiting for disk operations instead of actually computing anything. This waiting state often gets misdiagnosed as a CPU problem when the real culprit lives in your storage layer.

Five Key Metrics For I/O Performance

Understanding I/O performance requires tracking five specific measurements that reveal what your disks are actually doing:

  • IOPS count read and write operations per second. A busy database might do 10,000 small operations while a file server handles 100 large ones. Different workloads need different IOPS patterns.
  • Throughput measures data movement in MB/s. Copying big files cares about throughput. Saving small database records cares about IOPS. Don't confuse the two.
  • Response time shows how long each operation takes. Fast throughput means nothing if every request takes 500ms to complete. Users notice latency more than raw speed.
  • Utilization percentage tells you how busy your disk stays. Above 80% usually means trouble, but some modern SSDs handle 100% utilization just fine.
  • Queue depth shows how many operations are waiting. Traditional spinning drives hate deep queues, while SSDs can juggle multiple requests without breaking a sweat.
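
These metrics are related. By Little's law, the average queue depth is roughly IOPS multiplied by the average response time. A minimal awk sketch, using made-up numbers purely for illustration:

```shell
# Little's law: average queue depth ~= IOPS x average latency (in seconds).
# 400 IOPS at 10 ms each keeps about 4 requests in flight on average.
awk 'BEGIN {
    iops = 400       # operations per second (illustrative)
    await_ms = 10    # average response time in ms (illustrative)
    printf "avg queue depth ~ %.1f\n", iops * await_ms / 1000
}'
```

That relationship is why a disk can show modest IOPS but deep queues once latency climbs.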

Here's where most sysadmins get confused. You see high iowait and think "aha, disk bottleneck!" Not so fast.

iowait only measures CPU time spent waiting for I/O. If your system has other work to do, the CPU won't sit around waiting. It'll do something else. Your disk might be completely hammered while iowait shows zero.

I've seen backup scripts destroy storage performance while iowait stayed low because other processes kept the CPU busy. Don't trust iowait alone. Use it as one clue among many, not the smoking gun.

Modern multicore systems make this worse. One core waits for I/O while three others crunch numbers. The average looks fine, but your storage is not.

iostat vs. iotop

When your server starts dragging and you suspect disk problems, you need tools to show you what's happening. As mentioned, Linux gives you two solid options: iostat and iotop. But what's actually different between them? Well, it's not as simple as it seems:

iostat monitors your storage devices and reports statistics about their activity. It's part of the sysstat package and focuses on device-level metrics like throughput, utilization, and response times. It's more like a storage dashboard.

iotop works differently. It's a process monitor showing which programs actually use your disks. Like the regular top command but for I/O activity. It requires root access and displays real-time data about what each process is reading or writing.

These tools approach I/O monitoring from opposite angles. iostat gives you the big picture about your storage devices. iotop shows you the specific processes causing disk activity.

The tools often show different numbers, and that's normal. iostat might report 5% utilization while iotop reveals a process hammering your disk. They measure different things at different intervals. A process can spike for 200ms and disappear before iotop's next refresh, but iostat will catch that spike in its interval average.

iostat ships with the sysstat package alongside other monitoring tools. iotop comes as its own separate package. Both work at different system layers, which explains why their readings don't always match.

Getting Started with iostat

iostat ships with almost every Linux distribution as part of the sysstat package. If it's missing, grab it with your package manager: apt install sysstat on Debian-based systems or yum install sysstat on RHEL variants.

The tool reads from /proc/diskstats and /proc/stat, which are kernel files that track storage and CPU activity. Unlike iotop's live process monitoring, iostat works with time-averaged data. This explains why the two tools sometimes disagree about what your disks are doing.
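
You can peek at those raw counters yourself. The snippet below parses a /proc/diskstats-style record; the sample line is embedded so the example is self-contained, but on a live system you would read the real file. Fields 4, 6, 8, and 10 are reads completed, sectors read, writes completed, and sectors written, with sectors being 512 bytes:

```shell
# Parse a /proc/diskstats-style line. On a real system, replace the
# echo with:  awk '$3 == "sda" { ... }' /proc/diskstats
# The counter values below are invented for illustration.
echo "8 0 sda 124000 300 9800000 41000 88000 150 4200000 52000 0 61000 93000" |
awk '{ printf "%s: %d reads (%d KB), %d writes (%d KB)\n", $3, $4, $6/2, $8, $10/2 }'
```

iostat does essentially this bookkeeping for you, sampling the counters and turning deltas into per-second rates.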

iostat has existed since the early Unix days. The interface feels basic compared to modern dashboards, but that simplicity helps when servers are failing and you need fast answers.

Running iostat

Type

$ iostat

and hit Enter. You get a snapshot showing activity since boot time. Two sections appear: CPU percentages at the top, device statistics below.

Those initial numbers represent averages from system startup. A runaway process that hammered your disks three hours ago won't show up if the system has been quiet since. You need current data for real troubleshooting.

For continuous updates, add a number after the command:

$ iostat 5

It refreshes every five seconds. The first output still shows boot averages, but each refresh gives you live activity. Stop it with Ctrl+C.

You can also specify how many reports to generate:

$ iostat 5 12

Generates twelve reports at five-second intervals and then exits automatically.

Reading iostat Output

The default iostat output has two sections. The CPU section shows six percentages that add up to 100.

%user covers application runtime. %nice counts low-priority (niced) user processes. %system handles kernel operations like filesystem calls. %iowait shows CPU time spent waiting for disk operations. %steal appears on VMs when the hypervisor borrows CPU cycles. %idle means completely unused processing time.

Don't assume high iowait equals storage problems. The CPU waits for disk operations on quiet systems because it has nothing else to do. Your storage might work fine, just slowly.

Conversely, zero iowait on busy systems often means the CPU is swamped with other tasks while I/O requests pile up.

The device section lists one row per storage device. Names match /dev/ entries. tps shows transfers per second (IOPS). kB_read/s and kB_wrtn/s display throughput. The remaining columns show the total data that has moved since boot (recent sysstat versions add discard columns as well).

These basic metrics work for quick health checks. Serious diagnosis requires extended statistics.

Practical iostat Commands

Start with the basics.

$ iostat

shows you CPU and device stats since boot. Add a number like

$ iostat 2

and it refreshes every two seconds until you kill it. Want just five reports? Try

$ iostat 2 5

instead.

Skip that initial boot summary with

$ iostat -y 2

Boot stats rarely help when you're troubleshooting live problems anyway.

The -x flag opens up the more important stats.

$ iostat -x

gives you await times, service times, and queue depths: the metrics that actually matter for spotting bottlenecks.

Combine it:

$ iostat -x 2

for live extended monitoring.
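
Once you have extended stats, a little awk turns them into an alert. The sketch below filters sample device lines and flags anything with await over 20 ms. Note the sample is simplified to two columns (device, await in ms); real `iostat -x` output has many more columns, and the await column's position varies between sysstat versions:

```shell
# Flag devices whose await exceeds a threshold (20 ms here, an
# arbitrary choice). The two-column sample is illustrative only;
# adjust the field number to match your iostat version's layout.
printf 'sda 4.2\nsdb 35.7\nnvme0n1 0.3\n' |
awk '$2 > 20 { printf "WARN: %s await %.1f ms\n", $1, $2 }'
```

Piped from a real `iostat -x` run, the same filter makes slow devices jump out of a busy report.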

Target specific devices. iostat sda focuses on your main drive.

$ iostat sda sdb

watches multiple drives.

$ iostat -p sda

breaks down by partition, which helps when one filesystem goes berserk.

Filter out the junk with

$ iostat -z

Loop devices and idle USB drives just clutter your output.

$ iostat -z -x 2

shows only active devices with full stats.

Split CPU and device data.

$ iostat -c

gives just processor stats.

$ iostat -d

strips the CPU data.

Handy when you know the bottleneck lives in one subsystem.

Change units to match your data.

$ iostat -k

uses kilobytes per second.

$ iostat -m

switches to megabytes.

$ iostat -h

picks units automatically and looks cleaner.

Format for readability:

$ iostat -s

fits narrow terminal windows.

$ iostat -t

adds timestamps to each report.

Get JSON output with

$ iostat -o JSON

Perfect for scripts and monitoring dashboards.
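
A fragment of that JSON looks roughly like the sample below; the shape here is abbreviated for illustration, since real `iostat -o JSON` output nests the statistics under host and timestamp objects. With jq installed you would query it properly, but even grep can pull a metric out in a pinch:

```shell
# Extract tps values from an abbreviated iostat-style JSON fragment.
# The embedded JSON is a simplified stand-in for real output.
echo '{"disk": [{"disk_device": "sda", "tps": 12.5}, {"disk_device": "sdb", "tps": 340.0}]}' |
grep -o '"tps": [0-9.]*'
```

For anything beyond a quick grep, feed the JSON to a real parser so nested structure changes don't break your scripts.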

LVM users need

$ iostat -N

It is much easier to understand which volumes are busy when you see actual logical volume names instead of cryptic dm-0 device names.

Use persistent device naming:

$ iostat -j UUID {ID}

Targets drives by UUID. Replace {ID} with the actual UUID, which survives device enumeration changes after reboots. Use blkid to find your device UUIDs first.

Group related devices:

$ iostat -g Database sda1 sda2

adds a summary row for the group. Combine with -H:

$ iostat -g Web nvme0n1 nvme1n1 -H

shows only the group total, not individual drives.

Real-world combinations work better:

$ iostat -x -z -t 1

gives you live extended stats with timestamps for active devices only.

$ iostat -y -x sda 1

skips boot summary and monitors your main drive live.

$ iostat -d -k -z 2

focuses on device throughput in kilobytes, hiding idle stuff.

For stress testing,

$ iostat -x -z -t 1 300

runs extended monitoring for exactly five minutes.

$ iostat -p ALL -x 10

shows every partition with extended stats every ten seconds.

Mix these flags based on what you're hunting. Extended stats when you need detail. Filtering when output gets messy. Timestamps when correlating with other events. Units that match your mental model of the problem.
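
When you want history as well as live views, cron can capture one of these combinations on a schedule. A sketch of a crontab entry, assuming sysstat is installed; the five-minute cadence and log path are arbitrary choices:

```shell
# m h dom mon dow  command
# Every 5 minutes, append a burst of one-second extended samples,
# with timestamps, skipping the boot summary and idle devices.
*/5 * * * *  iostat -y -x -z -t 1 5 >> /var/log/iostat-history.log 2>&1
```

Rotate that log, or it will grow without bound on a busy box.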

Getting Started with iotop

Unlike iostat, which is usually preinstalled as part of sysstat, iotop requires a separate install. The tool runs on Python and needs kernel task-accounting options (such as CONFIG_TASKSTATS and CONFIG_TASK_IO_ACCOUNTING) enabled. Most modern Linux distributions include these by default, but older systems might need kernel recompilation.

Your distribution's package manager handles the installation. Run

$ sudo apt install iotop

on Debian and Ubuntu systems. On RHEL, CentOS, and Fedora, you'll need

$ sudo yum install iotop

or, on newer releases,

$ sudo dnf install iotop

The tool requires root privileges for full functionality, so you'll run it with sudo.

iotop reads from kernel accounting files that track per-process I/O activity. This gives you real-time visibility into which processes hammer your disks. The interface updates live, showing current bandwidth usage rather than historical averages like iostat.

One key difference between the tools becomes clear immediately: iotop shows process-level data while iostat focuses on device statistics. When you spot high disk utilization with iostat, iotop tells you exactly which process causes the problem.

Using iotop

Start iotop by typing

$ sudo iotop

and hitting Enter. The interface resembles top but tracks disk activity instead of CPU usage. By default, it displays all processes and threads, including idle ones.

The display refreshes every second. Two summary lines at the top show total disk read and write rates across your system. Below that, individual processes appear with their I/O statistics.

Most troubleshooting sessions benefit from the --only flag:

$ sudo iotop --only

This filters the display to show just processes actually doing I/O work. You skip the clutter of idle processes and focus on the disk activity that matters.

For logging, switch to batch mode with

$ sudo iotop -b -n 10

This runs ten iterations and exits, perfect for cron jobs or scripts. Add timestamps with the -t flag when you need to correlate I/O spikes with other events.
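
Wired into cron, that looks something like the hedged crontab entry below. The cadence and log path are arbitrary choices, and since iotop needs root, this belongs in root's crontab:

```shell
# m h dom mon dow  command  (root's crontab)
# Every 10 minutes, log three timestamped snapshots of active
# processes, with headers suppressed for cleaner log files.
*/10 * * * *  iotop -b -o -t -qq -n 3 >> /var/log/iotop-history.log 2>&1
```

Pair the timestamps with your application logs when reconstructing what was hammering the disks at 3 a.m.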

The tool accepts multiple options simultaneously. Running

$ sudo iotop --only -P -d 5

shows only active processes, collapses threads into their parent processes, and updates every five seconds instead of the default one-second interval.

Reading iotop Output

The iotop interface displays seven columns of data for each process. Understanding these columns helps you quickly identify problematic processes.

TID shows the thread ID, which might differ from the process ID for multithreaded applications. PRIO displays the I/O priority class and level. Most processes run with "be/4" meaning best-effort priority level 4.

USER indicates which account owns the process. DISK READ and DISK WRITE show current bandwidth in bytes per second. These numbers change constantly as processes perform I/O operations.

SWAPIN percentage reveals how much time the process spends waiting on swap operations. High numbers here indicate memory pressure problems rather than pure disk bottlenecks.

The IO column shows the percentage of time spent waiting for I/O operations to complete. This differs from the bandwidth columns because it measures wait time rather than data movement. A process might show low bandwidth but high IO percentage if it makes many small, slow requests.

COMMAND displays the process name and arguments. This helps identify which specific application or script creates the disk load.

Some Commands

Beyond the basic startup, iotop gives you plenty of ways to slice the data. Real troubleshooting needs targeted views that cut through system noise.

$ iotop --only

strips away idle processes and shows just the processes doing I/O right now. Much cleaner than scrolling through hundreds of sleeping processes during active investigations.

$ iotop -P --only

combines process grouping with active filtering. Threads from the same application get collapsed into single entries, making patterns easier to spot when databases or web servers hammer storage.

$ iotop -a --only

switches to accumulated I/O totals instead of per-second rates. Perfect for finding which processes moved the most data since you started monitoring, rather than just current activity spikes.

Batch mode works better for logging and automation.

$ iotop -b -n 10 -d 2

captures ten snapshots at two-second intervals and then exits. It is great for cron jobs that check I/O during specific time windows.

$ iotop -b -t --only

adds timestamps to each line in batch mode. Correlate I/O spikes with log entries or other system events when building timelines for postmortems.
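
Batch output like that is easy to post-process. The snippet below scans sample process records and reports the heaviest writer. The columns here (PID, user, write KB/s, command) are a simplified, illustrative layout; real `iotop -b` output has more columns and unit suffixes, so adjust the field numbers accordingly:

```shell
# Find the heaviest writer among simplified iotop-style records.
# Sample data is invented; column 3 stands in for write bandwidth.
printf '1234 mysql 5200 mysqld\n987 root 150 rsyslogd\n4321 backup 98000 tar\n' |
awk '$3 > max { max = $3; line = $0 } END { print "top writer:", line }'
```

Run against a real batch log, this kind of one-liner answers "who wrote the most?" without scrolling through hundreds of snapshots.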

Filter by user when you suspect specific accounts.

$ iotop -u nginx,mysql --only

watches just the web server and database processes. If you run Apache or any other service, replace nginx with the user that service runs as.

$ iotop -u root

catches system processes that might be misbehaving during maintenance windows.

Target individual processes with

$ iotop -p 1234,5678

Handy when you already know the PIDs from other monitoring tools and want detailed I/O breakdowns for specific applications.

$ iotop -k --only

forces kilobyte units instead of human-readable scaling. Consistent units help when parsing output with scripts or comparing across different time periods.

$ iotop -q --only

suppresses header lines for cleaner batch output. Add more q flags: -qq removes column names entirely, -qqq strips the I/O summary too. Perfect for log files where headers just add clutter.

Hide specific columns when terminal space gets tight:

$ iotop -1 -3 --only

removes PID and USER columns, focusing screen real estate on actual I/O numbers. Use -2 through -9 to hide other columns like priority or swap activity.

$ iotop -c --only

displays full command lines instead of just process names. Spot which specific scripts or database queries cause problems instead of just seeing generic interpreter names.

Control refresh timing precisely:

$ iotop -d 0.5 --only

updates twice per second for fast-moving situations.

$ iotop -d 10 --only

reduces overhead during long monitoring sessions where second-by-second updates aren't needed.

Combine flags strategically based on your investigation.

$ iotop -P -a -t -k --only

gives you process-grouped, accumulated totals with timestamps in consistent kilobyte units, showing only active processes. Your go-to command for thorough I/O analysis sessions.

Interactive shortcuts work once iotop starts running. Press o to toggle active-only mode, p to switch between processes and threads, a to flip between current and accumulated views. Arrow keys change sorting columns, r reverses sort order.

Interpreting Problems

Your server hits a wall. Both tools show you data, but translating that into actual problems takes experience. Numbers mean different things based on your hardware and what's actually broken.

Reading iostat Output

When await times spike while utilization stays low, your storage is slow, not busy. The drive might sit mostly idle yet take forever to finish each request. Old drives often show this pattern before they die completely.

Ignore the svctm field entirely. The manual warns you it's broken on modern systems. Focus on await instead. That shows the real time your applications spend waiting for disk work to finish.

Utilization percentages need your hardware context. A spinning disk at 100% has hit its limit. An SSD at 100% might handle much more work because these drives process multiple requests at once. Utilization just means "doing something" rather than "completely jammed."

Queue depths matter more than most people realize. Traditional spinning drives perform poorly with more than 2-4 requests per disk because the read head jumps between locations. SSDs handle deeper queues fine, often up to 64 requests before slowdowns start.

Understanding iotop Data

The IO percentage creates constant confusion. A process at 99% IO isn't using 99% of disk speed. It means the process spent 99% of its time waiting for disk operations instead of doing actual work.

Several processes can show 90% IO at the same time. Each process waits for different disk operations. The IO column shows per-process wait time, not total system disk consumption.

Good storage usage shows high bandwidth with low IO percentages. Lots of data moves without much waiting. Bad usage shows low bandwidth with high IO percentages. Usually means tons of small, random operations that make storage work poorly.

High SWAPIN percentages reveal memory problems disguised as disk issues. When processes show high SWAPIN, you need more RAM, not faster storage.

Common Problem Patterns

Database servers create specific signatures: high IOPS with small data blocks, lots of tiny reads and writes instead of big chunks. MySQL might perform 5,000 operations per second yet transfer only 50MB in that second.

Broken backup scripts show completely different patterns. There is massive write bandwidth with low IOPS. One process consumes tremendous write speed in iotop. IO percentages usually stay reasonable because big sequential writes don't cause much waiting.

Fragmented filesystems create random access patterns even when applications want sequential data. High IOPS with terrible response times. File servers affected this way handle large transfers fine, but perform poorly on small file operations.

Virtual machines complicate everything. The hypervisor batches and reorders requests. VM disk activity doesn't match host iostat output. Guest and host metrics disagree constantly.

Hardware Makes the Difference

Spinning drives show predictable patterns. More IOPS means longer wait times. The read head can only occupy one position. Queue depths over 2-4 requests per disk create severe slowdowns.

SSDs break these old relationships. High IOPS don't automatically mean high wait times. Quality NVMe drives handle hundreds of operations with minimal delays. Some enterprise SSDs perform better under load because of internal parallel processing.

Network storage adds complexity. EBS volumes, SAN storage, and NFS include network delays in response times. Performance depends on network infrastructure as much as actual storage hardware. Cloud platforms often throttle IOPS before bandwidth limits kick in.

RAID configurations multiply the confusion. RAID 1 doubles write operations internally, but might not show this in iostat. RAID 5 and RAID 6 convert single write requests into multiple read-modify-write cycles. Software RAID presents different patterns than hardware RAID because the kernel observes different operation sequences.

Using Both Tools Together

Start with iostat to find problem devices, then switch to iotop to identify the processes causing issues. iostat might show /dev/sda with bad response times, while iotop reveals three processes hitting that device simultaneously.

Sometimes, iostat shows normal device stats, but applications feel sluggish. iotop often exposes the real cause. A single process might consume all available IOPS with tiny random reads, starving other processes. Device utilization looks fine, but response times suffer.

Timing differences between tools happen naturally. iostat averages data over intervals, while iotop shows current snapshots. Quick I/O bursts might not register in iostat's multi-second averages but dominate iotop's display. Run both tools at the same time to catch these brief spikes.

Memory pressure appears differently in each tool. iostat shows swap device activity, while iotop identifies which processes trigger swapping. Combined, they help distinguish genuine high-memory applications from memory leaks causing excessive swap usage.

Make Them Work for You

Disk problems used to mean hours of guesswork and random reboots. Now, you can use two tools that cut through the confusion in minutes. iostat shows you which devices are struggling, and iotop reveals which processes cause the damage.

Both tools provide immediate data without complex installation or configuration. They're already available on your system.

Skip the guesswork next time your disks act up!

Meet Aayush, a WordPress website designer with almost a decade of experience who crafts visually appealing websites and has a knack for writing engaging technology blogs. In his spare time, he enjoys illuminating the surrounding minds.
