The original version of this article was first published on IBM
developerWorks, and is property of Westtech Information Services. This
document is an updated version of the original article, and contains
various improvements made by the Gentoo Linux Documentation team.
This document is not actively maintained.
Getting the most out of swap
When you set up a brand new Linux server, do you create a single 128 MB swap
partition? If so, did you know that you are severely limiting swap performance?
Would you like to increase swap performance by several orders of magnitude, and
to create swap partitions larger than 1 GB? It's possible, requiring no kernel
patches or special hardware, just pure geek know-how!
Some of you may not really care about swap. After all, Linux systems are
typically very memory efficient, and swap is often barely touched. While often
true on desktop systems, servers are another story. Because servers may handle
unexpected stresses, such as runaway processes, denial of service attacks, or
even the Slashdot effect, they need to have adequate high-speed swap so that
they do not grind to a halt and possibly crash when all physical memory (and
then some) is exhausted.
Still not convinced that this is a big deal? I'll show you how easy it is to
bring down a server by launching a massive number of new processes.
Please, if you try this, do it only on a non-production server.
Let's say you have two customized grep commands in
/usr/bin, called bobgrep and jimgrep. Now, let's
assume that bobgrep is simply a shell script that calls the ELF
executable jimgrep, as follows:
Code Listing 1.1: The bobgrep script
#!/bin/bash
jimgrep -r "$@"
Everything looks good so far, but what happens if jimgrep gets
accidentally replaced with a symbolic link to bobgrep? Well, in that
case, calling bobgrep or jimgrep will cause an infinite loop,
causing hundreds of bash processes to be spawned in mere seconds. This actually
happened to me once, and believe me, it hurt!
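To watch the loop without endangering a machine, it can be reproduced in a throwaway directory with a depth limit. The following is a minimal sketch, not the original scripts: the BOBGREP_DEPTH counter and the limit of 5 are additions made purely for this demonstration, and the sandbox directory stands in for /usr/bin.

```shell
#!/bin/bash
# Sandbox directory standing in for /usr/bin (hypothetical, demo only).
sandbox=$(mktemp -d)

cat > "$sandbox/bobgrep" <<'EOF'
#!/bin/bash
# Guard added for this demo only; the script in the article has no such check.
BOBGREP_DEPTH=$(( ${BOBGREP_DEPTH:-0} + 1 ))
export BOBGREP_DEPTH
if [ "$BOBGREP_DEPTH" -gt 5 ]; then
    echo "loop detected after $((BOBGREP_DEPTH - 1)) nested calls" >&2
    exit 1
fi
jimgrep -r "$@"
EOF
chmod +x "$sandbox/bobgrep"

# The accident: jimgrep becomes a symbolic link to bobgrep.
ln -s bobgrep "$sandbox/jimgrep"

# Put the sandbox first in PATH so that "jimgrep" resolves to the symlink.
loop_output=$(PATH="$sandbox:$PATH" "$sandbox/bobgrep" pattern 2>&1)
loop_status=$?
echo "$loop_output (exit status $loop_status)"

rm -rf "$sandbox"
```

Without the guard, every call would leave another bash process waiting on its child, which is exactly how memory runs out.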
If a server doesn't have adequate swap, a situation like this can cause the
machine to lock up in much less than a minute. How do we fix the problem? One
way is to increase the swap size beyond 128 MB. Fortunately for us, there is
absolutely no 128 MB limit on swap under Linux 2.2.x and later kernels, as
there was in the past. The current limits are approximately 2 GB on x86,
PowerPC, and MC680x0 systems, 1 GB on Sparc, 512 MB on MIPS systems, 128 GB on
Alpha, and a whopping 3 terabytes on UltraSparc platforms!
Code Listing 1.2: Swap partition size limits under modern Linux kernels
x86               2 Gigabytes
PowerPC           2 Gigabytes
Motorola 680x0    2 Gigabytes
Sparc             1 Gigabyte
MIPS              512 Megabytes
Alpha             128 Gigabytes
UltraSparc        3 Terabytes
While it's nice to be able to increase swap partition size to beyond 128 MB,
how about increasing performance? Ideally, it would be nice if we could set up
swap partitions in a RAID 0 stripe, so that reads and writes are equally
distributed between all partitions. If these partitions are on separate drives
and/or controllers, this will multiply swap performance, allowing your
servers to handle temporary memory usage "spikes" without slowing down dramatically.
Amazingly, all modern Linux kernels, by default (with no special kernel options
or patches) allow you to parallelize swap, just like a RAID 0 stripe. By using
the pri option in /etc/fstab to set multiple swap
partitions to the same priority, we tell Linux to use them in parallel:
Code Listing 1.3: Set multiple swap partitions to the same priority
/dev/sda2    none    swap    sw,pri=3    0 0
/dev/sdb2    none    swap    sw,pri=3    0 0
/dev/sdc2    none    swap    sw,pri=3    0 0
/dev/sdd2    none    swap    sw,pri=1    0 0
In the above example, Linux will use swap partitions sda2, sdb2, and sdc2 in
parallel. Since these partitions are on different drives, and possibly even
different SCSI controllers, read and write throughput will nearly triple. The
fourth swap partition, sdd2, will be used only after the first three partitions
have been exhausted.
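Before activating a layout like this, it can help to see which entries share a priority and will therefore be used in parallel. The following is a rough sketch only: swap_groups is a hypothetical helper name, it expects the usual whitespace-separated fstab fields, and it skips swap entries that carry no pri= option.

```shell
#!/bin/sh
# swap_groups FILE (hypothetical helper): list the swap devices in an
# fstab-style FILE grouped by pri= value, highest priority first.
# Swap entries without a pri= option are skipped.
swap_groups() {
    awk '
        $1 !~ /^#/ && $3 == "swap" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^pri=/)
                    print substr(opts[i], 5), $1   # "priority device"
        }
    ' "$1" | sort -k1,1nr -k2,2 | awk '
        {
            # Remember the first time each priority appears to keep sort order.
            if (!($1 in devs)) { order[++count] = $1; devs[$1] = $2 }
            else devs[$1] = devs[$1] " " $2
        }
        END {
            for (i = 1; i <= count; i++)
                printf "pri=%s: %s\n", order[i], devs[order[i]]
        }
    '
}
```

Running swap_groups /etc/fstab against the listing above should report sda2, sdb2, and sdc2 together on the pri=3 line, with sdd2 alone at pri=1.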
The pri option is really easy to use. The priority must be a number
between 0 and 32767, with 32767 being the highest priority. The swap partitions
will be used from highest priority to lowest priority, meaning that a partition
with a priority of x will be used only if all partitions with a priority
greater than x are already full. If several partitions have the same priority,
Linux will automatically parallelize access between them. This allows you to
not only parallelize swap, but also prioritize access so that the partitions on
the fastest drives (or regions of the drives) are used first. So, you can set
up an emergency swap partition on an old, slower drive that will be used only
if all high-speed swap is exhausted first.
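Once the swap areas are active (for example after running swapon -a), the kernel reports each area's size, usage, and priority in /proc/swaps. The sketch below totals that information per priority level, which makes it easy to confirm that the lower-priority areas stay untouched until the higher ones fill up. The function name swap_usage_by_priority is made up for illustration, and the code assumes the usual column order of Filename, Type, Size, Used, Priority, with sizes in kilobytes.

```shell
#!/bin/sh
# swap_usage_by_priority [FILE] (hypothetical helper): print one line per
# priority level as "priority used_kb size_kb", highest priority first.
# FILE defaults to /proc/swaps; its first line is a header and is skipped.
swap_usage_by_priority() {
    awk '
        NR > 1 { size[$5] += $3; used[$5] += $4 }
        END {
            for (p in size)
                printf "%d %d %d\n", p, used[p], size[p]
        }
    ' "${1:-/proc/swaps}" | sort -k1,1nr
}
```

If the used column is non-zero for a low priority while a higher priority still has free space, the priorities are probably not set the way you intended.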
Now it's time to put some of this swapping knowledge into action. To loosely
quote Mr. Miyagi of Karate Kid fame: "Swap on, swap off, geek-san!"
About the author
Daniel Robbins lives in Albuquerque, New Mexico. He was the President/CEO of
Gentoo Technologies Inc., the Chief Architect of the Gentoo Project and is a
contributing author of several books published by MacMillan: Caldera OpenLinux
Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel has been involved
with computers in some fashion since the second grade when he was first exposed
to the Logo programming language and a potentially lethal dose of Pac Man. This
probably explains why he has since served as a Lead Graphic Artist at SONY
Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife Mary
and his new baby daughter, Hadassah. You can contact Daniel at