Linux’s handling of out-of-memory scenarios has long been a hot topic of discussion and a frequent target of criticism.
Many solutions to this problem have been proposed, and still are, both at the kernel level, with patches such as le9, and in userland, with applications such as Facebook’s oomd, earlyoom and the one we’ll look at more closely in this article, bustd.
Userland daemons of this class are usually called OOM (out-of-memory) killers.
Knowing when memory is running out
An OOM killer has to be able to tell how much memory is available on the system.
Even though most would consider “free” and “available” to be synonyms, these terms hold different meanings in the context of system memory in Linux.
Free memory is memory that is completely unused by the operating system, while available memory is an estimate of how much memory can be used without swapping, taking into account the page cache and reclaimable slab memory [1].
With those terms defined, how can we obtain them?
One way of programmatically getting the free memory of the system is the sysinfo syscall, through which we can obtain the following data:
struct sysinfo {
    long uptime;             /* Seconds since boot */
    unsigned long loads[3];  /* 1, 5, and 15 minute load averages */
    unsigned long totalram;  /* Total usable main memory size */
    unsigned long freeram;   /* Available memory size */
    unsigned long sharedram; /* Amount of shared memory */
    unsigned long bufferram; /* Memory used by buffers */
    unsigned long totalswap; /* Total swap space size */
    unsigned long freeswap;  /* Swap space still available */
    unsigned short procs;    /* Number of current processes */
    char _f[22];             /* Pads structure to 64 bytes */
};
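As a rough illustration of the call, the sketch below reads the struct from Python through ctypes and derives how much RAM is free. It rests on a few assumptions: a Linux system whose C library exports sysinfo, and the field layout documented for current kernels in sysinfo(2), which adds pad, totalhigh, freehigh and mem_unit to the older layout shown above. The names SysInfo, read_sysinfo, free_ram_bytes and free_ram_percent are made up for this example and are not taken from bustd.

```python
import ctypes
import os


class SysInfo(ctypes.Structure):
    """Mirror of struct sysinfo (layout assumed from current sysinfo(2))."""

    _fields_ = [
        ("uptime", ctypes.c_long),         # seconds since boot
        ("loads", ctypes.c_ulong * 3),     # 1, 5 and 15 minute load averages
        ("totalram", ctypes.c_ulong),      # total usable main memory
        ("freeram", ctypes.c_ulong),       # free memory, in mem_unit units
        ("sharedram", ctypes.c_ulong),
        ("bufferram", ctypes.c_ulong),
        ("totalswap", ctypes.c_ulong),
        ("freeswap", ctypes.c_ulong),
        ("procs", ctypes.c_ushort),
        ("pad", ctypes.c_ushort),
        ("totalhigh", ctypes.c_ulong),
        ("freehigh", ctypes.c_ulong),
        ("mem_unit", ctypes.c_uint),       # size of one memory unit in bytes
        ("_reserved", ctypes.c_char * 8),  # trailing padding, sized generously
    ]


def read_sysinfo():
    """Call sysinfo(2) through the C library already loaded in this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    info = SysInfo()
    if libc.sysinfo(ctypes.byref(info)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return info


def free_ram_bytes(info):
    """Free RAM in bytes: the raw counter is expressed in mem_unit units."""
    return info.freeram * info.mem_unit


def free_ram_percent(info):
    """Free RAM as a percentage of total RAM (mem_unit cancels out)."""
    if info.totalram == 0:
        raise ValueError("totalram is zero")
    return 100.0 * info.freeram / info.totalram
```

On a Linux machine, free_ram_percent(read_sysinfo()) gives the share of RAM that is completely unused. Note that this is the "free" figure, not the "available" one, so it will usually look more pessimistic than MemAvailable.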
Now, the only way (in userland) to obtain the available memory figure is through the proc filesystem.
$ head -4 /proc/meminfo
MemTotal: 20286260 kB
MemFree: 11082820 kB
MemAvailable: 14691740 kB
Buffers: 244 kB
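Reading those fields from a program comes down to splitting each line at the colon. The following is a minimal sketch of such a parser, not the code bustd uses; parse_meminfo and read_meminfo are hypothetical names, and the sample text reuses the figures from the output above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field name: integer} dict.

    Most fields are reported in kB; a few (the HugePages_* counters, for
    example) are plain counts with no unit, so the unit is not kept here.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if not parts:
            continue  # field name without a value
        values[name.strip()] = int(parts[0])
    return values


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path, encoding="ascii") as f:
        return parse_meminfo(f.read())


SAMPLE = """\
MemTotal: 20286260 kB
MemFree: 11082820 kB
MemAvailable: 14691740 kB
Buffers: 244 kB
"""

info = parse_meminfo(SAMPLE)
# Memory that is in use but counted as available (page cache, reclaimable slabs, ...)
reclaimable_kb = info["MemAvailable"] - info["MemFree"]
```

With the sample above, the gap between MemAvailable and MemFree is about 3.4 GiB, which shows how far apart the two figures can be on a system with a warm page cache.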
Both approaches involve a trade-off: parsing /proc/meminfo yields a much more accurate picture of memory status, but it is an order of magnitude slower than the sysinfo system call.
I opted for a slightly different approach for my OOM killer, bustd.
bustd uses sysinfo for a very cheap, approximate view of the system memory status and then checks for Pressure Stall Information