Memory Footprint of Processes

The amount of memory your system needs depends on the memory
requirements of the programs you run. How do you figure that out? It's not as
simple as adding up the amount of memory used by each process individually,
because some of that memory can be shared. Read on for the details.

One thing you should know about /proc/meminfo: this is not a real file. Instead, /proc/meminfo is a virtual file that contains real-time, dynamic information about the system.
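Reading it works like reading any ordinary file. For example, a quick check of total and free physical RAM (values are reported in kB):

grep -E '^(MemTotal|MemFree):' /proc/meminfo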

System administrators want to understand the applications that run on their systems.
You can’t tune a machine unless you know what the machine is doing! It’s fairly easy to
monitor a machine's physical resources: CPU (mpstat, top), memory (vmstat), disk
I/O (iotop, blktrace, blkiomon) and network bandwidth (ip, nettop).
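For example, a quick memory snapshot (assuming vmstat from the procps package is installed; this prints five one-second samples, and the si/so columns show pages swapped in and out):

vmstat 1 5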

To answer the question of how much memory your processes really use, I'll describe how
to extract information from the /proc filesystem. First, let's look at terminology relevant to Linux memory
management. If you want an exhaustive look at memory management on Linux,
consider Mel Gorman’s seminal work Understanding the Linux Virtual Memory Manager.
His book is an oldie but goodie; the hardware he describes hasn't changed much over the intervening years, and what changes have occurred have been minor. This
means the concepts he describes, and much of the code used to implement those
concepts, are still spot-on.

Before going into the nuts and bolts of the answers to those questions, you first need
to understand the context in which those questions are answered. So let’s start with a
high-level overview.

Linux Memory Usage

Your computer system has some amount of physical RAM installed. RAM is needed
to run all software, because the CPU will fetch instructions and data from RAM and
nowhere else. When a system doesn’t have enough RAM to satisfy all processes, some
of the process memory is written to an external storage device and that RAM then
can be freed for use by other processes. This is called either swapping, when the RAM
being freed is anonymous memory (meaning that it isn’t associated with file data, such
as shared memory or a process’s heap space), or paging (which applies to things like
memory-mapped files).
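You can check how much swap space is configured and in use on your own system (swapon is part of util-linux; on older systems, use swapon -s instead of --show):

swapon --show
grep -E '^(SwapTotal|SwapFree):' /proc/meminfo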

(By the way, a process is simply an application that’s currently running. While the
application is executing, it has a current directory, user and group credentials, a list of
open files and network connections, and so on.)

Some types of memory don’t need to be written out before they can be freed and
reused. For example, the executable code of an application is stored in memory and
protected as read-only. Since it can never be changed, when Linux wants to use that
memory for something else, it just takes it! If the application ever needs that memory
back again, Linux can reload it from the original application executable on disk. Also,
since this memory is read-only, it can be used by multiple processes at the same time.
And, this is where the confusion comes in regarding calculating how much memory a
process is using—what if some of that memory is being shared with other processes?
How do you account for it?

Before getting to that, I need to define a few other terms. The first is pinned memory.

Most memory is pageable, meaning that it can be swapped or paged out when the
system is running low on RAM. But pinned memory is locked in place and can't be
paged out. This is obviously good for performance: the memory never can be taken
away, so you never have to wait for it to be brought back in. The problem is that such
memory can never be reclaimed, even if the system is running critically low on RAM.
Pinned memory reduces the system's flexibility when it comes to managing memory,
and no one likes to be boxed into a corner.
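Linux reports both how much memory is currently pinned and how much a process is allowed to pin (mlock(2) is the system call that pins memory; Mlocked appears in /proc/meminfo, and the bash builtin ulimit -l shows the per-process lock limit in kB):

grep Mlocked /proc/meminfo
ulimit -l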

Simple Example

I made reference above to read-only memory, memory that is shared, memory used
for heap space, and so on. Below is some sample output that shows how memory is
being used by my Bash shell (I want to emphasize that this output has been trimmed
to fit into the allotted space, but all relevant fields are still represented. You can run
the two commands you see on your own system and look at real data, if you wish.
You’ll see full pathnames instead of “…” as shown below, for example):
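The two commands are echoing the shell's PID and then reading that process's maps file. Here is an illustrative, trimmed example of what they look like; the addresses, device numbers and inode values below are invented for the example, and the rw-s line is a representative shared file mapping of the kind discussed later:

echo $$
1234

cat /proc/1234/maps
00400000-004ef000 r-xp 00000000 08:01 1835043   /bin/bash
006ef000-006f0000 r--p 000ef000 08:01 1835043   /bin/bash
006f0000-006f9000 rw-p 000f0000 08:01 1835043   /bin/bash
01c28000-01e69000 rw-p 00000000 00:00 0         [heap]
7f0e346a9000-7f0e3485d000 r-xp 00000000 08:01 1837221   .../libc-2.17.so
7f0e34d59000-7f0e34d62000 rw-s 00000000 08:01 1842007   .../...
7ffca93a8000-7ffca93c9000 rw-p 00000000 00:00 0         [stack]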

Each line of output represents one vm_area. A vm_area is a data structure inside the
Linux kernel that keeps track of how one region of virtual memory is being used inside
a process. The sample output has /bin/bash on the first three lines, because Linux has
created three ranges of virtual memory that refer to the executable program. The first
region has permissions r-xp, because it is executable code (r = read, x = execute and
p = private; the dash means write permission is turned off). The second region refers
to read-only data within the application and has permissions r--p (the two dashes
represent write and execute permission).

The third region represents variables that have been given initial values in the
application’s source code, so it must be loaded from the executable, but it could
be changed during runtime (hence the permissions rw-p, which show that only execute
is turned off). These regions can be any size, but they are made up of pages, which
are each 4K on Linux. The term page means the smallest allocatable unit of virtual
memory. (In technical documentation, you’ll see two other terms: frame and slot.
Frames and slots are the same size as pages, but frames refer to physical memory and
slots refer to swap space.)
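You can confirm the page size on your own system (getconf ships with glibc; on x86 systems this prints 4096):

getconf PAGE_SIZE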

You know from my previous discussion that read-only regions are shared with other
processes, so why does “p” show up in the permissions for the first region? Shouldn’t
it be a shared region? You have a good eye to spot that! Yes, it should. And in fact, it
is shared. The reason it shows up as “p” here is because there are actually 14 different
permissions and room only for four letters, so some condensing had to be done.
The “p” means private, because while the memory is currently marked read-only, the
application could change that permission and make it read-write, and if it did make
that change and then modified the memory, you would not want other processes
to see those changes! That would be similar to one process changing directory, and
every other process on the system changing at the same time! Oops! So the letter
“p” that marks the region as private really means copy-on-write. All of the memory starts out being shared among all processes using that region, but if any part of it
is modified in the future, that one tiny piece is copied into another part of RAM so
that the change applies only to the one process that attempted the write. In essence,
it’s private, even though 99% of the time, the memory in that region will be shared
with other processes. Such copying applies on a page-by-page basis, not to the entire
vm_area. Now you can begin to see the difficulty in calculating how much memory a
process actually consumes.

But while I’m on this topic, there’s a region in the list that has an “s” in the permission
field. That region is a memory-mapped file, meaning that the data blocks on disk are
mapped to the virtual memory addresses shown in the listing. Any references the
process makes to those addresses are translated automatically into reads and
writes to the corresponding data blocks on disk. The memory used by this region is
actually shared by all processes that map the file into memory, meaning no duplicated
memory for file access by those processes.

Just because a region represents some given size of virtual memory does not necessarily
mean that there are physical frames of RAM behind every virtual page. In fact, it's common
for many pages to have no frame at all. Imagine an application that allocates 100MB of
memory. Should the operating system actually allocate 100MB right then? UNIX systems
do not; they allocate a region of virtual memory like those above, but no physical RAM.
As the process tries to access those virtual addresses, page faults are generated, and the
operating system allocates the memory at that time. Deferring memory allocation until
the last possible moment is one way that Linux optimizes the use of memory, but it
complicates the task of determining how much memory an application is using.
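You can see this gap between virtual and physical directly in /proc. For the current shell (VmSize totals all vm_areas; VmRSS counts only the pages currently backed by frames; both values are in kB):

grep -E '^(VmSize|VmRSS):' /proc/$$/status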

Recap So Far

A process's address space is broken up into regions called vm_areas. These vm_areas
are unique to each process, but the frames of memory referred to by the pages
within a vm_area might be shared across processes. If the memory is read-only
(like executable code), all processes share the frame equally. Any attempt to write to
virtual pages that are read-only triggers a page fault that is converted into a SIGSEGV,
and the process is killed.

(You may have seen the message pop up on your terminal screen, “Segmentation fault.” That means the process was killed by SIGSEGV.)

Memory that is read/write also can be shared, such as shared memory. If multiple
processes can write to the frames of the vm_area equally, some form of
synchronization inside the application will be necessary, or multiple processes could
write at the same time, possibly corrupting the contents of that shared memory.
(Most applications will use some kind of mutex lock for this, but synchronization and
locking are outside the scope of this article.)

Adding Up the Memory Actually Used

So, determining how much memory a process consumes is difficult. You could add
up the space allocated to the vm_areas, but that's virtual memory, not physical; large
portions of that space could be unused or swapped out. This number is not a true
representation of the amount of memory being used by the process.

You could add up only the frames that are used by this process and not shared. (This
information is available in /proc/pid/smaps.) You might call this the “USS” (Unique Set
Size), as it defines how much memory will be freed when an application terminates
(shared libraries typically stay in RAM even when no processes are currently using
them, as a performance optimization for when they are needed again). But this isn't
the true memory cost of a process either, as the process likely uses one or more
shared libraries. For example, if an application is executed and it uses a shared library
that isn’t already in memory, that library must be loaded—some part of that library
should be allocated against the new process, right?
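If you'd like to approximate the USS yourself, summing the private fields in smaps is one way to do it (a minimal sketch; <pid> is a placeholder for the process ID of interest, and the values are in kB):

awk '/^(Private_Clean|Private_Dirty):/ { ttl += $2 } END { print ttl " kB" }' /proc/<pid>/smaps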

The ps command reports the “RSS” (Resident Set Size), which includes all frames
used by the process, regardless of whether they're shared. Unfortunately, this number
is misleading when summed across processes: adding the RSS of every process running
on the system counts each shared library multiple times, greatly inflating the apparent
memory requirement.
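For example, to see the RSS of the largest consumers as ps reports it (ps from procps; RSS values are in kB):

ps -eo pid,rss,comm --sort=-rss | head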
The /proc/pid/smaps file includes yet another memory category, PSS (Proportional Set
Size). This is the amount of unique memory for one process (the USS), plus a proportional
share of the memory that it shares with other running processes. For example, let's
assume the USS for a process is 2MB and it uses another 4MB of shared libraries, but
those shared libraries are used by three other processes. Since there are four processes
using the shared libraries, they should each only be accounted for 25% of the overall
library size. That would make the PSS of the process 2MB + (4MB / 4) = 3MB. If you now
add together the PSS values of all processes on the system, the shared library memory
will be totally accounted for, meaning the whole is equal to the sum of its parts.
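You can compute the PSS for a single process the same way the USS sketch above does, by summing the Pss lines (values in kB; <pid> is a placeholder):

awk '/^Pss:/ { ttl += $2 } END { print ttl " kB" }' /proc/<pid>/smaps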

It’s not perfect—when one of those processes terminates, the memory returned to
the system will be USS, and because there’s one less process using the shared libraries,
the PSS of all other processes will appear to increase! A naïve system administrator
might wonder why the memory usage on the remaining processes has suddenly
spiked, but in truth, it hasn't. In this example, 4MB/4 becomes 4MB/3, so any process
using the shared libraries will see its PSS value go up by about 0.33MB.

As the last step, I’m going to demonstrate a command that performs these calculations.

Automating the Work

The one-line command shown below will accumulate all of the PSS values for all
processes on the system:

awk '/^Pss:/ { ttl += $2 }; END { print ttl }' /proc/[0-9]*/smaps 2>/dev/null

Note that stderr is redirected to /dev/null. This is because the shell replaces the wildcard
string with a list of all filenames that match and then executes the awk command. This
means that by the time the awk command is running, some of those processes already may
have terminated. That would cause awk to print an error message about a nonexistent
file, which is why stderr is redirected. (Astute readers will note that this command
line will never factor in the memory consumed by the awk command itself!)
Many of the processes that the awk command is going to be reading will not be accessible
to an unprivileged account, so system administrators should consider using sudo to run the command.
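For example (the shell expands the glob before sudo runs, but the /proc directory names are world-readable; the smaps files themselves are opened by awk, which runs as root here, so all processes get counted):

sudo awk '/^Pss:/ { ttl += $2 }; END { print ttl }' /proc/[0-9]*/smaps 2>/dev/null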

(Without sudo, inaccessible processes produce error messages that are likewise sent to
/dev/null, so the command reports a total only for the processes that are accessible, in
other words, those owned by the current user.)

Summary

I’ve covered a lot of ground in this blog article, from terminology (pages, frames, slots)
and background information on how virtual memory is organized (vm_areas), to
details on how memory usage is reported to userspace (the maps and smaps files
under /proc). I’ve barely scratched the surface of the type of information that the
Linux kernel exposes to userspace, but hopefully, this has piqued your interest enough
that you’ll explore it further.
