Tackling a resource leak on an application server

Abstract

Memory leaks are among the most troubling problems in C++ programming. Entry-level developers are the most prone to them, but even an experienced C++ programmer can be caught out, because memory management is a tricky subject. This paper describes the author’s experience in detecting one such leak and the solution he adopted. References are made to Scott Meyers' book “Effective C++”.

1. Introduction

A customer reported a continuous increase in the memory usage of one of its application servers. The server was written using a proprietary C++ framework. Memory usage had grown to a level at which the UNIX OS could no longer service processes effectively. Intensive disk activity was recorded due to heavy paging, and the host machine was thrashing because multiple application server processes were running on it.

2. Detecting the source of the leak

The customer was interviewed to find out which function was used most heavily during the period in which the memory increase was detected. One function was identified and used as the starting point for a memory usage probe.

2.1 Validate the leak

In a UNIX environment, the memory usage of a process can remain elevated even after a function has finished executing. The free() function (or the C++ delete operator, which typically calls free() in turn) does not necessarily return memory to the operating system, so the memory usage of the process does not necessarily shrink. It was therefore necessary to validate whether the memory increase experienced by the customer was a genuine software bug, that is, a memory or resource leak.

The following paragraphs discuss commands specific to SunOS 5.9 SPARC.

There are a few ways to inspect the characteristics of a running process. Besides vmstat and prstat, one of the most commonly used commands is the process status command, ps. This command supports a large number of arguments; ‘man ps’ gives a comprehensive list of them and their meanings.

An appropriate set of arguments was chosen for the purpose of memory leak detection. They are (reproduced from the Solaris man page):

  • -e Lists information about every process now running.
  • -f Generates a full listing.
  • -o Formats, which includes
    • pid – process id
    • pmem – The ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage.
    • rss – The resident set size of the process, in kilobytes.
    • vsz – The total size of the process in virtual memory, in kilobytes.
    • pcpu – The ratio of CPU time used recently to CPU time available in the same period, expressed as a percentage.
    • comm – The name of the command being executed (argv[0] value) as a string.

The full command used to monitor the memory usage of the target program is as follows.

ps -ef -o pid,pmem,rss,vsz,pcpu,comm | grep leaker | grep -v grep

For continuous monitoring, the above command line was placed in a shell script that runs it in a loop at a fixed interval. The UNIX ‘tee’ command was used to copy the screen output to a file so that the log could be inspected later. It was a primitive tool, but it did the job.

The full memory monitoring script is as follows.

#!/bin/csh
# Print a header once, then sample the leaker process every 2 seconds.
echo "  PID PMEM RSS  VSZ   PCPU COMM"
while (1)
    ps -ef -o pid,pmem,rss,vsz,pcpu,comm | grep leaker | grep -v grep
    sleep 2
end

Because no real application server was available at the time of writing, a short program that deliberately leaks memory was created to demonstrate the capability of the above script, along with other shell commands such as prstat and vmstat.


Figure 1: A tiny memory leaker program

The program was compiled with the GCC C++ compiler and executed from the UNIX command shell.


Figure 2: Compiling and running the leaker program

This tiny program leaked 8 bytes every time an instance of type A was created: 4 bytes for the character buffer (_buff[4]) and 4 bytes for the pointer to the v-table required by the virtual destructor.
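
The listing in Figure 1 is not reproduced in this text. The following is a minimal sketch of what such a leaker could look like, assuming only what the paragraph above states (a class A with a 4-byte _buff member and a virtual destructor). The function name leak_objects is hypothetical and is not taken from the original program.

```cpp
#include <cstddef>

// Assumed shape of the leaked type: a 4-byte buffer plus a virtual
// destructor, which makes the compiler add a hidden v-table pointer.
class A {
public:
    A() { _buff[0] = '\0'; }
    virtual ~A() {}
private:
    char _buff[4];
};

// Hypothetical helper: allocates `count` objects and deliberately never
// deletes them. Returns the minimum number of bytes leaked; the heap
// allocator's own per-block overhead is not included.
std::size_t leak_objects(std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        A* leaked = new A;  // the only pointer to the object is dropped here
        (void)leaked;
    }
    return count * sizeof(A);
}
```

On a 32-bit build sizeof(A) is the 8 bytes described above; on a 64-bit build it is typically 16, because the v-table pointer is 8 bytes and the object is padded to pointer alignment. A main() that calls leak_objects() repeatedly with a short pause between calls would give the monitoring script a steadily growing process to observe.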

To see how the memory leak monitor script performed, the “leaker” program was executed and then the script was launched.

Figure 3: Increase of RSS after 2.25 million copies of A were leaked

After approximately 2.25 million copies of object A had been leaked, memory usage was still increasing at a constant rate. The prstat command showed the final RSS at 36 MB, up from an initial 2.2 MB.
When vmstat was used to monitor the system, continued execution of the leaker program showed page-out (po) activity increasing as physical memory was exhausted.


Figure 4: Page-out (po) activity begins after about 14.2 million copies of ‘A’ have been leaked

2.2 Concluding that the memory increase is a memory leak

As mentioned earlier, a memory increase in a running process does not necessarily signify a memory leak. A non-leak memory increase will stabilize over time.

The report generated by the script showed a continuous increase in memory usage over the period of repeated execution.
Based on the report, the program code was reviewed carefully. As pointed out in Scott Meyers' Effective C++, there is a danger in returning a pointer from a function.

CDateTime *dt = CDateTime::now();

The above statement is deceptive in that it does not explicitly show the use of operator ‘new’. It escapes the eyes of reviewers who look for a ‘new’ without a matching ‘delete’ as a sign of a memory leak.
The statement was suspicious: it might or might not cause a leak. Some classes in the framework can be wrapped in a smart pointer envelope, which relieves developers of the burden of worrying about memory leaks. However, classes wrapped in smart pointers usually take the form SmartPtr(className), which was not the case here. It was therefore justified to look into the header and source files of the CDateTime class.
A peek into datetime.cpp showed that CDateTime::now() returns a ‘new’ instance of CDateTime.

CDateTime*
CDateTime::now()
{
    return new CDateTime;
}

This finding confirmed that the

CDateTime *dt = CDateTime::now();

statement was causing the leak.
During further code review, the following statement was also suspicious.

result = foo(CDateTime::now());

This is yet another classic, subtle cause of a memory leak. During execution, CDateTime::now() is executed first, and its return value is immediately passed to foo(). There is now an unnamed CDateTime object on the heap, and it will never be deleted because the caller keeps no pointer to it and nothing else deletes it.
The disassembled code shows the sequence:

004016C8 call @ILT+155(CDateTime::now) (004010a0)
004016CD push eax    ; eax holds the return value of CDateTime::now(), a pointer to the heap object created by ‘new’; the push passes it as the argument to the next call, per the standard calling convention
004016CE call @ILT+175(foo) (004010b4)
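
The effect of the unnamed temporary can be reproduced with a small stand-in. The sketch below is an illustration under assumptions: the real CDateTime and foo() are not shown in full in this paper, so the class body, the live counter, and the body of foo() are hypothetical and exist only to make the leak observable.

```cpp
// Stand-in for the framework class. Only now() mirrors the code shown
// in this paper; the instance counter is added for illustration.
class CDateTime {
public:
    CDateTime() { ++live; }
    CDateTime(const CDateTime&) { ++live; }
    ~CDateTime() { --live; }
    static CDateTime* now() { return new CDateTime; }  // caller owns the result
    static int live;  // number of CDateTime objects currently alive
};
int CDateTime::live = 0;

// Hypothetical callee: it reads the object but never deletes it.
int foo(const CDateTime* dt) { return dt != 0 ? 1 : 0; }

// Returns how many CDateTime objects are left behind by one call.
int leaked_after_unnamed_call() {
    int before = CDateTime::live;
    int result = foo(CDateTime::now());  // no name, so nothing can delete it
    (void)result;
    return CDateTime::live - before;
}
```

Each call to leaked_after_unnamed_call() returns 1 in this sketch: one heap object survives per call, which matches the steady growth seen in the monitoring log.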

3. Fixing the Leak

There are at least two ways to fix the first form of leak, the use of a heap object without deleting it, as caused by the following statement.

CDateTime *dt = CDateTime::now();

Some might say that one way to address the leak is to delete dt after it has been used.

CDateTime *dt = CDateTime::now();
//do something here
...
//pass the pointer to another function
someResult = foo(dt);
...
//do some conditional block here
...
delete dt;

However, it is important to note that foo(dt) might throw an exception, or the conditional block might exit the function before the ‘delete’ statement is executed. Any such premature exit from the function leads to a memory leak [1].
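
The exception case can be demonstrated with a short sketch. As before, the CDateTime stand-in, its live counter, and a foo() that always throws are assumptions made for illustration; they are not the framework's actual code.

```cpp
#include <stdexcept>

// Stand-in for the framework class, with an instance counter added
// so the leak can be observed.
class CDateTime {
public:
    CDateTime() { ++live; }
    CDateTime(const CDateTime&) { ++live; }
    ~CDateTime() { --live; }
    static CDateTime* now() { return new CDateTime; }
    static int live;  // number of CDateTime objects currently alive
};
int CDateTime::live = 0;

// Hypothetical callee that always fails.
int foo(const CDateTime*) { throw std::runtime_error("foo failed"); }

// Returns how many CDateTime objects are left behind when foo() throws.
int leaked_when_foo_throws() {
    int before = CDateTime::live;
    try {
        CDateTime* dt = CDateTime::now();
        int someResult = foo(dt);  // throws, so control leaves this block
        (void)someResult;
        delete dt;                 // never reached
    } catch (const std::runtime_error&) {
        // dt is out of scope here; the heap object can no longer be freed
    }
    return CDateTime::live - before;
}
```

In this sketch leaked_when_foo_throws() returns 1, even though the code contains a ‘delete’ for every ‘new’.
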
The second approach is to wrap the heap object returned by CDateTime::now() in a smart pointer. The C++ framework has a smart pointer implementation called SmartPtr.

CDateTime *dt = CDateTime::now();

becomes

SmartPtr(CDateTime) dt = CDateTime::now();

Using the SmartPtr wrapper causes the heap object to be released when the automatic local variable dt goes out of scope. This approach, however, had a side effect.

The function foo() does not take a parameter of type SmartPtr(CDateTime). Passing dt to foo() caused the compiler to complain about an undefined foo() function: it could not find a foo(const SmartPtr(CDateTime)&) overload, and no implicit conversion was available. The problem becomes worse when foo() is an external function whose source code is not available. For these reasons, the idea of wrapping the result of CDateTime::now() had to be shelved.
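
For comparison, the sketch below shows how the same situation looks with the standard library's std::unique_ptr, assuming a C++11 compiler. This is a swapped-in technique, not the framework's SmartPtr, whose interface is not shown in this paper; whether SmartPtr offers an equivalent of get() is unknown. The CDateTime stand-in, its live counter, and the two-argument foo() are hypothetical.

```cpp
#include <memory>
#include <stdexcept>

// Stand-in for the framework class, with an instance counter added
// so that cleanup can be observed.
class CDateTime {
public:
    CDateTime() { ++live; }
    CDateTime(const CDateTime&) { ++live; }
    ~CDateTime() { --live; }
    static CDateTime* now() { return new CDateTime; }
    static int live;  // number of CDateTime objects currently alive
};
int CDateTime::live = 0;

// Hypothetical callee that takes a raw pointer and can be told to fail.
int foo(const CDateTime* dt, bool fail) {
    if (fail) throw std::runtime_error("foo failed");
    return dt != nullptr ? 1 : 0;
}

// Returns how many CDateTime objects are left behind after the call.
int leaked_with_unique_ptr(bool fail) {
    int before = CDateTime::live;
    try {
        std::unique_ptr<CDateTime> dt(CDateTime::now());  // dt owns the object
        foo(dt.get(), fail);  // get() lends the raw pointer; dt keeps ownership
    } catch (const std::runtime_error&) {
        // dt was destroyed during stack unwinding, deleting the object
    }
    return CDateTime::live - before;
}
```

In this sketch the function returns 0 whether or not foo() throws, and foo() keeps its raw-pointer signature, so an external function would not need to change.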

The final approach was to avoid the static now() member of the CDateTime class altogether. A default-constructed CDateTime contains the current date and time. Instead of calling CDateTime::now(), all code that called this function obtained the same value by creating a local, default-constructed CDateTime object.

CDateTime *dt = CDateTime::now();

became

CDateTime dt;

For the second cause of the leak,

result = foo(CDateTime::now());

became

CDateTime dt;
result = foo(&dt);

After the fix was implemented, the program was tested again for memory leaks. No leak was detected, and the issue was considered closed.
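
The reason the local object cannot leak is that its destructor runs on every exit path from the enclosing scope, including an exception. The sketch below illustrates this under assumptions: the CDateTime stand-in, its live counter, and the two-argument foo() are hypothetical and only serve to make the behaviour observable.

```cpp
#include <stdexcept>

// Stand-in for the framework class, with an instance counter added
// so that cleanup can be observed.
class CDateTime {
public:
    CDateTime() { ++live; }  // the real class is said to capture the current date/time here
    CDateTime(const CDateTime&) { ++live; }
    ~CDateTime() { --live; }
    static int live;  // number of CDateTime objects currently alive
};
int CDateTime::live = 0;

// Hypothetical callee that takes a raw pointer and can be told to fail.
int foo(const CDateTime* dt, bool fail) {
    if (fail) throw std::runtime_error("foo failed");
    return dt != 0 ? 1 : 0;
}

// Returns how many CDateTime objects are left behind after the call.
int leaked_with_stack_object(bool fail) {
    int before = CDateTime::live;
    try {
        CDateTime dt;                 // automatic storage, no ‘new’ involved
        int result = foo(&dt, fail);  // the address is valid for the whole call
        (void)result;
    } catch (const std::runtime_error&) {
        // dt was already destroyed during stack unwinding
    }
    return CDateTime::live - before;
}
```

In this sketch the function returns 0 for both the normal and the throwing path. The approach is safe only as long as foo() does not keep the pointer after it returns.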

4. Conclusions

It is important for C++ developers to be diligent about objects created on the heap. Failure to delete an object created on the heap causes memory to leak and, if repeated, eventually leads to memory exhaustion. This does not apply in the same way to developers working in environments where objects are garbage collected, whether deterministically or non-deterministically. One example of such an environment is Microsoft’s .NET Framework.

[1] Scott Meyers, Effective C++: 55 Specific Ways to Improve Your Programs and Designs, Item 13.