Control memory usage of multi-threaded python process

I need to control the memory usage of the current Python process. This process is a multi-threaded Python RPC server.

These threads do memory-intensive work. (The threads call a memory-intensive C library using ctypes, so these Python threads run truly in parallel.)

I am planning to control the memory usage of this process by delaying calls to memory-intensive functions if the threads see that the current memory usage is above a threshold.

This application runs on FreeBSD 9.2.

I need help with:

1) How do I get the memory size of the current process? Since I will be doing this frequently, I want the call to be lightweight.

2) Is this idea of controlling memory usage sound?

If you are OK with installing external libraries, you could use psutil to monitor the process. It's pretty lightweight and measures memory accurately, so you should be able to call it frequently without causing any serious performance issues. If you want more in-depth memory information, there's a related library called memory_profiler. It can be found at https://pypi.python.org/pypi/memory_profiler, and psutil can be found at https://github.com/giampaolo/psutil
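A minimal sketch of the psutil approach, assuming psutil is installed. The helper name current_rss_bytes is hypothetical, and the Process object is created once and reused so that each call only does the memory query:

```python
import os

try:
    import psutil  # third-party package, assumed to be installed
except ImportError:
    psutil = None

# Create the Process handle once; it is cheap to query repeatedly.
_proc = psutil.Process(os.getpid()) if psutil is not None else None


def current_rss_bytes():
    """Return the resident set size of this process in bytes.

    Returns None when psutil is not available.
    """
    if _proc is None:
        return None
    return _proc.memory_info().rss
```

Worker threads could call current_rss_bytes() before each expensive call and compare the result to their threshold.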

Yes, you should be able to manipulate memory in Python. In Python, memory management can be done with the methods in the memory interface. You can release and reallocate blocks of memory as you like. If you want more in-depth info on how Python deals with memory internally and how you can manipulate that, I recommend you check out "Releasing memory in Python".
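The delay-when-above-threshold idea from the question could be sketched roughly like this. The class MemoryGate and its parameters are hypothetical names, and get_usage stands for whatever function you use to read the current memory size:

```python
import time


class MemoryGate:
    """Block callers while the measured memory usage is above a limit."""

    def __init__(self, get_usage, limit_bytes, poll_interval=0.05):
        self._get_usage = get_usage      # callable returning current usage in bytes
        self._limit = limit_bytes
        self._poll = poll_interval       # seconds to sleep between checks

    def wait_until_below(self, timeout=None):
        """Return True once usage is at or below the limit, False on timeout."""
        deadline = None if timeout is None else time.monotonic() + timeout
        while self._get_usage() > self._limit:
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(self._poll)
        return True
```

Each thread would call gate.wait_until_below() right before the memory-intensive function. Note that this is a check-then-act scheme: several threads can pass the check at the same moment and together exceed the limit, and the measured size may not drop right after the library frees memory. Bounding the number of concurrent calls with a threading.Semaphore may be more predictable.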

Given that most programs use shared libraries, the question is: which memory size? Let's assume for the moment that you're talking about the resident set size.

There are at least two ways of finding this out:

  • ps -u
  • procstat -r <PID>

Examples of both:

> ps -u
USER     PID %CPU %MEM    VSZ  RSS TT  STAT STARTED    TIME COMMAND
rsmith   820  0.0  0.0  10088  520 v0  I    Fri11PM 0:00.01 -tcsh (tcsh)
rsmith   823  0.0  0.0   9732   32 v0  I+   Fri11PM 0:00.00 /bin/sh /usr/local/bin/startx
rsmith   836  0.0  0.0  26152 1480 v0  I+   Fri11PM 0:00.00 xinit /home/rsmith/.xinitrc -- /usr/local/bi
rsmith   840  0.0  0.1 135996 4904 v0  S    Fri11PM 0:23.16 i3
rsmith   878  0.0  0.0  10088 1980  0  Ss   Fri11PM 0:00.74 -tcsh (tcsh)
rsmith  5091  0.0  0.0   9388 1168  0  R+    1:04AM 0:00.00 ps -u
rsmith 74939  0.0  0.1  10088 2268  1  Is+  Mon02PM 0:00.44 -tcsh (tcsh)

and

> procstat -r 820
  PID COMM             RESOURCE                          VALUE        
  820 tcsh             user time                    00:00:00.000000   
  820 tcsh             system time                  00:00:00.011079   
  820 tcsh             maximum RSS                             1840 KB
  820 tcsh             integral shared memory                  2728 KB
  820 tcsh             integral unshared data                   800 KB
  820 tcsh             integral unshared stack                  256 KB
  820 tcsh             page reclaims                            402   
  820 tcsh             page faults                               25   
  820 tcsh             swaps                                      0   
  820 tcsh             block reads                               58   
  820 tcsh             block writes                               0   
  820 tcsh             messages sent                              0   
  820 tcsh             messages received                          0   
  820 tcsh             signals received                           1   
  820 tcsh             voluntary context switches               105   
  820 tcsh             involuntary context switches               0

In this case procstat gives you the most information.
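The fields printed by procstat -r are resource-usage counters, and for the current process Python's standard resource module exposes the same kind of data through getrusage without starting another program. This is a different route from the ones listed here, shown only as a sketch; keep in mind that "maximum RSS" is the peak value, not the current one:

```python
import resource


def peak_rss_kb():
    """Peak resident set size of this process, as reported by getrusage(2)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is in kilobytes on FreeBSD and Linux (macOS reports bytes).
    return usage.ru_maxrss
```

Because the value never decreases, it is useful for detecting that a limit was crossed at some point, but not for seeing usage fall again.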

The procstat program uses libprocstat to get the process information. So you basically have two choices:

  1. Use subprocess.check_output to call procstat, and parse its output.
  2. Use ctypes to get the info with libprocstat.

The first option is probably the easiest; the second may be more lightweight.
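The first option might look something like the sketch below. The function names and the regular expression are my own, written against the column layout in the example output above, so treat them as assumptions rather than a stable format:

```python
import re
import subprocess

# One data row: PID, COMM, RESOURCE (may contain single spaces), VALUE, optional unit.
_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(.+?)\s{2,}(\S+)(?:\s+(\S+))?\s*$")


def parse_procstat_r(text):
    """Map each resource name to a (value, unit) tuple; unit is None if absent."""
    usage = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:  # the header row does not start with a number and is skipped
            _pid, _comm, resource_name, value, unit = match.groups()
            usage[resource_name] = (value, unit)
    return usage


def max_rss_kb(pid):
    """Run `procstat -r <pid>` (FreeBSD) and return the 'maximum RSS' value in KB."""
    out = subprocess.check_output(["procstat", "-r", str(pid)])
    value, _unit = parse_procstat_r(out.decode())["maximum RSS"]
    return int(value)
```

Calling max_rss_kb(os.getpid()) would give the figure for the current process. Starting a subprocess on every check is comparatively expensive, so this fits occasional sampling better than a check before every call.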

With regard to controlling memory, there are two aspects:

  1. memory usage in Python
  2. memory usage in libraries/extensions.

You can try to minimize Python memory usage by actively del-ing objects that you don't need anymore. In ctypes you might have to allocate buffers (e.g. with ctypes.create_string_buffer) or arrays for functions to store their data in. These are Python objects and can be handled as such. You can call gc.collect() to force garbage collection.
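As a small illustration of handling such a buffer, here is a sketch in which ctypes.memset stands in for the C library call that would normally fill the buffer. The function name process_chunk is hypothetical:

```python
import ctypes
import gc


def process_chunk(size):
    """Allocate a ctypes buffer, use it, and release it as early as possible."""
    buf = ctypes.create_string_buffer(size)   # Python-owned buffer of `size` bytes
    ctypes.memset(buf, 0x41, size)            # placeholder for the real C call
    result = buf.raw[:4]                      # copy out only what is needed
    del buf                                   # drop the last reference to the buffer
    gc.collect()                              # only matters for reference cycles
    return result
```

In CPython the buffer is released as soon as its last reference goes away, so del is usually what matters; gc.collect() helps only when objects are kept alive by reference cycles.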

But sometimes libraries return pointers to data structures that they have allocated. Those must typically be freed by either free() from the C library or special functions that the library provides.
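A sketch of that pattern, using strdup and free from the C library as a stand-in for a library function that returns memory it allocated itself. The helper name copy_via_c is hypothetical; a real library may require its own release function instead of free():

```python
import ctypes
import ctypes.util

# Load the C library; on POSIX, CDLL(None) also exposes the libc symbols.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc.strdup.argtypes = [ctypes.c_char_p]
# Use c_void_p so the raw address is kept; c_char_p would convert the
# result to a bytes object and the pointer could no longer be freed.
libc.strdup.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def copy_via_c(data):
    """Let C allocate a copy of `data`, read it back, then free the C memory."""
    ptr = libc.strdup(data)
    if not ptr:
        raise MemoryError("strdup returned NULL")
    try:
        return ctypes.string_at(ptr)   # copies the NUL-terminated string into bytes
    finally:
        libc.free(ptr)                 # Python's garbage collector never frees this
```

The try/finally makes sure the C allocation is released even if reading the data raises an exception.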

