
Java process getting killed likely due to Linux OOM killer

My Java process is getting killed after some time. The heap settings are min 2 GB and max 3 GB with parallel GC. The pmap command shows more than 40 anonymous 64 MB blocks, which seems to be triggering the Linux OOM killer.

Error:

There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (mmap) failed to map 71827456 bytes for committing reserved memory.

Possible reasons:

  • The system is out of physical RAM or swap space
  • In 32 bit mode, the process size limit was hit

Possible solutions:

  • Reduce memory load on the system
  • Increase physical memory or swap space
  • Check if swap backing store is full
  • Use 64 bit Java on a 64 bit OS
  • Decrease Java heap size (-Xmx/-Xms)
  • Decrease number of Java threads
  • Decrease Java thread stack sizes (-Xss)
  • Set larger code cache with -XX:ReservedCodeCacheSize=

This output file may be truncated or incomplete.

Out of Memory Error (os_linux.cpp:2673), pid=21171, tid=140547280430848

JRE version: Java(TM) SE Runtime Environment (8.0_51-b16) (build 1.8.0_51-b16)

Java VM: Java HotSpot(TM) 64-Bit Server VM (25.51-b03 mixed mode linux-amd64 compressed oops)

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

After reducing the heap to min 512 MB and max 2 GB along with G1GC, we see a limited number of 64 MB blocks (around 18) and the process does not get killed.

But with heap min 2 GB and max 3 GB, and G1GC, we see a high number of 64 MB blocks.

As per the documentation, the max number of 64 MB blocks (malloc arenas) for a 64-bit system with 2 cores should be 2 * 8 = 16, but we see more than 16.
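
For reference, this is how I arrived at the 16 figure (just a quick sketch, assuming glibc's documented default of 8 arenas per core on a 64-bit system):

    public class ArenaCapCheck {
        public static void main(String[] args) {
            // glibc's documented 64-bit default: up to 8 malloc arenas per core
            int cores = Runtime.getRuntime().availableProcessors();
            System.out.println("cores = " + cores
                    + ", expected malloc arena cap = " + (8 * cores));
        }
    }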

That doesn't look like the Linux OOM killer.

The symptoms you describe indicate that you have run out of physical memory and swap space. In fact, the error message says exactly that:

There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (mmap) failed to map 71827456 bytes for committing reserved memory. Possible reasons:

  • The system is out of physical RAM or swap space

  • In 32 bit mode, the process size limit was hit

A virtual memory system works by mapping the virtual address space to a combination of physical RAM pages and disk pages. At any given time, a page may live in RAM or on disk. If an application asks for more virtual memory (e.g. using an mmap call), the OS may have to say "can't". That is what has happened here.

The solutions are as the message says:

  • get more RAM,
  • increase the size of swap space, or
  • limit the amount of memory that the application asks for ... in various ways.

The G1GC parameters (apart from the max heap size) are largely irrelevant. My understanding is that the max heap size is the total amount of (virtual) memory that the Java heap is allowed to occupy.
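
If you want to confirm what heap limit the JVM has actually taken from your flags, a minimal sketch like the following prints it at runtime (Runtime.maxMemory() reports approximately the -Xmx value, and totalMemory() the currently committed heap):

    public class HeapCap {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // maxMemory() is roughly -Xmx: the most the Java heap may grow to
            System.out.println("max heap  = " + rt.maxMemory() / (1024 * 1024) + " MB");
            // totalMemory() is the heap currently committed (at least -Xms)
            System.out.println("committed = " + rt.totalMemory() / (1024 * 1024) + " MB");
        }
    }

Note that this is only the Java heap; thread stacks, the code cache, metaspace and other native allocations are all on top of that number.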


So if this is not the Linux OOM killer, what is the OOM killer?

In fact the OOM killer is a mechanism that identifies applications that are causing dangerous performance problems by doing too much paging. As I mentioned at the start, virtual memory consists of pages that either live in RAM or on disk. In general, the application doesn't know whether any VM page is RAM resident or not. The operating system just takes care of it.

If the application tries to use (read from or write to) a page that is not RAM resident, a "page fault" occurs. The OS handles this by:

  • suspending the application thread
  • finding a spare RAM page
  • reading the disk page into the RAM page
  • resuming the application thread ... which can then access the memory at the address.

In addition, the operating system needs to maintain a pool of "clean" pages; i.e. pages where the RAM and disk versions are the same. This is done by scanning for pages that have been modified by the application and writing them to disk.

If an application is behaving "nicely", then the amount of paging activity is relatively modest, and threads don't get suspended often. But if there is a lot of paging, you can get to the point where the paging I/O is a bottleneck. In the worst case, the whole system will lock up.

The OOM killer's purpose is to identify processes that are causing dangerously high paging rates, and ... kill them.

If a JVM process is killed by the OOM killer, it doesn't get a chance to print an error message (like the one you got). The process gets a "SIGKILL": instant death.
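
To illustrate the difference (a minimal sketch, not from your logs): a shutdown hook runs on a normal exit or a SIGTERM, but a SIGKILL from the OOM killer bypasses it entirely, so the JVM gets no chance to write anything:

    public class ShutdownHookDemo {
        public static void main(String[] args) throws InterruptedException {
            // Runs on normal exit or SIGTERM (kill <pid>), never on SIGKILL (kill -9 <pid>)
            Runtime.getRuntime().addShutdownHook(
                    new Thread(() -> System.out.println("shutting down cleanly")));
            Thread.sleep(60_000);  // kill the process while it sleeps to compare the two signals
        }
    }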

But ... if you look in the system logfiles, you should see a message that says that such and such process has been killed by the OOM killer.

There are lots of resources that explain the OOM killer in more detail.

This answer tries to deal with your observations about the memory blocks, MALLOC_ARENA_MAX and so on. I'm not an expert on native memory allocators; this is based on the Malloc Internals page in the Glibc Wiki.

You have read PrestoDB issue 8993 as implying that glibc malloc will allocate at most MALLOC_ARENA_MAX x (number of threads) blocks of memory for the native heap. According to "Malloc Internals", this is not necessarily true.

  1. If the application requests a large enough block, the implementation will call mmap directly rather than using an arena. (The threshold is given by the M_MMAP_THRESHOLD option.)

  2. If an existing arena fills up and compaction fails, the implementation will attempt to grow the arena by calling sbrk or mmap.

These factors mean that MALLOC_ARENA_MAX does not limit the number of mmap'd blocks.


Note that the purpose of arenas is to reduce contention when there are lots of threads calling malloc and free. But it comes with the risk that more memory will be lost due to fragmentation. The goal of MALLOC_ARENA_MAX tuning is to reduce memory fragmentation.
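
If you still want to experiment with that tuning, keep in mind that MALLOC_ARENA_MAX is an environment variable that glibc reads when the process starts, so it must be set before the JVM is launched. A minimal sketch (the "MyApp" command line here is just a placeholder):

    import java.io.IOException;

    public class LaunchWithArenaCap {
        public static void main(String[] args) throws IOException, InterruptedException {
            // glibc only reads MALLOC_ARENA_MAX from the environment at process start,
            // so set it on the child process before launching the JVM.
            ProcessBuilder pb = new ProcessBuilder("java", "-Xmx2g", "MyApp");  // placeholder command
            pb.environment().put("MALLOC_ARENA_MAX", "2");
            pb.inheritIO();
            System.out.println("child exited with " + pb.start().waitFor());
        }
    }

In practice most people just export the variable in the startup script rather than launching from another JVM; the point is simply that it has to be in the environment of the process that calls malloc.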

So far, you haven't shown us any clear evidence that your memory problems are due to fragmentation. Other possible explanations are:

  • your application has a native memory leak, or
  • your application is simply using a lot of native memory.

Either way, it looks like MALLOC_ARENA_MAX tuning has not helped.
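
As an illustration of the second possibility (a hypothetical sketch, not taken from your application): memory allocated through direct ByteBuffers lives outside the Java heap, so it never shows up against -Xmx, but it does show up in pmap and in the footprint the OS has to back with RAM or swap:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    public class NativeMemoryHog {
        public static void main(String[] args) throws InterruptedException {
            List<ByteBuffer> buffers = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                // Each direct buffer is carved out of native memory, not the Java heap,
                // but the OS still has to supply the pages.
                buffers.add(ByteBuffer.allocateDirect(64 * 1024 * 1024));  // 64 MB each
            }
            System.out.println("holding " + buffers.size() * 64 + " MB of native memory");
            Thread.sleep(60_000);  // leave time to inspect the process with pmap
        }
    }

Direct buffers are only one example; JNI libraries, the code cache, metaspace and thread stacks all consume native memory in the same way, invisible to the -Xmx limit.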
