
Kernel dies after itertools.combinations command

I am using Python 3.5.2 |Anaconda 4.3.0 (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]

I have to run the following command

longList = list(combinations(range(2134), 3))

I know that the length of this is around 1.6 billion. When I run it, after some time I get the message "The kernel appears to have died. It will restart automatically."

The same command with 2 instead of 3 runs without any issues:

longList = list(combinations(range(2134), 2))

What can / should I do in this case?

You are likely running out of memory. A quick calculation: a 64-bit integer or pointer is 8 bytes. You have about 1.6 billion combinations, each a tuple of three integers, plus one list pointer per tuple. That means you will need at least 1.6E9 * (1 + 3) * 8B = 48GB of memory.

However, due to Python's memory model you will need many times more than that: every integer is actually an object, so we need one machine word for the pointer in the list, and probably 3 or 4 machine words for the object itself (I'm not sure about the details; read the CPython source for the actual object layout). The tuple object will also have overhead. I'll assume every object has two words of overhead. So we have to add an extra 1.6E9 * (3 + 1) * 2 * 8B = 95GB of overhead, bringing the total to around 143GB.
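As a rough sanity check on these per-object overheads, CPython's sys.getsizeof reports the size of a single object; the exact numbers vary by CPython version and platform, but they make the point that a "small" integer is far larger than 8 bytes:

```python
import sys

# Sizes below are CPython-specific and vary by version/platform.
n = 1000
triple = (n, n + 1, n + 2)

print(sys.getsizeof(n))       # a small int object is ~28 bytes, not 8
print(sys.getsizeof(triple))  # a 3-tuple: object header plus 3 pointers
# On top of this, the list itself stores one more 8-byte pointer per tuple.
```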

This can be avoided by using a dense numpy array because it uses real integers, not objects. This eliminates all the overhead from integer and tuple objects, so that we would “only” need 1.6E9 * 3 * 8B = 35GB.
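A small-scale sketch of that approach: feed the combinations iterator directly into numpy.fromiter, so the tuples are consumed one at a time and only the dense array is kept. (The n=10 size here is just for illustration; at n=2134 this still needs tens of gigabytes, only without the per-object overhead.)

```python
from itertools import chain, combinations

import numpy as np

n, k = 10, 3

# Flatten the stream of k-tuples into a flat stream of ints, then pack
# them into a dense array; int16 is enough since values stay below 2134.
flat = np.fromiter(chain.from_iterable(combinations(range(n), k)),
                   dtype=np.int16)
combos = flat.reshape(-1, k)

print(combos.shape)   # one row per combination, k columns
print(combos.nbytes)  # total payload: rows * k * 2 bytes
```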

I assume you are not running hardware with that much memory.

Your combinations(..., 2) call is not a problem because it only produces around 2.3 million tuples, which puts the memory requirements in the megabyte range (2.2E6 * (1 + 4 + 2*3) * 8B = 180MB). As a numpy array we would only need 2.2E6 * 2 * 8B = 33MB.
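The two counts can be verified exactly with math.comb (Python 3.8+), which also makes the back-of-the-envelope numpy estimate easy to reproduce:

```python
from math import comb

# Exact sizes of the two calls in the question.
n3 = comb(2134, 3)  # combinations(range(2134), 3): ~1.6 billion
n2 = comb(2134, 2)  # combinations(range(2134), 2): ~2.3 million
print(n3, n2)

# Dense numpy estimate for the k=3 case: 3 int64 values per row.
print(n3 * 3 * 8 / 1e9, "GB")
```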

So what's the solution here?

  • At scale, low-level details like memory models are very relevant even for Python
  • Using numpy can drastically reduce memory usage, typically by a factor of 4, and more if you use smaller types: e.g. dtype='int16' gives an additional factor-of-4 reduction over int64.
  • Think hard whether you need to eagerly transform the combinations() into a list, or whether you can consume the iterator lazily or in smaller chunks
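The last point is usually the real fix: if each combination is processed and then discarded, you never need the full list. A minimal sketch of chunked, lazy consumption with itertools.islice (the chunk size and the early stop are illustrative):

```python
from itertools import combinations, islice

def chunked(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

total = 0
for chunk in chunked(combinations(range(2134), 3), 1_000_000):
    # Replace this with real per-chunk work; at any moment only one
    # chunk of tuples is alive in memory, not all ~1.6 billion.
    total += len(chunk)
    if total >= 2_000_000:  # stop early for the demo
        break

print(total)
```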
