
Kernel dies after itertools.combinations command

I am using Python 3.5.2 |Anaconda 4.3.0 (x86_64)| (default, Jul 2 2016, 17:52:12) [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]

I have to run the following command

longList = list(combinations(range(2134), 3))

I know that the length of this is around 1.6 billion. When I run it, after some time I get the message "The kernel appears to have died. It will restart automatically."

The same command with 2 instead of 3 runs without any issues:

longList = list(combinations(range(2134), 2))

What can/should I do in this case?

You are likely running out of memory. Quick calculation: a 64-bit int or pointer is 8 bytes. You have 1.6 billion combinations, each a tuple of three integers. This means you will need at least 1.6E9 * (1 + 3) * 8B = 48GB of memory.
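For reference, the exact count and the lower-bound arithmetic can be checked directly. A minimal sketch (math.comb needs Python 3.8+; on the asker's Python 3.5, scipy.special.comb or the factorial formula would do the same job):

import math

n_combs = math.comb(2134, 3)  # exact number of 3-element combinations
print(n_combs)  # 1617414084, i.e. around 1.6 billion

# lower bound at 8 bytes per machine word: 1 list pointer + 3 ints per tuple
print(n_combs * (1 + 3) * 8 / 2**30)  # ~48.2 GiB, matching the 48GB above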

However, due to Python's memory model you will need many times more than that: every integer is actually an object, so we need 1 machine word for the pointer in the list, and probably 3 or 4 machine words for the object itself (I'm not sure about the details; read the CPython source for the actual object layout). The tuple object will also have overhead. I'll assume every object has two words of overhead. So we have to add an extra 1.6E9 * (3 + 1) * 2 * 8B = 95GB of overhead, for around 143GB in total.
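This overhead can be inspected empirically with sys.getsizeof; the numbers below are typical for a 64-bit CPython 3.x build and may vary slightly between versions:

import sys

print(sys.getsizeof(1))  # 28 bytes for a small int object
print(sys.getsizeof((1, 2, 3)))  # 64 bytes for a 3-tuple (header + 3 pointers)
# the enclosing list adds another 8-byte pointer per tuple on top of that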

This can be avoided by using a dense numpy array, because it uses real integers rather than objects. This eliminates all the overhead from integer and tuple objects, so that we would “only” need 1.6E9 * 3 * 8B = 35GB.
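A minimal sketch of the numpy approach, using a smaller n so it comfortably fits in memory. np.fromiter with chain.from_iterable streams the values in one at a time instead of materializing all the tuples at once, and since 2134 < 2**15 an int16 dtype is safe:

import numpy as np
from itertools import chain, combinations

n, r = 200, 3  # smaller n for illustration; the question uses n = 2134

flat = np.fromiter(chain.from_iterable(combinations(range(n), r)), dtype=np.int16)
combs = flat.reshape(-1, r)  # one combination per row

print(combs.shape)  # (1313400, 3)
print(combs.nbytes)  # 7880400 bytes, about 7.5MB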

I assume you are not running hardware with that much memory.

Your combinations(..., 2) call is not a problem because that only produces around 2.3 million tuples, which has memory requirements in the megabyte range (2.2E6 * (1 + 4 + 2*3) * 8B = 180MB). As a numpy array we would only need 2.2E6 * 2 * 8B = 33MB.

So what's the solution here?

  • At scale, low-level details like memory models are very relevant, even for Python
  • Using numpy can drastically reduce memory usage, typically by a factor of 4. Smaller types help even more: e.g., dtype='int16' would give an additional factor-of-4 reduction.
  • Think hard about whether you need to eagerly turn the combinations() into a list, or whether you can consume the iterator lazily or in smaller chunks (see the sketch after this list)
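A minimal sketch of the lazy, chunked approach; process() is a hypothetical stand-in for whatever you actually do with each combination:

from itertools import combinations, islice

def process(batch):
    pass  # placeholder: replace with the real per-chunk work

combs = combinations(range(2134), 3)  # just an iterator, O(1) memory

while True:
    batch = list(islice(combs, 1000000))  # at most a million tuples in memory
    if not batch:
        break
    process(batch)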
