
Memory leak in pandas when dropping dataframe column?

I have some code like the following:

df = ..... # load a very large dataframe
good_columns = set(['a', 'b', ........])  # set of "good" columns we want to keep
columns = list(df.columns.values)
for col in columns:
    if col not in good_columns:
        df = df.drop(col, axis=1)

The odd thing is that it successfully drops the first column that is not good, so it isn't an issue of holding the old and new dataframes in memory at the same time and running out of space. It breaks with a MemoryError on the second column being dropped. This makes me suspect there is some kind of memory leak. How would I prevent this error from happening?

It may be that you're constantly returning a new and very large dataframe. Try setting drop's inplace parameter to True.
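A minimal sketch of that suggestion, using a small made-up frame in place of the large one (the column names here are illustrative): collecting the unwanted columns first and dropping them all in one in-place call avoids rebuilding the dataframe once per column.

```python
import pandas as pd

# Small stand-in for the very large dataframe
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'bad': [5, 6]})
good_columns = {'a', 'b'}

# Drop every unwanted column in a single call, modifying df in place
# rather than binding a fresh copy on each loop iteration.
to_drop = [col for col in df.columns if col not in good_columns]
df.drop(columns=to_drop, inplace=True)

print(list(df.columns))  # ['a', 'b']
```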

Make use of the usecols argument while reading the large dataframe to keep only the columns you want instead of dropping them later on. Check here: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
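A sketch of the usecols approach, assuming the data comes from a CSV (an in-memory string stands in for the large file here): with usecols, the unwanted columns are never parsed into memory in the first place.

```python
import io
import pandas as pd

# Stand-in for the large CSV file on disk
csv_data = io.StringIO("a,b,bad\n1,3,5\n2,4,6\n")
good_columns = ['a', 'b']

# Only the listed columns are read; 'bad' is skipped during parsing
df = pd.read_csv(csv_data, usecols=good_columns)

print(list(df.columns))  # ['a', 'b']
```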

I tried the inplace=True argument but still had the same issues. Here's another solution, dealing with the memory leak due to your architecture, that helped me when I had this same issue.
