
Memory leak in pandas when dropping dataframe column?

I have some code like the following:

df = ..... # load a very large dataframe
good_columns = set(['a', 'b', ........])  # set of "good" columns we want to keep
columns = list(df.columns.values)
for col in columns:
    if col not in good_columns:
        df = df.drop(col, axis=1)

The odd thing is that it successfully drops the first column that is not good, so it isn't an issue of holding the old and new dataframes in memory at the same time and running out of space. It breaks with a MemoryError on the second column being dropped. This makes me suspect there is some kind of memory leak. How would I prevent this error from happening?

It may be that you're constantly returning a new and very large dataframe. Try setting drop's inplace parameter to True.
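A minimal sketch of that suggestion, using a small made-up frame in place of the large one (the column names here are illustrative): collecting the unwanted columns first and dropping them all in one in-place call avoids rebuilding the dataframe once per column.

```python
import pandas as pd

# Small stand-in for the very large dataframe
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'bad': [5, 6]})
good_columns = {'a', 'b'}

# Drop every unwanted column in a single call, modifying df in place
# rather than binding a fresh copy on each loop iteration.
to_drop = [col for col in df.columns if col not in good_columns]
df.drop(columns=to_drop, inplace=True)

print(list(df.columns))  # ['a', 'b']
```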

Make use of the usecols argument while reading the large dataframe to keep only the columns you want instead of dropping them later on. Check here: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
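A sketch of the usecols approach, assuming the data comes from a CSV (an in-memory string stands in for the large file here): with usecols, the unwanted columns are never parsed into memory in the first place.

```python
import io
import pandas as pd

# Stand-in for the large CSV file on disk
csv_data = io.StringIO("a,b,bad\n1,3,5\n2,4,6\n")
good_columns = ['a', 'b']

# Only the listed columns are read; 'bad' is skipped during parsing
df = pd.read_csv(csv_data, usecols=good_columns)

print(list(df.columns))  # ['a', 'b']
```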

I tried the inplace=True argument but still had the same issues. Here's another solution, dealing with the memory leak due to your architecture, that helped me when I had this same issue.
