
Python Pandas: memory issue when subsetting a DataFrame

I am working with some big pandas DataFrames. I realised that the memory usage (as monitored in the Windows Task Manager) doesn't decrease when I assign a subset of a DataFrame back to itself. For example, if there is a big DataFrame df which takes roughly 10 GB of memory, after operations like:

df = df[df['v1']==1]

or even

df = df.loc[0:10]

the memory usage line in Task Manager doesn't change at all.

I have searched for a while and read some posts here and there, but couldn't find an understandable reason or solution. Any help is appreciated!

Is there a way to reduce the memory usage? I read some posts suggesting reading less data in the first place, but that solution seems to be quite difficult in my case.
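One commonly suggested workaround (a sketch, assuming nothing else in your program still references the original frame): take an explicit copy of the subset so the subset no longer shares memory with the original, then trigger a garbage-collection pass so the freed blocks are reclaimed sooner. The column name `v1` and the sizes below are illustrative.

```python
import gc

import numpy as np
import pandas as pd

# Illustrative frame; a real one would be much larger.
df = pd.DataFrame({
    'v1': np.random.randint(0, 2, 1_000_000),
    'v2': np.random.rand(1_000_000),
})

# An explicit .copy() detaches the subset from the original
# DataFrame's data blocks, so the original memory can be released.
df = df[df['v1'] == 1].copy()

# Nudge CPython to collect the now-unreferenced original frame.
gc.collect()
```

Note that even after Python frees the memory internally, the process-level number in Task Manager may shrink only partially, because the allocator does not always return freed pages to the OS immediately.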

One solution that worked for me is deleting each column/row one by one, in place:

# drop the rows labelled 0..9 one at a time, in place
for x in range(0, 10):
    df.drop(x, inplace=True, axis=0)
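As a sanity check, pandas' own accounting (rather than the Task Manager) shows whether a subsetting step actually shrank the frame. A minimal sketch with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Small frame for illustration; sizes are arbitrary.
df = pd.DataFrame({'v1': np.arange(100), 'v2': np.random.rand(100)})

before = df.memory_usage(deep=True).sum()   # bytes used by the full frame
df = df.loc[0:10].copy()                    # .copy() detaches the slice
after = df.memory_usage(deep=True).sum()    # bytes used by the subset

print(before, after)  # the subset's footprint is far smaller
```

Note that `df.loc[0:10]` is label-based and inclusive of both endpoints, so with a default RangeIndex it keeps 11 rows, not 10.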
