Python Pandas: memory issue when subsetting a DataFrame
I am working with some big pandas DataFrames. I noticed that the memory usage (as monitored in the Windows Task Manager) didn't decrease when I assigned a subset of a DataFrame back to itself. For example, with a big DataFrame df that takes roughly 10 GB of memory, after operations like:
df = df[df['v1']==1]
or even
df = df.loc[0:10]
the memory usage shown in Task Manager doesn't change at all.
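One way to separate the two questions — how big the DataFrame itself is versus how much the process is holding on to — is to measure the frame's own footprint with pandas, independently of Task Manager. A minimal sketch (the column names and sizes are made up for illustration):

```python
import numpy as np
import pandas as pd

# Build a DataFrame large enough to see the effect (sizes are arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({'v1': rng.integers(0, 2, 1_000_000),
                   'v2': rng.random(1_000_000)})

before = df.memory_usage(deep=True).sum()
df = df[df['v1'] == 1]          # subset and rebind the name
after = df.memory_usage(deep=True).sum()

# The DataFrame itself shrinks even if the process RSS does not:
# Python may keep freed memory in its allocator pools rather than
# returning it to the OS, which is what Task Manager reports.
print(before, after)
```

If `after` is much smaller than `before` but Task Manager stays flat, the memory has been freed at the Python level and is simply not (yet) handed back to the operating system.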
I have searched for a while and read some posts here and there, but couldn't find an understandable explanation or solution. Any help is appreciated!
Is there a way to reduce the memory usage? I read some posts suggesting reading less data in the first place, but that solution seems quite difficult in my case.
One solution that worked for me is deleting the unwanted rows (or columns) one by one, in place:
for x in range(0, 10):
    df.drop(x, inplace=True, axis=0)  # drop the row with index label x, in place
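An alternative that often releases memory sooner is to materialise the subset as an explicit copy, drop the reference to the original frame, and trigger a garbage-collection pass. A sketch, assuming the subset briefly fits in memory alongside the original:

```python
import gc

import numpy as np
import pandas as pd

# Stand-in for the original big DataFrame (names and sizes are illustrative).
rng = np.random.default_rng(0)
df = pd.DataFrame({'v1': rng.integers(0, 2, 100_000),
                   'v2': rng.random(100_000)})

subset = df[df['v1'] == 1].copy()  # independent copy, detached from df's buffers
del df                             # drop the only reference to the big frame
gc.collect()                       # ask Python to reclaim it now
df = subset
```

The `.copy()` matters: without it the subset can keep a reference to the original frame's internal buffers alive, so `del df` alone would not free anything.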