Python Pandas: memory issue when subsetting a DataFrame
I am working with some big pandas DataFrames. I noticed that the memory usage (as monitored in the Windows Task Manager) didn't decrease when I assigned a subset of a DataFrame back to itself. For example, with a big DataFrame df that takes roughly 10 GB of memory, after operations like:
df = df[df['v1']==1]
or even
df = df.loc[0:10]
the memory usage shown in Task Manager doesn't change at all.
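One way to separate the two questions — how big the DataFrame itself is versus how much the process is holding on to — is to measure the frame's own footprint with pandas, independently of Task Manager. A minimal sketch (the column names and sizes are made up for illustration):

```python
import numpy as np
import pandas as pd

# Build a DataFrame large enough to see the effect (sizes are arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({'v1': rng.integers(0, 2, 1_000_000),
                   'v2': rng.random(1_000_000)})

before = df.memory_usage(deep=True).sum()
df = df[df['v1'] == 1]          # subset and rebind the name
after = df.memory_usage(deep=True).sum()

# The DataFrame itself shrinks even if the process RSS does not:
# Python may keep freed memory in its allocator pools rather than
# returning it to the OS, which is what Task Manager reports.
print(before, after)
```

If `after` is much smaller than `before` but Task Manager stays flat, the memory has been freed at the Python level and is simply not (yet) handed back to the operating system.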
I have searched for a while and read some posts here and there, but couldn't find an understandable explanation or solution. Any help is appreciated!
Is there a way to reduce the memory usage? I read some posts suggesting reading less data in the first place, but that solution seems quite difficult in my case.
One solution that worked for me is deleting the unwanted rows (or columns) one by one, in place:
for x in range(0, 10):
    df.drop(x, inplace=True, axis=0)  # drop the row with index label x, in place
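An alternative that often releases memory sooner is to materialise the subset as an explicit copy, drop the reference to the original frame, and trigger a garbage-collection pass. A sketch, assuming the subset briefly fits in memory alongside the original:

```python
import gc

import numpy as np
import pandas as pd

# Stand-in for the original big DataFrame (names and sizes are illustrative).
rng = np.random.default_rng(0)
df = pd.DataFrame({'v1': rng.integers(0, 2, 100_000),
                   'v2': rng.random(100_000)})

subset = df[df['v1'] == 1].copy()  # independent copy, detached from df's buffers
del df                             # drop the only reference to the big frame
gc.collect()                       # ask Python to reclaim it now
df = subset
```

The `.copy()` matters: without it the subset can keep a reference to the original frame's internal buffers alive, so `del df` alone would not free anything.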