
Python Pandas: memory issue when subsetting a DataFrame

I am working with some big pandas DataFrames. I realised that the memory usage (as monitored in the Windows Task Manager) didn't decrease when assigning a subset of a DataFrame back to itself. For example, if there is a big DataFrame df which takes roughly 10 GB of memory, after doing operations like the ones below:

df = df[df['v1']==1]

or even

df = df.loc[0:10]

The memory usage shown in Task Manager doesn't change at all.
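For what it's worth, the DataFrame's own footprint can be measured independently of Task Manager. Here is a minimal sketch with made-up data (the v1 column mirrors the snippet above):

import gc

import numpy as np
import pandas as pd

# build a throwaway frame with a v1 flag column like the one above
df = pd.DataFrame({'v1': np.random.randint(0, 2, 10_000_000),
                   'v2': np.random.rand(10_000_000)})
print(df.memory_usage(deep=True).sum())  # ~160 MB of actual data

df = df[df['v1'] == 1]  # the old 10M-row object is now unreferenced
gc.collect()            # let the interpreter reclaim it internally
print(df.memory_usage(deep=True).sum())  # roughly half the original

The size reported by the DataFrame itself drops after the subsetting, even though the process working set in Task Manager may stay high: the interpreter's allocator typically keeps freed memory around for reuse rather than returning it to the OS.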

I have searched for a while and read some posts here and there, but couldn't find an understandable reason or solution. Any help is appreciated!

Is there a way to reduce the memory usage? I have read some posts suggesting reading less data in the beginning, but that seems quite difficult in my case.
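For completeness, "reading less data in the beginning" usually looks like the sketch below; the file name data.csv, the column list, and the dtype choices are all hypothetical placeholders:

import pandas as pd

# stream the file in chunks, keep only the needed columns, shrink
# dtypes where the values allow it, and filter each chunk on the fly
chunks = pd.read_csv('data.csv',            # hypothetical file name
                     usecols=['v1', 'v2'],  # load only needed columns
                     dtype={'v1': 'int8'},  # smaller dtype for 0/1 flags
                     chunksize=1_000_000)   # 1M rows at a time
df = pd.concat(chunk[chunk['v1'] == 1] for chunk in chunks)

This way the full 10 GB never has to sit in memory at once.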

One solution that worked for me is deleting each column/row one by one, in place:

# drop the unwanted row labels one at a time, in place
for x in range(10):
    df.drop(x, inplace=True, axis=0)
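The same deletion should also work as a single call (a sketch assuming the default RangeIndex), which avoids rebuilding the frame on every loop iteration:

# drop all ten row labels in one in-place call
df.drop(index=range(10), inplace=True)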
