
Python Pandas: memory issue when subsetting a DataFrame

I am working with some big pandas DataFrames. I realised that the memory usage (as monitored in the Windows Task Manager) didn't decrease when assigning a subset of a DataFrame back to itself. For example, if there is a big DataFrame df which takes roughly 10 GB of memory, after doing operations like the ones below:

df = df[df['v1']==1]

or even

df = df.loc[0:10]

The memory usage shown in Task Manager doesn't change at all.
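For what it's worth, the DataFrame's own footprint can be measured independently of Task Manager. Here is a minimal sketch with made-up data (the v1 column mirrors the snippet above):

import gc

import numpy as np
import pandas as pd

# build a throwaway frame with a v1 flag column like the one above
df = pd.DataFrame({'v1': np.random.randint(0, 2, 10_000_000),
                   'v2': np.random.rand(10_000_000)})
print(df.memory_usage(deep=True).sum())  # ~160 MB of actual data

df = df[df['v1'] == 1]  # the old 10M-row object is now unreferenced
gc.collect()            # let the interpreter reclaim it internally
print(df.memory_usage(deep=True).sum())  # roughly half the original

The size reported by the DataFrame itself drops after the subsetting, even though the process working set in Task Manager may stay high: the interpreter's allocator typically keeps freed memory around for reuse rather than returning it to the OS.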

I have searched for a while and read some posts here and there, but couldn't find an understandable reason or solution. Any help is appreciated!

Is there a way to reduce the memory usage? I have read some posts suggesting reading less data in the beginning, but that seems quite difficult in my case.
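For completeness, "reading less data in the beginning" usually looks like the sketch below; the file name data.csv, the column list, and the dtype choices are all hypothetical placeholders:

import pandas as pd

# stream the file in chunks, keep only the needed columns, shrink
# dtypes where the values allow it, and filter each chunk on the fly
chunks = pd.read_csv('data.csv',            # hypothetical file name
                     usecols=['v1', 'v2'],  # load only needed columns
                     dtype={'v1': 'int8'},  # smaller dtype for 0/1 flags
                     chunksize=1_000_000)   # 1M rows at a time
df = pd.concat(chunk[chunk['v1'] == 1] for chunk in chunks)

This way the full 10 GB never has to sit in memory at once.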

One solution that worked for me is deleting each column/row one by one, in place:

# drop the unwanted row labels one at a time, in place
for x in range(10):
    df.drop(x, inplace=True, axis=0)
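The same deletion should also work as a single call (a sketch assuming the default RangeIndex), which avoids rebuilding the frame on every loop iteration:

# drop all ten row labels in one in-place call
df.drop(index=range(10), inplace=True)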
