Pandas manipulating a DataFrame inplace vs not inplace (inplace=True vs False)

Question

I'm wondering whether there's a significant reduction in memory usage when we choose to manipulate a dataframe in-place (compared to not in-place).

I've done a bit of searching on Stack Overflow and came across this post where the answer states that if an operation is not done in-place, a copy of the dataframe is returned (I guess that's a bit obvious when there's an optional parameter called 'inplace' :P).

If I don't need to keep the original dataframe around, it would be beneficial (and logical) to just modify the dataframe in place, right?

Context:

I'm trying to get the top element when sorted by a particular 'column' in the dataframe. I was wondering which of these two is more efficient:

in-place:

df.sort('some_column', ascending=False, inplace=True)
top = df.iloc[0]

vs

copy:

top = df.sort('some_column', ascending=False).iloc[0]

For the 'copy' case, it still allocates memory to make the copy when sorting, even though I'm not assigning the copy to a variable, right? If so, how long does it take to deallocate that copy from memory?

Thanks for any insights in advance!

Answer 1

In general, there is no difference in memory usage between inplace=True and returning an explicit copy: in both cases, a copy is created. It just so happens that, in the first case, the data in the copy is copied back into the original df object, so reassignment is not necessary.
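As a rough illustration of that equivalence, here is a minimal sketch. It assumes a recent pandas release, so it uses sort_values rather than df.sort, and the sample data is made up for the example. It only shows that both forms end up with the same sorted data; it does not measure memory.

```python
import pandas as pd

# Hypothetical sample data; 'some_column' mirrors the name used in the question.
df = pd.DataFrame({"some_column": [3, 1, 2], "other": ["a", "b", "c"]})

# Not in place: a new, sorted DataFrame is returned and df is left untouched.
sorted_copy = df.sort_values("some_column", ascending=False)

# In place: the call returns None and df itself now holds the sorted data.
result = df.sort_values("some_column", ascending=False, inplace=True)

print(result is None)
print(df.equals(sorted_copy))
```

Because the in-place call returns None, it cannot be chained, which is one practical reason to prefer the reassignment form.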

Furthermore, note that as of v0.21, df.sort is no longer available (it was deprecated in favor of sort_values in an earlier release); use sort_values instead.
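For reference, the two snippets from the question could be written with sort_values roughly as follows. This is a sketch under the assumption of a recent pandas version; the DataFrame contents and the 'label' column are invented for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the asker's dataframe.
df = pd.DataFrame({"some_column": [10, 30, 20], "label": ["x", "y", "z"]})

# 'copy' variant: sort into a temporary DataFrame and take its first row.
top_copy = df.sort_values("some_column", ascending=False).iloc[0]

# 'in-place' variant: sort df itself, then take its first row.
df.sort_values("some_column", ascending=False, inplace=True)
top_inplace = df.iloc[0]

print(top_copy)
print(top_inplace)
```

Both variants select the same row; the only visible difference is whether df itself ends up reordered.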

solution1
1 ACCEPTED 2017-11-12 20:40:49