Pandas manipulating a DataFrame inplace vs not inplace (inplace=True vs False)

I'm wondering if there's a significant reduction in memory usage when we choose to manipulate a dataframe in place (compared to not in place).

I've done a bit of searching on Stack Overflow and came across this post where the answer states that if an operation is not done in-place, a copy of the dataframe is returned (I guess that's a bit obvious when there's an optional parameter called 'inplace' :P).

If I don't need to keep the original dataframe around, it would be beneficial (and logical) to just modify the dataframe in place, right?

Context:

I'm trying to get the top element when sorted by a particular 'column' in the dataframe. I was wondering which of these two is more efficient:

in-place:

df.sort('some_column', ascending=False, inplace=True)
top = df.iloc[0]

vs

copy:

top = df.sort('some_column', ascending=False).iloc[0]

For the 'copy' case, it still allocates memory for the sorted copy even though I'm not assigning that copy to a variable, right? If so, how long does it take to deallocate that copy from memory?

Thanks for any insights in advance!

In general, there is no difference in memory usage between inplace=True and returning an explicit copy: in both cases, a copy is created. With inplace=True, the data in that copy is then assigned back into the original df object, so reassignment is not necessary.
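To illustrate the point, here is a minimal sketch using sort_values on a small made-up frame (the data and the `other` column are assumptions for the example, not from the question). It only shows that both styles end up with the same sorted data and that the in-place call returns None; it does not measure memory.

```python
import pandas as pd

# Hypothetical example data; only 'some_column' comes from the question.
df = pd.DataFrame({"some_column": [3, 1, 2], "other": ["a", "b", "c"]})

# Not in place: a new, sorted DataFrame is returned and df is left untouched.
sorted_copy = df.sort_values("some_column", ascending=False)

# In place: the call returns None and the object itself is updated.
df_inplace = df.copy()
result = df_inplace.sort_values("some_column", ascending=False, inplace=True)

print(result is None)                  # the in-place call returns nothing
print(sorted_copy.equals(df_inplace))  # both approaches hold the same data
```

Either way the sorted data has to be built somewhere, so choosing inplace=True mostly saves the reassignment rather than the allocation.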

Furthermore, note that as of v0.21, df.sort is deprecated; use sort_values instead.
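For the use case in the question, the 'copy' variant written with sort_values might look like the sketch below. The frame contents and the `name` column are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; only 'some_column' comes from the question.
df = pd.DataFrame({"some_column": [10, 30, 20], "name": ["x", "y", "z"]})

# Sort descending and take the first row; df itself is not modified.
top = df.sort_values("some_column", ascending=False).iloc[0]

print(top)
```

The sorted frame here is a temporary that is never bound to a name. In CPython it is normally released as soon as nothing references it any more, so it should not linger after the statement finishes.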
