
How to rename a pandas dataframe in a memory-efficient way (without creating a copy)?

I want to rename a pandas dataframe df_old to df_new.

Since df.rename only seems to be designed for single series/columns within a given dataframe, I use the following approach at the moment:

df_new = df_old
del df_old

However, this is not memory efficient at all, since it creates a copy of df_old.

How can I rename a pandas dataframe in a more memory-efficient way, similar to inplace=True?

The short answer to the question "How to rename a pandas dataframe in a more memory-efficient way, similar to inplace=True?" is:

newName = oldName is already a memory-efficient way of renaming. It binds a second name to the same object and copies nothing.

First, a summary of what follows:

There is no significant change in memory requirement due to df_new = df_old.

A nice resource explains it all, stating:

Python's memory management is so central to its behavior, not only do you not have to delete values, but there is no way to delete values. You may have seen the del statement:

nums = [1, 2, 3]
del nums

This does not delete the value nums, it deletes the name nums. The name is removed from its scope, and then the usual reference counting kicks in: if nums' value had only that one reference, then the value will be reclaimed. But if it had other references, then it will not.
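The reference-counting behavior described in the quote can be observed with a small sketch. It assumes CPython, where an object is reclaimed as soon as its last reference disappears; Payload and demo_del are hypothetical names introduced only for this illustration, and a weak reference is used to watch the object without keeping it alive.

```python
import weakref


class Payload:
    """Stand-in for any large object (hypothetical helper for this sketch)."""

    def __init__(self, data):
        self.data = data


def demo_del():
    old = Payload([1, 2, 3])
    watcher = weakref.ref(old)  # observes the object without owning it
    new = old                   # second name bound to the same object
    del old                     # removes the name "old" only
    alive_after_first_del = watcher() is not None
    del new                     # the last reference is gone now
    alive_after_second_del = watcher() is not None
    return alive_after_first_del, alive_after_second_del


print(demo_del())
```

On CPython the object should survive the first del, because new still refers to it, and be reclaimed after the second one.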

All of the detail below just provides further proof of what was stated above.

Consider this code:

from memory_profiler import profile

@profile(precision=4)
def my_func(): 
    import pandas

    df_old = pandas.DataFrame([1,2,3,4,5])
    print(df_old)
    print(id(df_old))
    df_new = df_old
    print(id(df_new), id(df_old))
    del df_old

my_func()

On my box it gives:

>python3.6 -u "renamePandas_Cg.py"
   0
0  1
1  2
2  3
3  4
4  5
140482968978768
140482968978768 140482968978768
Filename: renamePandas_Cg.py

Line #    Mem usage    Increment   Line Contents
================================================
     3  31.1680 MiB   0.0000 MiB   @profile(precision=4)
     4                             def my_func(): 
     5  64.1250 MiB  32.9570 MiB       import pandas
     6                                 
     7  64.1953 MiB   0.0703 MiB       df_old = pandas.DataFrame([1,2,3,4,5])
     8  64.6680 MiB   0.4727 MiB       print(df_old)
     9  64.6680 MiB   0.0000 MiB       print(id(df_old))
    10  64.6680 MiB   0.0000 MiB       df_new = df_old
    11  64.6680 MiB   0.0000 MiB       print(id(df_new), id(df_old))
    12  64.6680 MiB   0.0000 MiB       del df_old

This proves that what is said in the comments is actually a fact: both df_old and df_new point to the same address in memory, and there is no increase in memory because of df_new = df_old.
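The difference between rebinding a name and making a real copy can also be checked without a profiler. The following is a minimal sketch, assuming pandas is installed; the column name "a" and the variable df_copy are made up for the illustration.

```python
import pandas as pd

df_old = pd.DataFrame({"a": [1, 2, 3]})

df_new = df_old          # plain assignment: a second name for the same object
df_copy = df_old.copy()  # an explicit copy, shown only for contrast

same_object = df_new is df_old            # identity check, like comparing id() values
copy_is_distinct = df_copy is not df_old  # .copy() builds a new DataFrame object

del df_old               # drops the old name; the data stays reachable via df_new
rows_still_available = len(df_new)

print(same_object, copy_is_distinct, rows_still_available)
```

Only the explicit .copy() call creates a second dataframe; the assignment followed by del leaves exactly one dataframe in memory under its new name.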

Let's check whether the reported zero increase is merely an artifact of too little precision. Here is the result for precision=7:

>python3.6 -u "renamePandas_Cg.py"
   0
0  1
1  2
2  3
3  4
4  5
140698387071216
140698387071216 140698387071216
Filename: renamePandas_Cg.py

Line #    Mem usage    Increment   Line Contents
================================================
     3  31.1718750 MiB   0.0000000 MiB   @profile(precision=7)
     4                             def my_func(): 
     5  64.1992188 MiB  33.0273438 MiB       import pandas
     6                                 
     7  64.3125000 MiB   0.1132812 MiB       df_old = pandas.DataFrame([1,2,3,4,5])
     8  64.7226562 MiB   0.4101562 MiB       print(df_old)
     9  64.7226562 MiB   0.0000000 MiB       print(id(df_old))
    10  64.7226562 MiB   0.0000000 MiB       df_new = df_old
    11  64.7226562 MiB   0.0000000 MiB       print(id(df_new), id(df_old))
    12  64.7226562 MiB   0.0000000 MiB       del df_old

Hmmm ... the memory figures are not the same as before; they vary somewhat from one run to the next. The increment for df_new = df_old is still zero, though.

By the way, if you still doubt the results because the dataframe is so small, change df_old = pandas.DataFrame([1,2,3,4,5]) to df_old = pandas.DataFrame(100000*[1,2,3,4,5]). You will see the same results as before, except that the statement creating the dataframe now consumes more than 7 MB of memory.
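If memory_profiler is not available, a similar comparison can be sketched with the standard library's tracemalloc module. This swaps in a different measuring tool and uses a plain list instead of a dataframe, so it illustrates the same principle rather than reproducing the numbers above; allocated_by is a hypothetical helper written for this example.

```python
import tracemalloc


def allocated_by(action):
    """Return the bytes newly allocated while running action() (hypothetical helper)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = action()  # keep the result alive while the second reading is taken
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return after - before


big = list(range(500_000))

rebind_cost = allocated_by(lambda: big)       # what `new = big` amounts to: no new object
copy_cost = allocated_by(lambda: big.copy())  # an actual copy, for contrast

print(rebind_cost, copy_cost)
```

The rebinding should cost close to nothing, while the copy should allocate several megabytes for the new list's element array.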
