How to rename a pandas dataframe in a memory-efficient way (without creating a copy)?

I want to rename a pandas dataframe from df_old to df_new .

Since df.rename only seems to be designed for renaming single series/columns within a given dataframe, I currently use the following approach:

df_new = df_old
del df_old

However, this does not seem memory efficient at all, since it apparently creates a copy of df_old .

How can I rename a pandas dataframe in a more memory-efficient way, similar to inplace = True ?

The right answer to the question:

"How to rename a pandas dataframe in a more memory-efficient way, similar to inplace = True?" is:

newName = oldName is already a memory-efficient way of renaming.

Let's start with a summary of what follows:

There is no significant change in memory requirement due to df_new = df_old .

There is a nice resource explaining it all HERE, which says:

Python's memory management is so central to its behavior that not only do you not have to delete values, there is no way to delete values. You may have seen the del statement:

nums = [1, 2, 3]
del nums

This does not delete the value nums, it deletes the name nums. The name is removed from its scope, and then the usual reference counting kicks in: if nums' value had only that one reference, then the value will be reclaimed. But if it had other references, then it will not.
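The quoted behavior is easy to check directly. In this minimal sketch, a second name keeps the list alive after del removes the first one:

```python
nums = [1, 2, 3]
other = nums      # a second reference to the same list object
del nums          # removes the *name* nums, not the value

# The value survives because `other` still references it.
print(other)      # [1, 2, 3]
```

Only when the last reference is dropped does reference counting reclaim the object.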

All of the voluminous material below just provides further proof of what was stated above.


See THIS code:

from memory_profiler import profile

@profile(precision=4)
def my_func(): 
    import pandas

    df_old = pandas.DataFrame([1,2,3,4,5])
    print(df_old)
    print(id(df_old))
    df_new = df_old
    print(id(df_new), id(df_old))
    del df_old

my_func()

On my box it gives:

>python3.6 -u "renamePandas_Cg.py"
   0
0  1
1  2
2  3
3  4
4  5
140482968978768
140482968978768 140482968978768
Filename: renamePandas_Cg.py

Line #    Mem usage    Increment   Line Contents
================================================
     3  31.1680 MiB   0.0000 MiB   @profile(precision=4)
     4                             def my_func(): 
     5  64.1250 MiB  32.9570 MiB       import pandas
     6                                 
     7  64.1953 MiB   0.0703 MiB       df_old = pandas.DataFrame([1,2,3,4,5])
     8  64.6680 MiB   0.4727 MiB       print(df_old)
     9  64.6680 MiB   0.0000 MiB       print(id(df_old))
    10  64.6680 MiB   0.0000 MiB       df_new = df_old
    11  64.6680 MiB   0.0000 MiB       print(id(df_new), id(df_old))
    12  64.6680 MiB   0.0000 MiB       del df_old

This proves that what was said in the comments is actually a fact: both df_old and df_new point to the same address in memory, and there is NO INCREASE in memory due to df_new = df_old .
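Beyond comparing id() values, one can also verify that both names see the very same underlying NumPy buffer. A small sketch using numpy.shares_memory (identity check plus buffer check):

```python
import numpy as np
import pandas as pd

df_old = pd.DataFrame([1, 2, 3, 4, 5])
df_new = df_old   # rebinding only; no data is copied

print(df_new is df_old)                                 # True: same object
print(np.shares_memory(df_new.values, df_old.values))   # True: same buffer
```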

Let's check whether the reported zero memory increase is merely due to too-low precision. Here is the result for precision=7 :

>python3.6 -u "renamePandas_Cg.py"
   0
0  1
1  2
2  3
3  4
4  5
140698387071216
140698387071216 140698387071216
Filename: renamePandas_Cg.py

Line #    Mem usage    Increment   Line Contents
================================================
     3  31.1718750 MiB   0.0000000 MiB   @profile(precision=7)
     4                             def my_func(): 
     5  64.1992188 MiB  33.0273438 MiB       import pandas
     6                                 
     7  64.3125000 MiB   0.1132812 MiB       df_old = pandas.DataFrame([1,2,3,4,5])
     8  64.7226562 MiB   0.4101562 MiB       print(df_old)
     9  64.7226562 MiB   0.0000000 MiB       print(id(df_old))
    10  64.7226562 MiB   0.0000000 MiB       df_new = df_old
    11  64.7226562 MiB   0.0000000 MiB       print(id(df_new), id(df_old))
    12  64.7226562 MiB   0.0000000 MiB       del df_old

Hmmm ... the memory increments are not exactly the same as before, and they vary from one run to another.

By the way, if you still doubt the results because the dataframe is so small, change df_old = pandas.DataFrame([1,2,3,4,5]) to df_old = pandas.DataFrame(100000*[1,2,3,4,5]) and you will see the same results as before, except that the statement df_old = pandas.DataFrame(100000*[1,2,3,4,5]) now consumes more than 7 MiB of memory.
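The same point can be made without memory_profiler, using the standard-library tracemalloc module: with a large frame already built, tracing only the rebinding step shows essentially no allocation. A sketch (the exact traced byte count is implementation-dependent, but it stays far below the frame's several-MiB footprint):

```python
import tracemalloc
import pandas

df_old = pandas.DataFrame(100000 * [1, 2, 3, 4, 5])

tracemalloc.start()
df_new = df_old          # plain rebinding; the 500000-row frame is not copied
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(df_new is df_old)  # True: both names refer to the same object
print(current)           # a handful of bytes at most, nowhere near the frame's size
```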
