

Query about pandas copy() method

import pandas as pd

df1 = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})
df2 = df1.copy()
df1.loc[0, 'A'] = '111'  # modify the first element of column A
print(df1)
print(df2)

When I modify df1, the object df2 is not modified. I expected this because I used copy().

s1 = pd.Series([[1, 2], [3, 4]])
s2 = s1.copy()
s1[0][0] = 0  # modify the first element of the list [1, 2]
print(s1)
print(s2)

But why did s2 change as well in this case? I expected s2 not to change because I used copy() to create it, but to my surprise, modifying s1 also modified s2. I don't understand why.

This is happening because your pd.Series has dtype=object, so copy() essentially copied a bunch of references to Python objects. Observe:

In [1]: import pandas as pd

In [2]: s1=pd.Series([[1,2],[3,4]])
   ...:

In [3]: s1
Out[3]:
0    [1, 2]
1    [3, 4]
dtype: object

In [4]: s1.dtype
Out[4]: dtype('O')

Since list objects are mutable, the operation:

s1[0][0]=0

modifies the list in place.
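The same effect can be reproduced without pandas, which may make the mechanism easier to see. This is a minimal sketch using the standard library's copy.copy (a shallow copy) on a plain list of lists; the variable names are illustrative only:

```python
import copy

outer = [[1, 2], [3, 4]]
shallow = copy.copy(outer)  # new outer list, but it holds the same inner lists

print(shallow is outer)        # False: two distinct outer containers
print(shallow[0] is outer[0])  # True: both point at the same inner list

outer[0][0] = 0    # mutate the shared inner list in place
print(shallow[0])  # [0, 2]: the change is visible through the copy
```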

This behavior is a "shallow copy". It normally isn't an issue with pandas data structures, because you would usually be using a numeric data type, where shallow copies don't apply, or, if you do use the object dtype, you would be storing Python string objects, which are immutable.
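A short sketch of the two "normal" cases described above, assuming a reasonably recent pandas: with a numeric dtype the copy owns its own values, and with Python strings in an object-dtype Series the only way to change an element is to store a new string in the original, which does not touch the copy.

```python
import pandas as pd

# Numeric dtype: .copy() duplicates the underlying values themselves
nums = pd.Series([1, 2, 3])
nums_copy = nums.copy()
nums.loc[0] = 100
print(nums_copy.loc[0])  # 1

# Object dtype holding str: strings are immutable, so "modifying" an element
# means storing a new string in the original Series, which the copy never sees
words = pd.Series(["aaa", "bbb"], dtype=object)
words_copy = words.copy()
words.loc[0] = "zzz"
print(words_copy.loc[0])  # aaa
```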

Note that pandas containers have a different notion of a deep copy. The .copy method defaults to deep=True, but from the documentation:

When deep=True (default), a new object will be created with a copy of the calling object's data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).

When deep=False, a new object will be created without copying the calling object's data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa). ... When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
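The contrast the documentation draws can be sketched as follows: .copy(deep=True) copies the array of references, while the standard library's copy.deepcopy, applied here to a plain Python list of the same data, recursively copies the inner lists. Applying copy.deepcopy to the Series object itself is a different matter, because pandas implements __deepcopy__ on its containers by calling .copy(deep=True), so the inner lists would still be shared.

```python
import copy

import pandas as pd

s = pd.Series([[1, 2], [3, 4]])

pandas_copy = s.copy(deep=True)          # new array, same list objects inside
stdlib_copy = copy.deepcopy(s.tolist())  # new lists all the way down

s[0][0] = 0  # mutate the first inner list in place

print(pandas_copy[0])  # [0, 2]: shares the mutated list
print(stdlib_copy[0])  # [1, 2]: has its own copy of the list
```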

Again, this is because pandas is designed for numeric dtypes, with some built-in support for str objects. A pd.Series of list objects is very unusual, and really not a good use case for a pd.Series.
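One alternative, offered here as a suggestion rather than as part of the answer above, is to give each list position its own column, so that every column gets a numeric dtype and .copy() duplicates the actual values. A sketch, assuming all the lists have the same length:

```python
import pandas as pd

s1 = pd.Series([[1, 2], [3, 4]])

# Spread each list across columns 0 and 1; both columns get an integer dtype
df1 = pd.DataFrame(s1.tolist(), index=s1.index)
df2 = df1.copy()

df1.loc[0, 0] = 0     # modify the first element of the first row
print(df2.loc[0, 0])  # 1: the copy is unaffected
```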

When you copied s1, pandas did create a new, separate Series object and bound it to s2, just as you expected. However, the two lists inside s1 were not duplicated along with the Series; only their references were copied.

See here for a good starting point on the difference between a Python reference and an object.

Simply put, a Python variable is not the same thing as the actual Python object. Variables (like s1 and s2) are simply references that point to the memory location where the actual object lives.
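The difference between binding a second name and making a copy can be checked directly with the `is` operator. A minimal sketch; the name `alias` is illustrative only:

```python
import pandas as pd

s1 = pd.Series([[1, 2], [3, 4]])
alias = s1      # binds another name to the same Series object; nothing is copied
s2 = s1.copy()  # creates a new Series object

print(alias is s1)  # True
print(s2 is s1)     # False
```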

Because the original Series object s1 contained two list references, rather than the two list objects themselves, only the references to the inner lists were copied (not the list objects).

import pandas as pd

s1 = pd.Series([[1, 2], [3, 4]])
# The object referenced by variable "s1" has a memory address
print("s1:", hex(id(s1)))
s2 = s1.copy()
# The object referenced by variable "s2" has a different memory address
print("s2:", hex(id(s2)))
# However, when you copied "s1", the lists inside it
# only had their references copied,
# so "s1[0]" and "s2[0]" are references to the same object
print("s1[0]:", hex(id(s1[0])))
print("s2[0]:", hex(id(s2[0])))

OUTPUT (the exact addresses will differ from run to run):

s1: 0x7fcdf5678898 # A different address from s2
s2: 0x7fcddee25240 # A different address from s1
s1[0]: 0x7fcdddf9f6c8 # The same address for the first list
s2[0]: 0x7fcdddf9f6c8 # The same address for the first list

@juanpa.arrivillaga is correct in her answer that you need to use a deep copy.
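Because pandas' own notion of a deep copy does not recurse into Python objects, one way to get a truly independent Series of lists is to deep-copy the elements with the standard library and build a new Series from them. A minimal sketch; the helper name `deep_copy_series` is hypothetical, and it assumes an object-dtype Series holding lists or other deep-copyable objects:

```python
import copy

import pandas as pd


def deep_copy_series(s: pd.Series) -> pd.Series:
    """Hypothetical helper: copy a Series and recursively copy every element."""
    # copy.deepcopy on the plain list recurses into each inner list
    return pd.Series(copy.deepcopy(s.tolist()), index=s.index, name=s.name)


s1 = pd.Series([[1, 2], [3, 4]])
s2 = deep_copy_series(s1)

s1[0][0] = 0
print(s1[0])  # [0, 2]
print(s2[0])  # [1, 2]: s2 has its own lists
```

Building the new Series from a plain list keeps the recursion entirely in the standard library, so it does not depend on how pandas implements its own copy methods.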

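Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.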

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM