Query about pandas copy() method
import pandas as pd
df1 = pd.DataFrame({'A':['aaa','bbb','ccc'], 'B':[1,2,3]})
df2 = df1.copy()
df1.loc[0,'A'] = '111'  # modifying the 1st element of column A
print(df1)
print(df2)
When modifying df1, the object df2 is not modified. I expected that, because I used copy().
s1 = pd.Series([[1,2],[3,4]])
s2 = s1.copy()
s1[0][0] = 0  # modifying the 1st element of list [1,2]
print(s1)
print(s2)
But why did s2 change as well in this case? I expected no change to s2 because I used copy() to create it, but to my surprise, when modifying s1 the object s2 is also modified. I don't get why.
This is occurring because your pd.Series is of dtype=object, so copy() essentially copied a bunch of references to Python objects. Observe:
In [1]: import pandas as pd
In [2]: s1=pd.Series([[1,2],[3,4]])
...:
In [3]: s1
Out[3]:
0 [1, 2]
1 [3, 4]
dtype: object
In [4]: s1.dtype
Out[4]: dtype('O')
Since list objects are mutable, the operation:
s1[0][0]=0
modifies the list in place.
This behavior is a "shallow copy", which normally isn't an issue with pandas data structures: usually you are using a numeric dtype, in which case shallow copies don't apply, or, if you do use the object dtype, you are storing Python string objects, which are immutable.
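To make the three cases above concrete, here is a small sketch (the variable names are purely illustrative, not from the question) comparing a numeric Series, a Series of strings, and a Series of lists after copy():

```python
import pandas as pd

# Numeric dtype: copy() duplicates the underlying values.
n1 = pd.Series([1, 2, 3])
n2 = n1.copy()
n1.iloc[0] = 100
print(n2.iloc[0])  # 1

# Strings: assignment rebinds the slot in n1's copy of the data;
# the original str object is never mutated.
t1 = pd.Series(['aaa', 'bbb'])
t2 = t1.copy()
t1.iloc[0] = '111'
print(t2.iloc[0])  # aaa

# Lists: both Series hold a reference to the same list object,
# so an in-place mutation is visible through either Series.
l1 = pd.Series([[1, 2], [3, 4]])
l2 = l1.copy()
l1.iloc[0][0] = 0
print(l2.iloc[0])  # [0, 2]
```

Only the last case surprises, because it is the only one where an element is mutated in place rather than replaced.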
Note that pandas containers have a different notion of a deep copy. The .copy method has a default of deep=True, but from the documentation:
When deep=True (default), a new object will be created with a copy of the calling object's data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).
When deep=False, a new object will be created without copying the calling object's data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
...
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
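If independent lists are really needed, one possible approach (an assumption of this sketch, not something the documentation prescribes) is to apply copy.deepcopy to each element and build a new Series from the results. The names shared and independent are illustrative only:

```python
import copy
import pandas as pd

s1 = pd.Series([[1, 2], [3, 4]])

# Default copy(): new Series, but the elements are the same list objects.
shared = s1.copy()

# Element-wise deep copy: every list is duplicated before building the Series.
independent = pd.Series([copy.deepcopy(x) for x in s1], index=s1.index)

print(shared.iloc[0] is s1.iloc[0])       # True
print(independent.iloc[0] is s1.iloc[0])  # False

s1.iloc[0][0] = 0
print(shared.iloc[0])       # [0, 2]
print(independent.iloc[0])  # [1, 2]
```

Copying element by element keeps the sketch independent of how the Series container itself handles copying.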
Again, this is because pandas is designed for numeric dtypes, with some built-in support for str objects. A pd.Series of list objects is very strange indeed, and really not a good use case for a pd.Series.
When you copied the s1 object, pandas actually created a new, separate Series object and bound it to s2, just as you expected. However, the two lists within the s1 Series object were not duplicated along with the Series. Only their references were copied.
See here for a good starting point towards understanding the difference between a Python reference and an object.
Simply put, a Python variable is not the same thing as the actual Python object. Variables (like s1 and s2) are simply references that point to the memory location where the actual object lives.
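The same distinction can be seen without pandas at all. A minimal sketch with plain nested lists (the names a, b, c, d are illustrative), using the standard library copy module:

```python
import copy

a = [[1, 2], [3, 4]]
b = a                 # a second name for the very same object
c = copy.copy(a)      # shallow: new outer list, same inner lists
d = copy.deepcopy(a)  # deep: new outer list and new inner lists

a[0][0] = 0

print(b is a)        # True
print(c is a)        # False
print(c[0] is a[0])  # True
print(c[0])          # [0, 2]
print(d[0])          # [1, 2]
```

Here c plays the role of s2 in the question: a distinct container whose elements still refer to the original inner lists.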
Because the original Series object s1 contained two list references, rather than two list objects, only the references to the internal list objects were copied (not the list objects themselves).
import pandas as pd
s1 = pd.Series([[1,2],[3,4]])
# The object referenced by variable "s1" has a memory address
print("s1:", hex(id(s1)))
s2 = s1.copy()
# The object referenced by variable "s2" has a different memory address
print("s2:", hex(id(s2)))
# However, when you copied "s1", the list items within
# only had their references copied,
# so "s1[0]" and "s2[0]" are simply references to the same object
print("s1[0]:", hex(id(s1[0])))
print("s2[0]:", hex(id(s2[0])))
OUTPUT:
s1: 0x7fcdf5678898 # A different address from s2
s2: 0x7fcddee25240 # A different address from s1
s1[0]: 0x7fcdddf9f6c8 # The same address for the first list
s2[0]: 0x7fcdddf9f6c8 # The same address for the first list
@juanpa.arrivillaga is correct in her answer that you need to use a deep copy.