Pandas assignment unexpected behavior
I am experimenting with pandas, trying to modify the values of a column.
My initial dataframe is:
import pandas as pd

df = pd.DataFrame(
    dict(x=[1, 2, 3, 4, 5, 6, 7], y=[10, 11, 15, 14, 14, 25, 25])
)
df.index = list('abcdefg')
with output:
>>> df
   x   y
a  1  10
b  2  11
c  3  15
d  4  14
e  5  14
f  6  25
g  7  25
Suppose that I want to modify the first element of the x column. I do:
df.loc['a', 'x'] = 100
which outputs:
>>> df.loc['a', 'x'] = 100
>>> df
     x   y
a  100  10
b    2  11
c    3  15
d    4  14
e    5  14
f    6  25
g    7  25
What I can't understand is why the following:
>>> j = df['x']
>>> j['a'] = 200
>>> df
     x   y
a  200  10
b    2  11
c    3  15
d    4  14
e    5  14
f    6  25
g    7  25
also modifies the first element of the x column in df. Furthermore:
>>> df.loc['a', 'x'] is j['a']
False
which means that they don't point to the same object. What is going on?
You are not performing the correct test. You should rather test:
j is df['x']
output: True
j and df['x'] point to the same Series.
The False is explained by the underlying numpy array, which does not contain Python objects. A new Python object is generated each time an element is accessed:
import numpy as np
a = np.array([1, 2, 3])
a[0] is a[0]
output: False
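As a rough illustration of the point above, the following sketch uses only NumPy. The names first, second, and view are made up for this example. It shows that each element lookup hands back a freshly created scalar object, while a basic slice still shares the array's memory.

```python
import numpy as np

a = np.array([1, 2, 3])

# Each lookup builds a new NumPy scalar from the raw bytes in the array.
first = a[0]
second = a[0]

print(type(first))       # a NumPy integer scalar type, not a plain Python int
print(first is second)   # False: two distinct scalar objects
print(first == second)   # True: they hold the same value

# A basic slice, by contrast, is a view on the same buffer.
view = a[:2]
print(np.shares_memory(a, view))  # True

# Writing through the view changes the original array.
view[0] = 99
print(a[0])              # 99
```

So an `is` comparison on individual elements says nothing about whether two containers share their data; `np.shares_memory` is a more direct check.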
That is why we need copy here:
j = df['x'].copy()
Notice that after adding copy, the id is different:
id(df['x'])
Out[612]: 140536670316496
id(df['x'].copy())
Out[613]: 140536673228496
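Comparing the id() of temporary objects can be misleading, because CPython may reuse an address once an object has been freed. A slightly more careful sketch keeps both objects alive under names (original and duplicate are hypothetical names chosen for this example) and also checks whether the underlying buffers overlap:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    dict(x=[1, 2, 3, 4, 5, 6, 7], y=[10, 11, 15, 14, 14, 25, 25]),
    index=list("abcdefg"),
)

original = df["x"]           # the column as a Series
duplicate = original.copy()  # deep copy by default

# Both objects are alive at the same time, so their ids cannot collide.
print(original is duplicate)          # False
print(id(original) != id(duplicate))  # True

# The copy owns its own data buffer.
print(np.shares_memory(original.to_numpy(), duplicate.to_numpy()))  # False
```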
This happens because j = df['x'] only assigns the object reference of df['x'] to j. Anything modified through j affects the same object in memory as the one behind df['x'].
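The reference semantics described here are ordinary Python name binding. As a loose analogy only, the sketch below uses a plain dict of lists (the hypothetical name table) as a stand-in for the DataFrame. It does not reflect how pandas stores data, and whether a write through j reaches df in real pandas can depend on the version and on settings such as Copy-on-Write.

```python
# Hypothetical stand-in for a table: a dict mapping column names to lists.
table = {"x": [1, 2, 3], "y": [10, 11, 15]}

j = table["x"]          # binds the name j to the existing list, no copy is made
j[0] = 200              # mutates that list in place

print(j is table["x"])  # True: both expressions refer to one object
print(table["x"][0])    # 200: the change is visible through the table

k = list(table["x"])    # an explicit copy creates a separate list
k[0] = -1
print(table["x"][0])    # 200: the copy is independent of the table
```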
If you want to make a copy of the object behind df['x'], store it in j, and modify it independently of df, you need to use the .copy() method:
j = df['x'].copy()
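A minimal sketch of the effect, rebuilding the DataFrame from the question so that the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    dict(x=[1, 2, 3, 4, 5, 6, 7], y=[10, 11, 15, 14, 14, 25, 25]),
    index=list("abcdefg"),
)

j = df["x"].copy()  # independent copy of the column
j["a"] = 200        # modifies only the copy

print(j["a"])            # 200
print(df.loc["a", "x"])  # 1: the DataFrame is unchanged
```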
Read more about it at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.copy.html