A dict of pandas dataframes: how to know if referencing a copy of the dataframe in the dict or the dataframe inside the dict?

Hi, I've run into some interesting behavior that I can't find an explanation for.

I have a dict called combinedDict. Its keys are strings and its values are pandas DataFrames.

I want to select the DataFrame stored under the key 'early'. I assign that DataFrame to a variable a, then edit its ID column by appending the string '_early' to every value in that column. I do this using the following code:

a = combinedDict['early']
a['ID'] = [(s + '_early') for s in a['ID'].tolist()]

When I do this, '_early' is appended to every value in the ID column of a, but it is also appended to the ID column of the DataFrame stored in combinedDict['early'].
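A minimal reproduction of this, assuming a small hypothetical combinedDict (the real dict's contents aren't shown in the question). Both prints show the suffix, because a and combinedDict['early'] are the same object:

```python
import pandas as pd

# Hypothetical stand-in for the question's combinedDict
combinedDict = {
    'early': pd.DataFrame({'ID': ['s1', 's2', 's3']}),
    'late': pd.DataFrame({'ID': ['s4', 's5']}),
}

# Plain assignment binds a second name to the same DataFrame object; nothing is copied
a = combinedDict['early']
a['ID'] = [(s + '_early') for s in a['ID'].tolist()]

print(a['ID'].tolist())                      # ['s1_early', 's2_early', 's3_early']
print(combinedDict['early']['ID'].tolist())  # ['s1_early', 's2_early', 's3_early']
```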

On the other hand, when I add .copy() to the assignment, only a is affected and the DataFrame stored in combinedDict['early'] is not. This is the first time I've run into this behavior. Is this just a feature of pandas DataFrames?
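A sketch of the .copy() variant, again assuming a small hypothetical combinedDict. DataFrame.copy() makes a deep copy by default, so the new object has its own data and the dict's DataFrame keeps its original values. The concatenation here uses the vectorized a['ID'] + '_early' instead of the list comprehension; for a column of strings the result is the same:

```python
import pandas as pd

# Hypothetical stand-in for the question's combinedDict
combinedDict = {'early': pd.DataFrame({'ID': ['s1', 's2', 's3']})}

# .copy() (deep=True by default) returns a new, independent DataFrame
a = combinedDict['early'].copy()
a['ID'] = a['ID'] + '_early'  # vectorized string concatenation

print(a['ID'].tolist())                      # ['s1_early', 's2_early', 's3_early']
print(combinedDict['early']['ID'].tolist())  # ['s1', 's2', 's3']
```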

Let's try this:

In [87]: df1 = pd.DataFrame({'a': [1,2,3,4,5,6,7,8,9]})

In [88]: df1
Out[88]:
   a
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8
8  9

In [89]: df2 = df1

In [90]: id(df1) == id(df2)
Out[90]: True

In [91]: df2.loc[df2.a > 4, 'a'] = 0

In [92]: df1
Out[92]:
   a
0  1
1  2
2  3
3  4
4  0
5  0
6  0
7  0
8  0

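So your a is a reference to the same DataFrame object as combinedDict['early']: assigning a DataFrame to a new name does not copy it, so in-place changes made through either name are visible through both. If you want an independent DataFrame, use combinedDict['early'].copy().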
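To answer the title question directly, the is operator (equivalent to comparing id() values as in the session above) tells you whether two names point at the same object. A small sketch; the dict frames and the helper key_for are hypothetical names for illustration:

```python
import pandas as pd

frames = {'early': pd.DataFrame({'a': [1, 2, 3]})}

ref = frames['early']          # same object as the one in the dict
dup = frames['early'].copy()   # new object with its own (deep-copied) data


def key_for(obj, d):
    """Hypothetical helper: return the key whose value is exactly obj, else None."""
    for k, v in d.items():
        if v is obj:
            return k
    return None


print(ref is frames['early'])  # True:  in-place edits to ref show up in the dict
print(dup is frames['early'])  # False: dup is a separate DataFrame
print(key_for(ref, frames))    # early
print(key_for(dup, frames))    # None
```

Note that is only checks object identity. A shallow copy (copy(deep=False)) or a view is a different object that may still share the underlying data, and whether writes to it propagate depends on the pandas version and its Copy-on-Write behavior.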

Here is an excerpt from the pandas documentation:

Mutability and copying of data

All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data untouched. In general, though, we like to favor immutability where sensible.
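A short illustration of the quoted behavior, using made-up data: methods such as sort_values and assign return a new DataFrame and leave the original alone, while assigning to a new column label inserts a column into the existing DataFrame in place:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 1, 2]})

# Most methods return a new object and leave df untouched
sorted_df = df.sort_values('a')
with_b = df.assign(b=df['a'] * 10)

print(df['a'].tolist())         # [3, 1, 2]  (original order kept)
print(sorted_df['a'].tolist())  # [1, 2, 3]
print(list(df.columns))         # ['a']      (assign did not add 'b' to df)
print(list(with_b.columns))     # ['a', 'b']

# DataFrames are size-mutable in columns: this inserts 'c' into df itself
df['c'] = df['a'] + 1
print(list(df.columns))         # ['a', 'c']
```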
