Multi-indexed dataframe: Setting values
I already asked a related question earlier, but I didn't want to start a comment-and-edit discussion. So here is, boiled down, what the answer to my earlier question led me to ask.
Consider
import pandas as pd
from numpy import arange
from numpy import random  # numpy.random provides the randint used below
index = pd.MultiIndex.from_product([arange(0,3), arange(10,15)], names=['A', 'B'])
df = pd.DataFrame(columns=['test'], index=index)
someValues = random.randint(0, 10, size=5)
df.loc[0, 'test'], df.loc[0,:] and df.ix[0] all create a representation of a part of the data frame, the first one being a Series and the other two being df slices. However
df.ix[0] = df.loc[0,'test'] = someValues sets the value for the df
df.loc[0,'test'] = someValues gives an error ValueError: total size of new array must be unchanged
df.loc[0,:] = someValues is being ignored. No error, but the df does not contain the numpy array.
I skimmed the docs, but found no clear, logical and systematic explanation of what is going on with MultiIndexes in general. So far, I guess that "if the view is a Series, you can set values", and "otherwise, god knows what happens".
Could someone shed some light on the logic? Moreover, is there some deep meaning behind this, or are these just constraints due to how it is set up?
These are all with 0.13.1
These are not all 'slice' representations.
This is a Series.
In [50]: df.loc[0,'test']
Out[50]:
B
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
Name: test, dtype: object
These are DataFrames (and they are the same)
In [51]: df.loc[0,:]
Out[51]:
    test
B
10   NaN
11   NaN
12   NaN
13   NaN
14   NaN
[5 rows x 1 columns]
In [52]: df.ix[0]
Out[52]:
    test
B
10   NaN
11   NaN
12   NaN
13   NaN
14   NaN
[5 rows x 1 columns]
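The Series/DataFrame distinction above can be checked directly. The following is a minimal sketch, assuming a recent pandas release rather than 0.13.1; `.ix` is left out because it was removed in pandas 1.0, and the names `as_series` and `as_frame` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same index layout as in the question: level A = 0..2, level B = 10..14.
index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=["A", "B"]
)
df = pd.DataFrame(columns=["test"], index=index)

# A label on level A plus a single column label selects one column of the group.
as_series = df.loc[0, "test"]
# A label on level A plus "all columns" keeps the column axis.
as_frame = df.loc[0, :]

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The one-dimensional result has five entries, while the two-dimensional one has five rows and one column, which is the shape difference that matters when assigning.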
This tries to assign the wrong shape. It looks like it should work, but with multiple columns it would not, which is why it is not allowed.
In [54]: df.ix[0] = someValues
ValueError: could not broadcast input array from shape (5) into shape (5,1)
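The shape mismatch in that error message can be reproduced with NumPy alone. This is a sketch under the assumption that the message comes from ordinary NumPy broadcasting rules; `can_broadcast` is a hypothetical helper written for this illustration, and `values` is a fixed stand-in for someValues.

```python
import numpy as np

values = np.arange(5)  # shape (5,), stand-in for someValues


def can_broadcast(arr, shape):
    """Return True if arr can be broadcast to the given target shape."""
    try:
        np.broadcast_to(arr, shape)
        return True
    except ValueError:
        return False


# A flat array of 5 values does not fit a 5-row, 1-column target ...
print(can_broadcast(values, (5, 1)))
# ... nor a 5-row, 3-column one, since broadcasting aligns trailing axes.
print(can_broadcast(values, (5, 3)))

# Reshaped into a column, the same data fits both targets.
column = values.reshape(-1, 1)
print(can_broadcast(column, (5, 1)))
print(can_broadcast(column, (5, 3)))
```

With several columns, a flat array of row values is ambiguous, which matches the reasoning given for rejecting the assignment.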
This works because it knows how to broadcast
In [56]: df.loc[0,:] = someValues
In [57]: df
Out[57]:
      test
A B
0 10     4
  11     3
  12     4
  13     2
  14     8
1 10   NaN
  11   NaN
  12   NaN
  13   NaN
  14   NaN
2 10   NaN
  11   NaN
  12   NaN
  13   NaN
  14   NaN
[15 rows x 1 columns]
This works fine
In [63]: df.loc[0,'test'] = someValues+1
In [64]: df
Out[64]:
      test
A B
0 10     5
  11     4
  12     5
  13     3
  14     9
1 10   NaN
  11   NaN
  12   NaN
  13   NaN
  14   NaN
2 10   NaN
  11   NaN
  12   NaN
  13   NaN
  14   NaN
[15 rows x 1 columns]
As does this
In [66]: df.loc[0,:] = someValues+1
In [67]: df
Out[67]:
      test
A B
0 10     5
  11     4
  12     5
  13     3
  14     9
1 10   NaN
  11   NaN
  12   NaN
  13   NaN
  14   NaN
2 10   NaN
  11   NaN
  12   NaN
  13   NaN
  14   NaN
[15 rows x 1 columns]
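For reference, the single-column assignment from In [63] can be written as a self-contained script. This is a sketch that assumes a recent pandas release; the fixed array `some_values` replaces the random draw so the result is reproducible, and the frame is created as a float column filled with NaN (an assumption made here, the question starts from an empty object column).

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=["A", "B"]
)
# Assumption: a float column of NaN instead of the question's empty object column.
df = pd.DataFrame(np.nan, index=index, columns=["test"])

# Fixed stand-in for someValues (the values shown in Out[57]).
some_values = np.array([4, 3, 4, 2, 8])

# Select the A == 0 group and one column, then assign five values to its five rows.
df.loc[0, "test"] = some_values + 1

print(df.loc[0, "test"].tolist())
print(df.loc[1, "test"].isna().all(), df.loc[2, "test"].isna().all())
```

Naming the column makes the target one-dimensional, so a flat array of matching length fits it without any reshaping.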
It is not clear where you generated the cases in your question. I think the logic is pretty straightforward and consistent (there were several inconsistencies in prior versions, however).