Multi-indexed dataframe: Setting values

I already asked a related question earlier, but I didn't want to start a comment-and-edit discussion. So here, boiled down, is what the answer to my earlier question led me to ask. Consider

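import pandas as pd
# numpy.random provides randint; newer SciPy releases no longer expose scipy.random
from numpy import arange, random

# Two-level index: A in {0, 1, 2}, B in {10, ..., 14}
index = pd.MultiIndex.from_product([arange(0,3), arange(10,15)], names=['A', 'B'])
# Single-column frame that starts out all NaN
df = pd.DataFrame(columns=['test'], index=index)
# Five random integers in [0, 10), one per value of B
someValues = random.randint(0, 10, size=5)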

df.loc[0, 'test'], df.loc[0,:] and df.ix[0] all create a representation of a part of the data frame, the first one being a Series and the other two being df slices. However

  • df.ix[0] = df.loc[0,'test'] = someValues sets the value for the df
  • df.loc[0,'test'] = someValues gives an error: ValueError: total size of new array must be unchanged
  • df.loc[0,:] = someValues is ignored. No error is raised, but the df does not contain the numpy array.

I skimmed the docs, but found no clear, logical and systematic explanation of what is going on with MultiIndexes in general. So far, my guess is that "if the view is a Series, you can set values", and "otherwise, god knows what happens".

Could someone shed some light on the logic? Moreover, is there some deep meaning behind this, or are these just constraints due to how it is set up?

These are all with pandas 0.13.1.

These are not all 'slice' representations.

This is a Series.

In [50]: df.loc[0,'test']
Out[50]: 
B
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
Name: test, dtype: object

These are DataFrames (and identical to each other).

In [51]: df.loc[0,:]
Out[51]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]

In [52]: df.ix[0]
Out[52]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]
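
The Series/DataFrame distinction above can be checked directly. This is a minimal sketch, assuming a recent pandas release in which `.ix` is no longer available, so only the `.loc` forms are shown. The names `as_series` and `as_frame` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=["A", "B"]
)
df = pd.DataFrame(columns=["test"], index=index)

# Row label plus a single column label: the result is one-dimensional.
as_series = df.loc[0, "test"]
# Row label plus a column slice: the column axis is kept.
as_frame = df.loc[0, :]

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The shapes matter for assignment: a length-5 array lines up with the one-dimensional selection, but not automatically with a 5-by-1 one.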

This tries to assign the wrong shape. It looks like it should work, but with multiple columns it would not, which is why it is not allowed.

In [54]: df.ix[0] = someValues
ValueError: could not broadcast input array from shape (5) into shape (5,1)
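
The shape rule behind this error can be reproduced with plain NumPy. The sketch below only illustrates the broadcasting rule; it does not claim to follow pandas' internal code path, and `target` and `values` are hypothetical stand-ins for the 5-by-1 selection and for `someValues`.

```python
import numpy as np

target = np.empty((5, 1))   # same shape as a 5-row, 1-column selection
values = np.arange(5)       # shape (5,), like someValues

try:
    # A (5,) array is aligned on the last axis, i.e. treated as (1, 5),
    # and 5 columns cannot be broadcast into 1.
    target[:] = values
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# An explicit column vector of shape (5, 1) fits.
target[:] = values.reshape(-1, 1)
print(broadcast_ok, target.ravel())
```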

This works because it knows how to broadcast

In [56]: df.loc[0,:] = someValues

In [57]: df
Out[57]: 
     test
A B      
0 10    4
  11    3
  12    4
  13    2
  14    8
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

This works fine

In [63]: df.loc[0,'test'] = someValues+1

In [64]: df
Out[64]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

As does this

In [66]: df.loc[0,:] = someValues+1

In [67]: df
Out[67]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

It is not clear where you generated the cases in your question. I think the logic is pretty straightforward and consistent (there were several inconsistencies in prior versions, however).
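
For readers on a current pandas release: `.ix` was deprecated in 0.20 and removed in 1.0, so only the `.loc` forms remain. Below is a minimal sketch of the Series-style assignment from this answer, assuming a recent pandas. `some_values` is a fixed stand-in for the random `someValues`, so the result is reproducible.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=["A", "B"]
)
df = pd.DataFrame(columns=["test"], index=index)
some_values = np.array([4, 3, 4, 2, 8])  # fixed stand-in for someValues

# Select every row with A == 0 in column 'test' and assign five values.
df.loc[0, "test"] = some_values

print(df.loc[0, "test"].tolist())
print(df.loc[1, "test"].isna().all())
```

Naming the column explicitly makes the target one-dimensional, so a length-5 array matches it without relying on broadcasting.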
