Adding two values of two columns and assigning the result to a third column in a pandas multi-index DataFrame

I have a pandas DataFrame:

import pandas as pd

a = [1, 1, 1, 2, 2, 2, 3, 3, 3]
dic = {'A': a}

df = pd.DataFrame(dic)

I apply a multi-index to this df:

index=[(1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'), (2, 'c'),(3,'a'),(3,'b'), (3,'c')]
df.index=pd.MultiIndex.from_tuples(index, names=['X','Y'])

I add a new column:

df['B']='-'

Now I have a df:

       A   B 
X Y          
1 a    1   -
  b    1   -
  c    1   -
2 a    2   -
  b    2   -
  c    2   -
3 a    3   -
  b    3   -
  c    3   -

Essentially, I want to iterate over level 'X' of the multi-index, add each group's 'A' values to those of the previous group, and assign the result to column 'B'.

Here's how I was thinking about doing it:

dex=[]
for idx, select_df in df.groupby(level=0):
    dex.append(idx)
#gives me a list of level='X' keys

dex_iter=iter(dex)
#creates an iterator from that list

last=next(dex_iter)
#gives me the first value of that list of keys, and moves the iterator to the next value

for i in dex_iter:
    
    df.loc[i,'B']=df.loc[i,'A']+df.loc[last,'A']
    last=i

My EXPECTED result is:

      A   B
X Y        
1 a   1   -
  b   1   -
  c   1   -
2 a   2   3
  b   2   3
  c   2   3
3 a   3   5
  b   3   5
  c   3   5

Instead, what I get is:

      A    B
X Y        
1 a   1    -
  b   1    -
  c   1    -
2 a   2  NaN
  b   2  NaN
  c   2  NaN
3 a   3  NaN
  b   3  NaN
  c   3  NaN

This is obviously due to some peculiarity of assigning values to a multi-index, but I can't find a way to resolve it.
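The NaN values most likely come from index alignment: `df.loc[i,'A'] + df.loc[last,'A']` is a Series indexed only by 'Y', while the rows being assigned carry the full ('X', 'Y') multi-index, so no labels match. A minimal sketch of a fix for the loop, assuming every 'X' group has the same 'Y' labels in the same order, is to assign the raw values instead of the Series. The names `keys` and `total`, and the explicit object dtype for 'B', are choices made for this illustration:

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 3, 3, 3]})
df.index = pd.MultiIndex.from_product([[1, 2, 3], ['a', 'b', 'c']], names=['X', 'Y'])

# Object dtype (an assumption of this sketch) lets 'B' hold both '-' and integers.
df['B'] = pd.Series('-', index=df.index, dtype=object)

# Unique level-'X' keys in order of appearance.
keys = df.index.get_level_values('X').unique().tolist()

# Pair each key with its predecessor: (1, 2), (2, 3).
for last, i in zip(keys, keys[1:]):
    total = df.loc[i, 'A'] + df.loc[last, 'A']  # Series indexed by 'Y' only
    df.loc[i, 'B'] = total.to_numpy()           # positional assignment, no label alignment

print(df)
```

Because the assignment is positional, this only gives sensible results when the groups line up row for row; groups of different sizes would need a different approach.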

Let's try groupby, first, and shift:

df.groupby(level=0)['A'].first().shift()

X
1    NaN
2    1.0
3    2.0
Name: A, dtype: float64

tmp = df.index.get_level_values(0).map(df.groupby(level=0)['A'].first().shift())
print(tmp)
# Float64Index([
#    nan, nan, nan, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0], dtype='float64', name='X')
# (pandas 2.0 and later print this as Index([...], dtype='float64', name='X'))

This gives you the values you need to add to "A" to get "B":

df['B'] = df['A'] + tmp
df

     A    B
X Y        
1 a  1  NaN
  b  1  NaN
  c  1  NaN
2 a  2  3.0
  b  2  3.0
  c  2  3.0
3 a  3  5.0
  b  3  5.0
  c  3  5.0
