
Adding level to middle of DF in Pandas

I would like to add a new level to the columns of my DataFrame (so that I can then use reindex to do something else). My DataFrame looks basically like this:

import pandas as pd

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('A','c'): [-1,1,0,10,12],
                   ('A','d'): [1,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,-1,200],
                   ('B','c'): [-20,-10,0,10,20],
                   ('B','d'): [-200,-100,0,100,200]
})

##df
    A               B
    a   b   c   d   a   b     c     d
0   -1  0   -1  1   -20 -200  -20   -200
1   -1  1   1   1   -10 -100  -10   -100
2   0   2   0   2   0   0     0     0
3   10  3   10  3   10  -1    10    100
4   12  -1  12  -1  20  200   20    200

I want to assign new level keys L1 = a + b and L2 = c + d. How do I do this?

The desired output would be:

##df
    A               B
    L1      L2      L1        L2
    a   b   c   d   a   b     c     d
0   -1  0   -1  1   -20 -200  -20   -200
1   -1  1   1   1   -10 -100  -10   -100
2   0   2   0   2   0   0     0     0
3   10  3   10  3   10  -1    10    100
4   12  -1  12  -1  20  200   20    200

Edit: the objective is to achieve something similar to what was asked here. This means that some rows will have NAs for the same key, depending on the values of other columns. E.g. if I want to filter columns a and c by testing whether columns b and d, respectively, are negative:

##df
    A               B
    L1      L2      L1        L2
    a   b   c   d   a   b     c     d
0   -1  0   -1  1   NA  NA    NA    NA
1   -1  1   1   1   NA  NA    NA    NA
2   0   2   0   2   0   0     0     0
3   10  3   10  3   NA  NA    10    100
4   NA  NA  NA  NA  20  200   20    200

You need to create a new array with map and then assign it as an additional column level:

d = {'a':'L1','b':'L1','c':'L2','d':'L2'}
a = df.columns.get_level_values(1).map(lambda x: d[x])
print (a)
['L1' 'L1' 'L2' 'L2' 'L1' 'L1' 'L2' 'L2']

df.columns = [df.columns.get_level_values(0),a,df.columns.get_level_values(1)]
#same as
df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                        df.columns.get_level_values(1).map(lambda x: d[x]),
                                        df.columns.get_level_values(1)])
print (df)
    A             B              
   L1     L2     L1       L2     
    a  b   c  d   a    b   c    d
0  -1  0  -1  1 -20 -200 -20 -200
1  -1  1   1  1 -10 -100 -10 -100
2   0  2   0  2   0    0   0    0
3  10  3  10  3  10   -1  10  100
4  12 -1  12 -1  20  200  20  200
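
The same idea can be wrapped in a small helper so that the new level can be placed at any position. This is only a sketch: the function name insert_column_level and its parameters are hypothetical, not part of pandas, and it assumes every label of the source level has an entry in the mapping.

```python
import pandas as pd

def insert_column_level(columns, mapping, position=1, source_level=-1):
    """Return a new MultiIndex with one extra level.

    The extra level is built by mapping the labels of `source_level`
    through `mapping` and is inserted at `position`.
    (Hypothetical helper, not a pandas function.)
    """
    # one array of labels per existing level
    arrays = [columns.get_level_values(i) for i in range(columns.nlevels)]
    # labels of the new level, derived from an existing level
    new_level = columns.get_level_values(source_level).map(mapping)
    arrays.insert(position, new_level)
    return pd.MultiIndex.from_arrays(arrays)

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('A','c'): [-1,1,0,10,12],
                   ('A','d'): [1,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,-1,200],
                   ('B','c'): [-20,-10,0,10,20],
                   ('B','d'): [-200,-100,0,100,200]})

d = {'a':'L1','b':'L1','c':'L2','d':'L2'}
df.columns = insert_column_level(df.columns, d, position=1)
print (df.columns.tolist())
```

With position=1 the new level lands between the existing two levels, which is the layout asked for in the question.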

The second output is considerably more complicated; this works for me:

#select columns b, d and test for negative values
idx = pd.IndexSlice
mask = df.loc[:, idx[:,:,['b','d']]] < 0
print (mask)
       A             B       
      L1     L2     L1     L2
       b      d      b      d
0  False  False   True   True
1  False  False   True   True
2  False  False  False  False
3  False  False   True  False
4   True   True  False  False

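#extend the mask to columns a, c by back filling within each (level 0, level 1) group
#note: groupby(..., axis=1) is deprecated since pandas 2.1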
mask1 = mask.reindex(columns=df.columns)
mask1 = mask1.groupby(level=[0,1], axis=1).apply(lambda x: x.bfill(axis=1))
print (mask1)
       A                           B                     
      L1            L2            L1            L2       
       a      b      c      d      a      b      c      d
0  False  False  False  False   True   True   True   True
1  False  False  False  False   True   True   True   True
2  False  False  False  False  False  False  False  False
3  False  False  False  False   True   True  False  False
4   True   True   True   True  False  False  False  False

print (df.mask(mask1))
      A                     B                    
     L1         L2         L1           L2       
      a    b     c    d     a      b     c      d
0  -1.0  0.0  -1.0  1.0   NaN    NaN   NaN    NaN
1  -1.0  1.0   1.0  1.0   NaN    NaN   NaN    NaN
2   0.0  2.0   0.0  2.0   0.0    0.0   0.0    0.0
3  10.0  3.0  10.0  3.0   NaN    NaN  10.0  100.0
4   NaN  NaN   NaN  NaN  20.0  200.0  20.0  200.0
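
Since grouping along the columns is deprecated in recent pandas versions, the mask can also be expanded by position. The following is a sketch under the assumption that each (level 0, level 1) group contains exactly one tested column (b or d); the variable names tested, flags, keys and pos are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('A','c'): [-1,1,0,10,12],
                   ('A','d'): [1,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,-1,200],
                   ('B','c'): [-20,-10,0,10,20],
                   ('B','d'): [-200,-100,0,100,200]})
d = {'a':'L1','b':'L1','c':'L2','d':'L2'}
df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                        df.columns.get_level_values(1).map(d),
                                        df.columns.get_level_values(1)])

#True where the tested columns b, d are negative
idx = pd.IndexSlice
tested = df.loc[:, idx[:,:,['b','d']]] < 0

#one flag column per (level 0, level 1) group
flags = tested.droplevel(2, axis=1)

#for every column of df, find the position of its group flag
keys = df.columns.droplevel(2)
pos = flags.columns.get_indexer(keys)

#repeat each flag for all columns of its group
mask1 = flags.iloc[:, pos]
mask1.columns = df.columns

out = df.mask(mask1)
print (out)
```

This avoids both the groupby over columns and the back fill, at the cost of relying on the one-tested-column-per-group assumption stated above.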

Another solution uses reindex with method='bfill', here applied to the original two-level columns. It requires a double transpose (I think this is a bug: it works only with a MultiIndex in the index, not with a MultiIndex in the columns):

idx = pd.IndexSlice
mask = df.loc[:, idx[:,['b','d']]] < 0
print (mask)
       A             B       
       b      d      b      d
0  False  False   True   True
1  False  False   True   True
2  False  False  False  False
3  False  False   True  False
4   True   True  False  False

mask1 = mask.T.reindex(df.columns, method='bfill').T
print (mask1)
       A                           B                     
       a      b      c      d      a      b      c      d
0  False  False  False  False   True   True   True   True
1  False  False  False  False   True   True   True   True
2  False  False  False  False  False  False  False  False
3  False  False  False  False   True   True  False  False
4   True   True   True   True  False  False  False  False

print (df.mask(mask1))
      A                     B                    
      a    b     c    d     a      b     c      d
0  -1.0  0.0  -1.0  1.0   NaN    NaN   NaN    NaN
1  -1.0  1.0   1.0  1.0   NaN    NaN   NaN    NaN
2   0.0  2.0   0.0  2.0   0.0    0.0   0.0    0.0
3  10.0  3.0  10.0  3.0   NaN    NaN  10.0  100.0
4   NaN  NaN   NaN  NaN  20.0  200.0  20.0  200.0
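
If relying on the column order for the back fill feels fragile, the pairing between a filtered column and its tested column can be written out explicitly. A minimal sketch for the two-level frame, assuming the hypothetical dictionary tested_by describes which column decides the fate of which:

```python
import pandas as pd

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
                   ('A','b'): [0,1,2,3,-1],
                   ('A','c'): [-1,1,0,10,12],
                   ('A','d'): [1,1,2,3,-1],
                   ('B','a'): [-20,-10,0,10,20],
                   ('B','b'): [-200,-100,0,-1,200],
                   ('B','c'): [-20,-10,0,10,20],
                   ('B','d'): [-200,-100,0,100,200]})

#hypothetical pairing: column -> column whose sign is tested
tested_by = {'a':'b', 'b':'b', 'c':'d', 'd':'d'}

#for every column of df, the label of the column that is tested for it
tested_cols = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                         df.columns.get_level_values(1).map(tested_by)])

#positions of the tested columns, repeated as needed
pos = df.columns.get_indexer(tested_cols)
mask1 = df.iloc[:, pos] < 0
mask1.columns = df.columns

result = df.mask(mask1)
print (result)
```

Unlike the back fill, this does not depend on b following a and d following c in the column order.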
