
Assign subset of values to pandas dataframe with MultiIndex

I have a DataFrame df:

                      Count
    Environment Type
    A           a       100
                b       200
                c       300
                d       400
                e       500
                f       600
    B           a      1000
                b      2000
                c      3000
                d      4000
                e      5000
                f      6000

The df.index spits out the following index:

    MultiIndex(levels=[['A', 'B'], ['a', 'b', 'c', 'd', 'e', 'f']],
               labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], 
                       [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]],
               names=['A', 'B'])

I need to calculate each Count as a fraction of the total Count within its group, A and B. So I do:

sums = df.groupby(level = 0).sum()
df.loc['A'] = df.loc['A'].apply(lambda x: x/sums.loc['A','Count'])
df.loc['B'] = df.loc['B'].apply(lambda x: x/sums.loc['B','Count'])

However, this results in all values being NaN.

I suspect that the index of df.loc['B'].apply(lambda x: x/sums.loc['B','Count']) is not the same as the index of df, although it should match the index of the part of df that I am selecting.

These expressions by themselves

df.loc['A'].apply(lambda x: x/sums.loc['A','Count'])
df.loc['B'].apply(lambda x: x/sums.loc['B','Count'])

have the values I need, so the division works. The assignment, however, does not.

How do I assign the result of the expressions above to the corresponding part of the dataframe df?
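The suspicion about mismatched indexes can be checked directly. The snippet below is a minimal sketch that rebuilds the frame from the table above (the construction with `MultiIndex.from_product` is an assumption, the original code that created df is not shown) and compares the index of the full frame with the index of the selected part.

```python
import pandas as pd

# Assumed reconstruction of df; level names are taken from the printed table.
index = pd.MultiIndex.from_product(
    [["A", "B"], list("abcdef")], names=["Environment", "Type"]
)
df = pd.DataFrame(
    {"Count": [100, 200, 300, 400, 500, 600,
               1000, 2000, 3000, 4000, 5000, 6000]},
    index=index,
)

# Selecting a single key on the first level drops that level from the result.
part = df.loc["A"]

print(df.index.nlevels)    # levels on the full frame
print(part.index.nlevels)  # levels on the selected part
print(part.index.tolist())
```

The selected part is indexed only by Type, while df is indexed by (Environment, Type), so anything computed from `df.loc['A']` no longer carries the labels that df itself uses.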

Use div to assign the value:

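# Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
s = df.Count.div(df.Count.groupby(level=0).sum(), axis=0, level=0)
df['per'] = s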
df
Out[1253]: 
                      Count       per
    Environment Type
    A           a       100  0.047619
                b       200  0.095238
                c       300  0.142857
                d       400  0.190476
                e       500  0.238095
                f       600  0.285714
    B           a      1000  0.047619
                b      2000  0.095238
                c      3000  0.142857
                d      4000  0.190476
                e      5000  0.238095
                f      6000  0.285714
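An alternative that avoids the level argument altogether is `groupby(...).transform('sum')`, which returns the group totals already aligned to the full MultiIndex. This is a sketch of a different technique than the answer's div call, and the frame construction is again an assumed reconstruction of the table in the question.

```python
import pandas as pd

# Assumed reconstruction of df from the question's table.
index = pd.MultiIndex.from_product(
    [["A", "B"], list("abcdef")], names=["Environment", "Type"]
)
df = pd.DataFrame(
    {"Count": [100, 200, 300, 400, 500, 600,
               1000, 2000, 3000, 4000, 5000, 6000]},
    index=index,
)

# transform("sum") repeats each group's total on every row of that group,
# so the result has the same index as df and divides row by row.
group_totals = df.groupby(level="Environment")["Count"].transform("sum")
df["per"] = df["Count"] / group_totals

print(df)
```

Because `group_totals` shares the index of df, no alignment surprises arise when the result is stored as a new column.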

You can simply do df/sums; there is no need for a loop.
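A more explicit spelling of the same idea is `DataFrame.div` with `level=0`, which states which MultiIndex level the single-level sums should be matched on. The sketch below assumes the frame from the question, rebuilt here from its printed table, and writes the result to a new variable (`shares`, a name chosen only for this illustration).

```python
import pandas as pd

# Assumed reconstruction of df from the question's table.
index = pd.MultiIndex.from_product(
    [["A", "B"], list("abcdef")], names=["Environment", "Type"]
)
df = pd.DataFrame(
    {"Count": [100, 200, 300, 400, 500, 600,
               1000, 2000, 3000, 4000, 5000, 6000]},
    index=index,
)

# One total per Environment, indexed by the first level only.
sums = df.groupby(level=0).sum()

# level=0 broadcasts each total across all rows of its Environment.
shares = df.div(sums, level=0)

print(shares)
```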

Since you want to assign to a particular part of the dataframe, you can do it this way. Keep the index of the computed result one level deeper, so that it still carries the full MultiIndex of the part you are assigning to.

df.loc['A',:] = df.loc['A',:,:].apply(lambda x: x/sums.loc['A','Count'])
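The same principle, keeping the full index on the right-hand side, can also be sketched with a boolean mask instead of the three-part indexer. This is not the answer's exact method, only an illustration under two assumptions: the frame is rebuilt from the question's table, and Count is cast to float first so that fractional values can be stored in it.

```python
import pandas as pd

# Assumed reconstruction of df from the question's table.
index = pd.MultiIndex.from_product(
    [["A", "B"], list("abcdef")], names=["Environment", "Type"]
)
df = pd.DataFrame(
    {"Count": [100, 200, 300, 400, 500, 600,
               1000, 2000, 3000, 4000, 5000, 6000]},
    index=index,
)

# Cast up front: the column will hold fractions after the assignment.
df["Count"] = df["Count"].astype(float)

# Totals are computed once, before any rows are overwritten.
sums = df.groupby(level=0).sum()

for env in sums.index:
    # A boolean mask selects rows without dropping the first index level,
    # so both sides of the assignment share the same (Environment, Type) labels.
    mask = df.index.get_level_values(0) == env
    df.loc[mask, "Count"] = df.loc[mask, "Count"] / sums.loc[env, "Count"]

print(df)
```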
