pandas - New calculated row for each unique string/group in a column

Question

I have a dataframe df like:

GROUP  TYPE  COUNT
A       1     5
A       2     10
B       1     3
B       2     9
C       1     20
C       2     100

I would like to add a row for each GROUP such that the new row holds the quotient of COUNT where TYPE equals 1 divided by COUNT where TYPE equals 2, like so:

GROUP  TYPE  COUNT
A       1     5
A       2     10
A             .5
B       1     3
B       2     9
B             .33
C       1     20
C       2     100
C             .2

Thanks in advance.

Answer 1

df2 = df.pivot(index='GROUP', columns='TYPE', values='COUNT')
df2['div'] = df2[1]/df2[2]
df2.reset_index().melt('GROUP').sort_values('GROUP')

Output:

  GROUP TYPE       value
0     A    1    5.000000
3     A    2   10.000000
6     A  div    0.500000
1     B    1    3.000000
4     B    2    9.000000
7     B  div    0.333333
2     C    1   20.000000
5     C    2  100.000000
8     C  div    0.200000

My approach would be to reshape the dataframe by pivoting, so that every TYPE has its own column. The division is then a simple column operation, and melting reshapes the result back to the original long form. In my opinion this is also a very readable solution.

Of course, if you prefer np.nan to 'div' as the TYPE, you can replace it very easily, but I'm not sure if that's what you want.
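For what it's worth, here is a minimal sketch of that replacement, assuming the example data from the question. The names wide and tidy and the COUNT value column are my own choices for illustration, and pd.to_numeric with errors='coerce' is just one possible way to turn the 'div' label into NaN.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'GROUP': ['A', 'A', 'B', 'B', 'C', 'C'],
    'TYPE': [1, 2, 1, 2, 1, 2],
    'COUNT': [5, 10, 3, 9, 20, 100],
})

# One column per TYPE, then TYPE 1 divided by TYPE 2.
wide = df.pivot(index='GROUP', columns='TYPE', values='COUNT')
wide['div'] = wide[1] / wide[2]

# Back to long form; var_name/value_name restore the original column names.
tidy = wide.reset_index().melt('GROUP', var_name='TYPE', value_name='COUNT')

# Non-numeric labels ('div') become NaN; 1 and 2 become 1.0 and 2.0.
tidy['TYPE'] = pd.to_numeric(tidy['TYPE'], errors='coerce')

# NaN sorts last by default, so the quotient row closes each group.
tidy = tidy.sort_values(['GROUP', 'TYPE']).reset_index(drop=True)
print(tidy)
```

Converting TYPE to a numeric column also makes it float, which matches what the other answers show once a NaN is present.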

Answer 2

# Keep only TYPE 1 and 2 and sort, so that within each GROUP the TYPE 1 row comes first
# and the TYPE 2 row second; then divide the first COUNT by the second.
s=df[df.TYPE.isin([1,2])].sort_values(['GROUP','TYPE']).groupby('GROUP').COUNT.apply(lambda x : x.iloc[0]/x.iloc[1])
# Concatenate the quotients back onto the original frame.
pd.concat([df,s.reset_index()]).sort_values('GROUP')

Out[77]: 
        COUNT GROUP  TYPE
0    5.000000     A   1.0
1   10.000000     A   2.0
0    0.500000     A   NaN
2    3.000000     B   1.0
3    9.000000     B   2.0
1    0.333333     B   NaN
4   20.000000     C   1.0
5  100.000000     C   2.0
2    0.200000     C   NaN
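To make the intermediate step easier to follow, here is a small sketch of what s holds before the concat, assuming the example data from the question. The line breaks in the chain are mine; the calls are the same as in the one-liner above.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'GROUP': ['A', 'A', 'B', 'B', 'C', 'C'],
    'TYPE': [1, 2, 1, 2, 1, 2],
    'COUNT': [5, 10, 3, 9, 20, 100],
})

# Per GROUP: COUNT of the first row (TYPE 1) over COUNT of the second row (TYPE 2).
s = (df[df.TYPE.isin([1, 2])]
       .sort_values(['GROUP', 'TYPE'])
       .groupby('GROUP').COUNT
       .apply(lambda x: x.iloc[0] / x.iloc[1]))

# s is a Series indexed by GROUP; reset_index() turns it into GROUP/COUNT columns.
print(s)
print(s.reset_index())
```

Since s.reset_index() has no TYPE column, the concatenated rows get NaN there, which is why TYPE shows up as float in the output.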

Answer 3

You can do:

import numpy as np
import pandas as pd

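def add_quotient(x):
    # Copy the group's last row and overwrite it with the quotient.
    last_row = x.iloc[-1].copy()
    last_row['COUNT'] = x[x.TYPE == 1].COUNT.min() / x[x.TYPE == 2].COUNT.max()
    last_row['TYPE'] = np.nan
    # Note: DataFrame.append was removed in pandas 2.0; on newer versions use
    # pd.concat([x, last_row.to_frame().T]) instead (column dtypes may become object).
    return x.append(last_row)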


print(df.groupby('GROUP').apply(add_quotient))

Output:

        GROUP  TYPE       COUNT
GROUP                          
A     0     A   1.0    5.000000
      1     A   2.0   10.000000
      1     A   NaN    0.500000
B     2     B   1.0    3.000000
      3     B   2.0    9.000000
      3     B   NaN    0.333333
C     4     C   1.0   20.000000
      5     C   2.0  100.000000
      5     C   NaN    0.200000

Note that the function selects the min of the TYPE == 1 values and the max of the TYPE == 2 values, in case there is more than one value per group. The TYPE of the new row is set to np.nan, but that can easily be changed.
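If DataFrame.append is not available (it was removed in pandas 2.0), the same min/max idea can be sketched without apply. This is only a sketch, assuming the example data from the question; the names num, den, quot and result are illustrative and not part of the code above.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'GROUP': ['A', 'A', 'B', 'B', 'C', 'C'],
    'TYPE': [1, 2, 1, 2, 1, 2],
    'COUNT': [5, 10, 3, 9, 20, 100],
})

# Min of the TYPE 1 counts and max of the TYPE 2 counts, per GROUP.
num = df.loc[df['TYPE'] == 1].groupby('GROUP')['COUNT'].min()
den = df.loc[df['TYPE'] == 2].groupby('GROUP')['COUNT'].max()

# One quotient row per GROUP, with TYPE left as NaN.
quot = (num / den).rename('COUNT').reset_index()
quot['TYPE'] = np.nan

# Append the quotient rows; NaN sorts last, so each one closes its group.
result = (pd.concat([df, quot], ignore_index=True)
            .sort_values(['GROUP', 'TYPE'])
            .reset_index(drop=True))
print(result)
```

Unlike the apply version, this returns a flat index rather than a GROUP-level MultiIndex.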

Answer 4

Here's a way: first use sort_values by ['GROUP', 'TYPE'], which ensures that TYPE 1 comes before TYPE 2 within each group, and then GroupBy GROUP.

Then use first and last to compute the quotient, and outer merge the result with df:

g = df.sort_values(['GROUP', 'TYPE']).groupby('GROUP')
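# first() is the TYPE 1 row and last() the TYPE 2 row of each sorted group.
# (last() is used rather than nth(1), which keeps the original row index in pandas >= 2.0.)
s = (g.first() / g.last()).COUNT.reset_index()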
df.merge(s, on = ['GROUP','COUNT'], how='outer').fillna(' ').sort_values('GROUP')

   GROUP TYPE       COUNT
0     A    1    5.000000
1     A    2   10.000000
6     A         0.500000
2     B    1    3.000000
3     B    2    9.000000
7     B         0.333333
4     C    1   20.000000
5     C    2  100.000000
8     C         0.200000

4 answers

Answer 1
4 Accepted 2019-01-03 16:56:36

Answer 2
2 2019-01-03 16:57:11

Answer 3
1 2019-01-03 16:55:11

Answer 4
1 2019-01-03 17:01:04
