pandas: new calculated row for each unique string/group in a column

I have a dataframe df like:

GROUP  TYPE  COUNT
A       1     5
A       2     10
B       1     3
B       2     9
C       1     20
C       2     100
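
To reproduce the examples, the frame above could be built as follows. This is a minimal sketch that assumes GROUP holds strings and that TYPE and COUNT are plain integers, which the question does not state explicitly:

```python
import pandas as pd

# Sample data copied from the table above (dtypes are an assumption)
df = pd.DataFrame({
    "GROUP": ["A", "A", "B", "B", "C", "C"],
    "TYPE": [1, 2, 1, 2, 1, 2],
    "COUNT": [5, 10, 3, 9, 20, 100],
})
print(df)
```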

I would like to add a row for each group such that the new row holds the quotient of COUNT where TYPE equals 1 divided by COUNT where TYPE equals 2, for each GROUP, like so:

GROUP  TYPE  COUNT
A       1     5
A       2     10
A             .5
B       1     3
B       2     9
B             .33
C       1     20
C       2     100
C             .2

Thanks in advance.

df2 = df.pivot(index='GROUP', columns='TYPE', values='COUNT')
df2['div'] = df2[1]/df2[2]
df2.reset_index().melt('GROUP').sort_values('GROUP')

Output:

  GROUP TYPE       value
0     A    1    5.000000
3     A    2   10.000000
6     A  div    0.500000
1     B    1    3.000000
4     B    2    9.000000
7     B  div    0.333333
2     C    1   20.000000
5     C    2  100.000000
8     C  div    0.200000

My approach would be to reshape the dataframe by pivoting, so that every type has its own column. The division is then very easy, and melting reshapes the result back to the original long format. In my opinion this is also a very readable solution.

Of course, if you prefer np.nan to div as the TYPE of the new rows, you can replace it very easily, but I'm not sure if that's what you want.
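
A sketch of that np.nan variant, assuming the sample frame from the question (rebuilt here so the snippet runs on its own). The names wide and tidy, the value_name="COUNT" argument, and the use of pd.to_numeric to turn the 'div' label into NaN are illustrative choices, not part of the answer above:

```python
import pandas as pd

# Sample frame from the question (dtypes are an assumption)
df = pd.DataFrame({
    "GROUP": ["A", "A", "B", "B", "C", "C"],
    "TYPE": [1, 2, 1, 2, 1, 2],
    "COUNT": [5, 10, 3, 9, 20, 100],
})

# One column per TYPE, indexed by GROUP
wide = df.pivot(index="GROUP", columns="TYPE", values="COUNT")
wide["div"] = wide[1] / wide[2]

# Back to long format; the variable column is named TYPE after the pivoted columns
tidy = wide.reset_index().melt("GROUP", value_name="COUNT")

# Non-numeric labels ('div') are coerced to NaN, numeric TYPE values are kept
tidy["TYPE"] = pd.to_numeric(tidy["TYPE"], errors="coerce")

# A stable sort keeps the 1, 2, NaN order inside each group
tidy = tidy.sort_values("GROUP", kind="stable").reset_index(drop=True)
print(tidy)
```

The stable sort is there only so that the quotient row reliably lands after the two TYPE rows of its group.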

# Keep only TYPE 1 and 2, and sort so that TYPE 1 comes first within each GROUP
s = (df[df.TYPE.isin([1, 2])]
     .sort_values(['GROUP', 'TYPE'])
     .groupby('GROUP').COUNT
     .apply(lambda x: x.iloc[0] / x.iloc[1]))
# Concatenate the quotients back onto the original frame
pd.concat([df, s.reset_index()]).sort_values('GROUP')

Out[77]: 
        COUNT GROUP  TYPE
0    5.000000     A   1.0
1   10.000000     A   2.0
0    0.500000     A   NaN
2    3.000000     B   1.0
3    9.000000     B   2.0
1    0.333333     B   NaN
4   20.000000     C   1.0
5  100.000000     C   2.0
2    0.200000     C   NaN

You can do:

import numpy as np
import pandas as pd

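def add_quotient(x):
    # copy the group's last row as a one-row frame to use as a template
    last_row = x.iloc[[-1]].copy()
    last_row['COUNT'] = x[x.TYPE == 1].COUNT.min() / x[x.TYPE == 2].COUNT.max()
    last_row['TYPE'] = np.nan
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([x, last_row])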


print(df.groupby('GROUP').apply(add_quotient))

Output:

        GROUP  TYPE       COUNT
GROUP                          
A     0     A   1.0    5.000000
      1     A   2.0   10.000000
      1     A   NaN    0.500000
B     2     B   1.0    3.000000
      3     B   2.0    9.000000
      3     B   NaN    0.333333
C     4     C   1.0   20.000000
      5     C   2.0  100.000000
      5     C   NaN    0.200000

Note that the function selects the min of COUNT where TYPE == 1 and the max of COUNT where TYPE == 2, in case there is more than one value per group. The TYPE of the new row is set to np.nan, but that can easily be changed.
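
To illustrate the min/max behaviour, here is a small sketch with a made-up group that has two rows per TYPE. The frame dup and its values are invented for the example, and add_quotient_row is a hypothetical variant of the function above written with pd.concat:

```python
import numpy as np
import pandas as pd

def add_quotient_row(x):
    # Hypothetical variant: copy the last row as a one-row frame, then overwrite it
    new_row = x.iloc[[-1]].copy()
    new_row["COUNT"] = (x.loc[x.TYPE == 1, "COUNT"].min()
                        / x.loc[x.TYPE == 2, "COUNT"].max())
    new_row["TYPE"] = np.nan
    return pd.concat([x, new_row])

# Invented data: one group with duplicate TYPE values
dup = pd.DataFrame({
    "GROUP": ["A", "A", "A", "A"],
    "TYPE": [1, 1, 2, 2],
    "COUNT": [5, 7, 10, 40],
})

out = add_quotient_row(dup)
print(out)  # the added row holds min(5, 7) / max(10, 40)
```

Whether min over max is the right choice for duplicated rows depends on the data; a sum or mean per TYPE may be more appropriate in other cases.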

Here's a way that first uses sort_values by ['GROUP', 'TYPE'], ensuring that TYPE 1 comes before TYPE 2 within each group, and then groups by GROUP.

Then use first and last to compute the quotient, and outer-merge the result with df:

g = df.sort_values(['GROUP', 'TYPE']).groupby('GROUP')
s = (g.first() / g.last()).COUNT.reset_index()  # TYPE 1 row over TYPE 2 row per group
df.merge(s, on = ['GROUP','COUNT'], how='outer').fillna(' ').sort_values('GROUP')

   GROUP TYPE       COUNT
0     A    1    5.000000
1     A    2   10.000000
6     A         0.500000
2     B    1    3.000000
3     B    2    9.000000
7     B         0.333333
4     C    1   20.000000
5     C    2  100.000000
8     C         0.200000
