pandas: new calculated row for each unique string/group in a column
I have a dataframe df like:
GROUP TYPE COUNT
A 1 5
A 2 10
B 1 3
B 2 9
C 1 20
C 2 100
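For anyone who wants to reproduce this, the frame above could be built roughly like this. This is only a sketch: the construction itself is not part of the original post, and the column dtypes (integers for TYPE and COUNT) are an assumption based on the printed values.

```python
import pandas as pd

# Sample data copied from the table above; integer dtypes are assumed
df = pd.DataFrame({
    'GROUP': ['A', 'A', 'B', 'B', 'C', 'C'],
    'TYPE': [1, 2, 1, 2, 1, 2],
    'COUNT': [5, 10, 3, 9, 20, 100],
})
print(df)
```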
I would like to add a row for each group such that the new row holds COUNT where TYPE equals 1 divided by COUNT where TYPE equals 2 for each GROUP, like:
GROUP TYPE COUNT
A 1 5
A 2 10
A .5
B 1 3
B 2 9
B .33
C 1 20
C 2 100
C .2
Thanks in advance.
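# Pivot so that each TYPE becomes its own column
df2 = df.pivot(index='GROUP', columns='TYPE', values='COUNT')
df2['div'] = df2[1]/df2[2]
# Melt back to long format and bring the rows of each GROUP together
df2.reset_index().melt('GROUP').sort_values('GROUP')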
Output:
GROUP TYPE value
0 A 1 5.000000
3 A 2 10.000000
6 A div 0.500000
1 B 1 3.000000
4 B 2 9.000000
7 B div 0.333333
2 C 1 20.000000
5 C 2 100.000000
8 C div 0.200000
My approach would be to reshape the dataframe by pivoting, so every type has its own column. Then the division is very easy, and by melting you reshape it back to the original shape. In my opinion this is also a very readable solution.
Of course, if you prefer np.nan to div as a type, you can replace it very easily, but I'm not sure if that's what you want.
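One way the replacement mentioned above might look, as a self-contained sketch. The sample frame is rebuilt from the question, and pd.to_numeric with errors='coerce' is just one possible way to turn the 'div' label into NaN, not necessarily what the answer had in mind. The stable sort and the value_name='COUNT' argument are additions for tidier output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'GROUP': ['A', 'A', 'B', 'B', 'C', 'C'],
    'TYPE': [1, 2, 1, 2, 1, 2],
    'COUNT': [5, 10, 3, 9, 20, 100],
})

# Pivot so each TYPE becomes its own column, then divide
df2 = df.pivot(index='GROUP', columns='TYPE', values='COUNT')
df2['div'] = df2[1] / df2[2]

# Melt back to long form; a stable sort keeps the 1, 2, div order per group
out = (df2.reset_index()
          .melt('GROUP', value_name='COUNT')
          .sort_values('GROUP', kind='stable')
          .reset_index(drop=True))

# Non-numeric labels such as 'div' are coerced to NaN
out['TYPE'] = pd.to_numeric(out['TYPE'], errors='coerce')
print(out)
```

After this, TYPE is a float column, so the original types show up as 1.0 and 2.0.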
# Filter to TYPE 1 and 2 and sort, so each group is ordered as TYPE 1, then TYPE 2
s = df[df.TYPE.isin([1, 2])].sort_values(['GROUP', 'TYPE']).groupby('GROUP').COUNT.apply(lambda x: x.iloc[0] / x.iloc[1])
# Concat the result back onto the original df
pd.concat([df, s.reset_index()]).sort_values('GROUP')
Out[77]:
COUNT GROUP TYPE
0 5.000000 A 1.0
1 10.000000 A 2.0
0 0.500000 A NaN
2 3.000000 B 1.0
3 9.000000 B 2.0
1 0.333333 B NaN
4 20.000000 C 1.0
5 100.000000 C 2.0
2 0.200000 C NaN
You can do:
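import numpy as np
import pandas as pd

def add_quotient(x):
    # Copy the last row as a one-row DataFrame so the group itself is not modified
    last_row = x.iloc[[-1]].copy()
    last_row['COUNT'] = x[x.TYPE == 1].COUNT.min() / x[x.TYPE == 2].COUNT.max()
    last_row['TYPE'] = np.nan
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    return pd.concat([x, last_row])

print(df.groupby('GROUP').apply(add_quotient))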
Output:
GROUP TYPE COUNT
GROUP
A 0 A 1.0 5.000000
1 A 2.0 10.000000
1 A NaN 0.500000
B 2 B 1.0 3.000000
3 B 2.0 9.000000
3 B NaN 0.333333
C 4 C 1.0 20.000000
5 C 2.0 100.000000
5 C NaN 0.200000
Note that the function selects the min of the TYPE == 1 values and the max of the TYPE == 2 values, in case there is more than one value per group. The TYPE is set to np.nan, but that can easily be changed.
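To illustrate the min/max choice when a group has several rows per TYPE, here is a rough sketch that gets the same kind of result with pd.concat instead of groupby.apply. The data below is hypothetical (not from the question), and this is an alternative formulation rather than the code above.

```python
import pandas as pd

# Hypothetical data: group A has two TYPE == 1 rows and two TYPE == 2 rows
df = pd.DataFrame({
    'GROUP': ['A', 'A', 'A', 'A', 'B', 'B'],
    'TYPE': [1, 1, 2, 2, 1, 2],
    'COUNT': [5, 8, 10, 20, 3, 9],
})

# Smallest TYPE == 1 count and largest TYPE == 2 count per group
num = df[df.TYPE == 1].groupby('GROUP').COUNT.min()
den = df[df.TYPE == 2].groupby('GROUP').COUNT.max()

# One quotient row per group; quot has no TYPE column,
# so concat fills TYPE with NaN for these rows
quot = (num / den).reset_index()

out = (pd.concat([df, quot], ignore_index=True)
         .sort_values('GROUP', kind='stable')
         .reset_index(drop=True))
print(out)
```

For group A this divides 5 (the min of 5 and 8) by 20 (the max of 10 and 20).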
Here's a way: first use sort_values by ['GROUP', 'TYPE'], ensuring that TYPE 1 comes before 2, and then GroupBy GROUP.
Then use first and last to compute the quotient, and outer-merge the result with df:
g = df.sort_values(['GROUP', 'TYPE']).groupby('GROUP')
# first() is the TYPE 1 row and last() the TYPE 2 row of each group
s = (g.first() / g.last()).COUNT.reset_index()
df.merge(s, on=['GROUP', 'COUNT'], how='outer').fillna(' ').sort_values('GROUP')
GROUP TYPE COUNT
0 A 1 5.000000
1 A 2 10.000000
6 A 0.500000
2 B 1 3.000000
3 B 2 9.000000
7 B 0.333333
4 C 1 20.000000
5 C 2 100.000000
8 C 0.200000