Subtract values from maximum value within groups
I'm trying to take a DataFrame and create a new column based on the difference between each Value in a group and that group's max:
Group Value
A 4
A 6
A 10
B 5
B 8
B 11
I want to end up with a new column, "from_max":
from_max
6
4
0
6
3
0
I tried this but got a ValueError:
df['from_max'] = df.groupby(['Group']).apply(lambda x: x['Value'].max() - x['Value'])
Thanks in advance.
Option 1
Vectorised groupby + transform
df['from_max'] = df.groupby('Group').Value.transform('max') - df.Value
df
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
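To make Option 1 reproducible, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question; the intermediate name `group_max` is introduced only for illustration. It shows that `transform('max')` returns one value per original row, which is why the subtraction lines up without any extra index handling.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Value": [4, 6, 10, 5, 8, 11],
})

# transform("max") broadcasts each group's max back to every row of that group,
# so the result has the same index and length as df.
group_max = df.groupby("Group")["Value"].transform("max")

# Both operands share df's index, so this is a plain row-by-row subtraction.
df["from_max"] = group_max - df["Value"]
print(df)
```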
Option 2
Index-aligned subtraction
df['from_max'] = (df.groupby('Group').Value.max() - df.set_index('Group').Value).values
df
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
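The one-liner in Option 2 is easier to follow when split into its two operands. The sketch below rebuilds the sample data from the question; the names `per_group_max`, `values_by_group` and `diff` are introduced here for illustration only. Note that the final step drops the index and assigns by position, so as far as I can tell it relies on the rows already being ordered by group, as they are in the sample. That is an assumption worth checking on your own data.

```python
import pandas as pd

# Sample data copied from the question (rows already ordered by group).
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Value": [4, 6, 10, 5, 8, 11],
})

# Left operand: one max per group, indexed by the unique group labels.
per_group_max = df.groupby("Group")["Value"].max()

# Right operand: the original values, indexed by their (repeated) group label.
values_by_group = df.set_index("Group")["Value"]

# Subtraction aligns on the "Group" labels, pairing each group's max
# with every row that carries that label.
diff = per_group_max - values_by_group

# The index is discarded here, so the assignment is purely positional.
df["from_max"] = diff.to_numpy()
print(df)
```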
I think you need GroupBy.transform, which returns a Series with the same size as the original DataFrame:
df['from_max'] = df.groupby(['Group'])['Value'].transform(lambda x: x.max() - x)
Or:
df['from_max'] = df.groupby(['Group'])['Value'].transform('max') - df['Value']
An alternative is Series.map with the aggregated max:
df['from_max'] = df['Group'].map(df.groupby(['Group'])['Value'].max()) - df['Value']
print(df)
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
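As a quick check that the map approach looks up the max row by row rather than depending on row order, here is a small sketch. The interleaved data below is hypothetical (not from the question), and the variable names are chosen for illustration.

```python
import pandas as pd

# Hypothetical data with the groups interleaved, to show row order does not matter.
df = pd.DataFrame({
    "Group": ["B", "A", "B", "A"],
    "Value": [5, 4, 11, 10],
})

# Aggregate once: a small Series mapping each group label to its max.
max_by_group = df.groupby("Group")["Value"].max()

# map() replaces each row's group label with that group's max.
via_map = df["Group"].map(max_by_group) - df["Value"]

# The transform-based version, for comparison.
via_transform = df.groupby("Group")["Value"].transform("max") - df["Value"]

df["from_max"] = via_map
print(df)
```

Both versions keep the original index, so either result can be assigned straight back to the DataFrame.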
Using reindex:
df['From_Max'] = df.groupby('Group').Value.max().reindex(df.Group).values - df.Value.values
df
Out[579]:
Group Value From_Max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
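Broken into steps, the reindex approach can be sketched as follows. The data is rebuilt from the question; `max_by_group` and `expanded` are illustrative names, and `to_numpy()` is used here in place of `.values` as an equivalent way to get the underlying arrays.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Value": [4, 6, 10, 5, 8, 11],
})

# One max per group, indexed by the unique group labels.
max_by_group = df.groupby("Group")["Value"].max()

# reindex with the per-row labels repeats each group's max once per row,
# in the same order as the rows of df.
expanded = max_by_group.reindex(df["Group"])

# Both sides are converted to plain arrays, so the subtraction is positional.
df["From_Max"] = expanded.to_numpy() - df["Value"].to_numpy()
print(df)
```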