Python groupby - change column values based on conditions in other columns
I want to group by the column 'group' first, then change the values in the result column based on conditions in the result and rank columns.
This is what I have now:
import pandas as pd
import numpy as np
group = ['g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2']
rank = ['1','2','3','4','5','1','2','3','4','5','6']
result = ['1','4','2','4','4','1','4','4','2','4','4']
df = pd.DataFrame({"group": group, "rank": rank, "result": result})
   group rank result
0     g1    1      1
1     g1    2      4
2     g1    3      2
3     g1    4      4
4     g1    5      4
5     g2    1      1
6     g2    2      4
7     g2    3      4
8     g2    4      2
9     g2    5      4
10    g2    6      4
Within each group, I want to change the result from 4 to 6 wherever the rank is greater than the rank of the row with result = 2.
For example: in g1, the row with result = 2 has rank 3, so the result for ranks 4 and 5 becomes 6.
In g2, the row with result = 2 has rank 4, so the result for ranks 5 and 6 becomes 6.
In this case, my desired output is:
   group rank result
0     g1    1      1
1     g1    2      4
2     g1    3      2
3     g1    4      6
4     g1    5      6
5     g2    1      1
6     g2    2      4
7     g2    3      4
8     g2    4      2
9     g2    5      6
10    g2    6      6
I have no idea what the best way to achieve this is. Can anyone help?
Thanks in advance!
Use Series.where to keep rank only in the rows where result is 2 (all other rows become NaN), then use GroupBy.transform with GroupBy.first to repeat the first non-NaN rank across every row of its group. Finally, compare rank against that value with Series.gt and set the value 6 with DataFrame.loc:
# convert to integers so that ranks compare numerically (as strings, '10' sorts before '2')
df[['rank','result']] = df[['rank','result']].astype(int)
s = df['rank'].where(df['result'].eq(2)).groupby(df['group']).transform('first')
df.loc[df['rank'].gt(s), 'result'] = 6
print(df)
   group  rank  result
0     g1     1       1
1     g1     2       4
2     g1     3       2
3     g1     4       6
4     g1     5       6
5     g2     1       1
6     g2     2       4
7     g2     3       4
8     g2     4       2
9     g2     5       6
10    g2     6       6
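To make the chained expression easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (rank_at_2, threshold, mask) are made up for illustration. It also adds one assumption the answer above does not make: only rows whose result is currently 4 are changed, which matches the wording of the question and gives the same output on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["g1"] * 5 + ["g2"] * 6,
    "rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6],
    "result": [1, 4, 2, 4, 4, 1, 4, 4, 2, 4, 4],
})

# Step 1: keep rank only where result == 2; every other row becomes NaN.
rank_at_2 = df["rank"].where(df["result"].eq(2))

# Step 2: 'first' skips NaN, so each row receives the rank of the first
# result == 2 row in its own group.
threshold = rank_at_2.groupby(df["group"]).transform("first")

# Step 3: rows ranked after the threshold. The extra eq(4) check is an
# assumption taken from the question's wording ("from 4 to 6").
mask = df["rank"].gt(threshold) & df["result"].eq(4)
df.loc[mask, "result"] = 6

print(df)
```

If a group contains no row with result 2, its threshold is NaN, the comparison is False for every row, and the group is left unchanged.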
This will do the trick:
import pandas as pd
import numpy as np
group = ['g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2']
rank = ['1','2','3','4','5','1','2','3','4','5','6']
result = ['1','4','2','4','4','1','4','4','2','4','4']
df = pd.DataFrame({"group": group, "rank": rank, "result": result})

def changeDf(x):
    # rows that belong to the same group as the current row
    df_gp = df[df['group'] == x['group']]
    # rank of the first row in that group whose result is '2'
    rank_of_2 = df_gp.loc[df_gp['result'] == '2', 'rank'].values[0]
    # the columns hold strings, so compare as integers
    if int(x['rank']) > int(rank_of_2):
        return '6'
    else:
        return x['result']

df['result'] = df.apply(changeDf, axis=1)
print(df)
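One thing to be aware of with the apply version: it filters the whole DataFrame again for every row, and .values[0] raises an IndexError for any group that has no row with result '2'. A possible variation, sketched here under the assumption that the columns stay as strings, looks up the rank once per group and maps it back onto the rows. The names rank_of_2, threshold and after are illustrative only.

```python
import pandas as pd

group = ['g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2']
rank = ['1','2','3','4','5','1','2','3','4','5','6']
result = ['1','4','2','4','4','1','4','4','2','4','4']
df = pd.DataFrame({"group": group, "rank": rank, "result": result})

# One entry per group: the rank of the first row whose result is '2'.
rank_of_2 = (
    df.loc[df["result"] == "2"]
      .drop_duplicates("group")
      .set_index("group")["rank"]
      .astype(int)
)

# Look up each row's group; groups without a '2' row map to NaN.
threshold = df["group"].map(rank_of_2)

# Comparisons against NaN are False, so such groups stay untouched.
after = df["rank"].astype(int) > threshold
df.loc[after, "result"] = "6"

print(df)
```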