Assigning values to a pandas column with conditions depending on other columns
I have a dataframe:
import pandas as pd

df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
                        'id_res': [12, 12, 14, 28, 8, 4, 89]})
      col  id_res
0   paris      12
1   paris      12
2  nantes      14
3  berlin      28
4  berlin       8
5  berlin       4
6   tokyo      89
I want to create a 'check' column with the following values:
So my desired output is:
      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False
I tried groupby but didn't get a satisfactory result. Can anyone help me?
Create 2 boolean masks, then combine them and find the highest id_res value for each col:
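# df is the question's df_test
df = df_test
m1 = df['col'].duplicated(keep=False)      # col value appears more than once
m2 = ~df['id_res'].duplicated(keep=False)  # id_res value appears only once in the whole column
# among rows passing both masks, mark the row with the highest id_res per col
df['check'] = df.index.isin(df[m1 & m2].groupby('col')['id_res'].idxmax())
print(df)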
# Output
      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False
Details:
>>> pd.concat([df, m1.rename('m1'), m2.rename('m2')], axis=1)
      col  id_res  check     m1     m2
0   paris      12  False   True  False
1   paris      12  False   True  False
2  nantes      14  False  False   True
3  berlin      28   True   True   True  # <- group to check
4  berlin       8  False   True   True  # <- because
5  berlin       4  False   True   True  # <- m1 and m2 are True
6   tokyo      89  False  False   True
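One thing worth checking with this approach: m2 tests whether an id_res value is unique across the whole column, not within its col group. The small frame below is hypothetical (it is not from the question): 'berlin' and 'rome' share the value 28, so m2 drops both 28 rows and idxmax then picks the next-highest value in each group. The mask-intersection approach from the other answer compares each row with its own group's maximum instead, so the two approaches flag different rows here. Whether that matters depends on the exact rule you need.

```python
import pandas as pd

# Hypothetical frame: 'berlin' and 'rome' share the id_res value 28.
df = pd.DataFrame({'col': ['berlin', 'berlin', 'rome', 'rome'],
                   'id_res': [28, 8, 28, 5]})

# This answer: m2 tests uniqueness of id_res across the whole column.
m1 = df['col'].duplicated(keep=False)
m2 = ~df['id_res'].duplicated(keep=False)
check_global = df.index.isin(df[m1 & m2].groupby('col')['id_res'].idxmax())

# Per-group alternative: compare each id_res with the maximum of its own col group.
g = df.groupby('col')['id_res']
check_group = (m1
               & df['id_res'].eq(g.transform('max'))
               & g.transform('nunique').gt(1))

print(check_global.tolist())  # [False, True, False, True]
print(check_group.tolist())   # [True, False, True, False]
```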
You basically have 3 conditions, so build a mask for each and take their logical intersection (AND, &):
g = df_test.groupby('col')['id_res']
# is col duplicated?
m1 = df_test['col'].duplicated(keep=False)
# [ True True False True True True False]
# is id_res max of its group?
m2 = df_test['id_res'].eq(g.transform('max'))
# [ True True True True False False True]
# is group diverse? (more than 1 id_res)
m3 = g.transform('nunique').gt(1)
# [False False False True True True False]
# True only where all three conditions hold
df_test['check'] = m1 & m2 & m3
Output:
      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False
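A minimal sketch that packages the same mask logic as a reusable function. The name flag_group_max and its parameters are hypothetical, not part of pandas. It also drops m1: a group with more than one distinct id_res must contain at least two rows, so m3 already implies m1 and the combined result is unchanged.

```python
import pandas as pd


def flag_group_max(df: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    """Hypothetical helper: True where a row holds the maximum value_col of its
    group_col group and that group contains more than one distinct value."""
    g = df.groupby(group_col)[value_col]
    is_group_max = df[value_col].eq(g.transform('max'))  # row holds its group's max
    is_diverse = g.transform('nunique').gt(1)             # 2+ distinct values in the group
    return is_group_max & is_diverse


df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
                        'id_res': [12, 12, 14, 28, 8, 4, 89]})
df_test['check'] = flag_group_max(df_test, 'col', 'id_res')
print(df_test['check'].tolist())  # [False, False, False, True, False, False, False]
```

Note that if a group's maximum is tied (for example two berlin rows with 28), both tied rows are flagged, because each one equals the group maximum.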