pandas DataFrame groupby and agg to obtain a value based on conditions in another column
I have a DataFrame like this:
```python
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
                        'ID2': ['a', 'a', 'a', 'aa', 'aaa', 'aaa', 'b', 'b', 'b', 'b'],
                        'ID3': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10'],
                        'condition1': [1, 2, 1, 1, 1, 1, 1, 2, 1, 1],
                        'condition2': [80, 85, 88, 80, 70, 83, 85, 90, 90, 70]})
```
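I want to group by ['ID1','ID2','condition1'] and pick one value of ID3 from each group: (1) if the group contains only one row, that row is selected (e.g. c2, c4 and c8); (2) if the group contains more than one row, the row with the maximum condition2 in the group is selected (e.g. c3, c6 and c9). The result should look like this: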
```python
df_test_result = pd.DataFrame({'ID1': ['A', 'A', 'A', 'A', 'A', 'A'],
                               'ID2': ['a', 'a', 'aa', 'aaa', 'b', 'b'],
                               'condition1': [2, 1, 1, 1, 2, 1],
                               'condition2': [85, 88, 80, 83, 90, 90],
                               'ID3': ['c2', 'c3', 'c4', 'c6', 'c8', 'c9']})
df_test_result
```
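To see which of the two rules applies to which group, a quick sketch (the names `keys`, `sizes` and `rule` are only illustrative) counts the rows per group with `GroupBy.size()`:

```python
import pandas as pd

# Same sample data as in the question
df_test = pd.DataFrame({'ID1': ['A'] * 10,
                        'ID2': ['a', 'a', 'a', 'aa', 'aaa', 'aaa', 'b', 'b', 'b', 'b'],
                        'ID3': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10'],
                        'condition1': [1, 2, 1, 1, 1, 1, 1, 2, 1, 1],
                        'condition2': [80, 85, 88, 80, 70, 83, 85, 90, 90, 70]})

keys = ['ID1', 'ID2', 'condition1']
# Number of rows in each (ID1, ID2, condition1) group
sizes = df_test.groupby(keys).size()
# One row -> rule (1); several rows -> rule (2)
rule = sizes.map(lambda n: '(1) single row' if n == 1 else '(2) max condition2')
print(pd.DataFrame({'rows': sizes, 'rule': rule}))
```

Three groups, (A, a, 2), (A, aa, 1) and (A, b, 2), contain a single row; that is where c2, c4 and c8 come from.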
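My current approach looks roughly like this, but it is too inefficient (because the per-group results then have to be concatenated together):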
```python
groups = df_test.groupby(['ID1', 'ID2', 'condition1'])
for _, group in groups:
    # keep the row(s) of this group whose condition2 equals the group maximum
    dfi = group[group['condition2'] == group['condition2'].max()]
    print(dfi, '\n')
```
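The loop above can be replaced by a loop-free sketch (an alternative, not part of the original question): `GroupBy.transform('max')` broadcasts each group's maximum of condition2 back onto every row of that group, so one boolean mask keeps the winning rows and no concatenation step is needed.

```python
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A'] * 10,
                        'ID2': ['a', 'a', 'a', 'aa', 'aaa', 'aaa', 'b', 'b', 'b', 'b'],
                        'ID3': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10'],
                        'condition1': [1, 2, 1, 1, 1, 1, 1, 2, 1, 1],
                        'condition2': [80, 85, 88, 80, 70, 83, 85, 90, 90, 70]})

keys = ['ID1', 'ID2', 'condition1']
# Group maximum of condition2, aligned with df_test's index (one value per row)
group_max = df_test.groupby(keys)['condition2'].transform('max')
# Keep every row that reaches its group's maximum; single-row groups always pass
result = df_test[df_test['condition2'] == group_max]
print(result)
```

If two rows in a group tie for the maximum, this mask keeps both, exactly as the loop does.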
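Your condition (1) is just a special case of (2) (in a one-row group, that row already has the maximum condition2), so you can always sort by condition2 and take the first row of each group: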
```python
(
    df_test
    .sort_values("condition2", ascending=False)  # sort everything by condition2, largest first
    .groupby(["ID1", "ID2", "condition1"])
    .first()  # first non-null value of each column per group, i.e. the max-condition2 row when there are no NaNs
    .reset_index()  # turn the groupby keys back into columns
)
```
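A variant that selects whole rows (a swapped-in technique, not the answer's method): `SeriesGroupBy.idxmax()` returns, for each group, the index label of the first row with the largest condition2, and `.loc` then picks exactly those rows. The sketch also runs the sort-then-`first()` chain to check that both pick the same ID3 values; `best` and `via_first` are illustrative names.

```python
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A'] * 10,
                        'ID2': ['a', 'a', 'a', 'aa', 'aaa', 'aaa', 'b', 'b', 'b', 'b'],
                        'ID3': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10'],
                        'condition1': [1, 2, 1, 1, 1, 1, 1, 2, 1, 1],
                        'condition2': [80, 85, 88, 80, 70, 83, 85, 90, 90, 70]})

keys = ['ID1', 'ID2', 'condition1']

# idxmax: index label of the (first) row with the largest condition2 in each group
best = df_test.loc[df_test.groupby(keys)['condition2'].idxmax()].reset_index(drop=True)

# Sort descending, then take the first value per group
via_first = (
    df_test
    .sort_values('condition2', ascending=False)
    .groupby(keys)
    .first()
    .reset_index()
)

print(best)
# Both approaches select the same ID3 values, in group-key order
print(best['ID3'].tolist() == via_first['ID3'].tolist())
```

One difference worth knowing: `GroupBy.first()` takes the first non-null value column by column, so with missing values it could combine fields from different rows, whereas the `idxmax` + `.loc` route always returns complete original rows.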