pandas DataFrame groupby and agg to obtain a value based on conditions in another column
I have a dataframe like this:
import pandas as pd

df_test = pd.DataFrame({'ID1':['A','A','A','A','A','A','A','A','A','A'],
                        'ID2':['a','a','a','aa','aaa','aaa','b','b','b','b'],
                        'ID3':['c1','c2','c3','c4','c5','c6','c7','c8','c9','c10'],
                        'condition1':[1,2,1,1,1,1,1,2,1,1],
                        'condition2':[80,85,88,80,70,83,85,90,90,70]})
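I want to group by ['ID1','ID2','condition1'] and pick a value of ID3 from each group: (1) if the group has only one row, that row is selected (e.g. c4); (2) if the group has more than one row, the row whose condition2 is the maximum within the group is selected (e.g. c3, c6, c9 and c8). The result would look like this: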
df_test_result = pd.DataFrame({'ID1':['A','A','A','A','A','A'],
                               'ID2':['a','a','aa','aaa','b','b'],
                               'condition1':[2,1,1,1,2,1],
                               'condition2':[85,88,80,83,90,90],
                               'ID3':['c2','c3','c4','c6','c8','c9']})
df_test_result
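The process seems to be something like this, but it is too inefficient (because the per-group results still need to be concatenated together):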
groups = df_test.groupby(['ID1','ID2','condition1'])
for key, group in groups:
    # keep the rows whose condition2 equals the group's maximum
    dfi = group[group['condition2'] == group['condition2'].max()]
    print(dfi, '\n')
Your condition (1) is just a special case of (2), so you can always sort by condition2 and take the first row in each group:
(
    df_test
    .sort_values("condition2", ascending=False)  # sort everything by condition2
    .groupby(["ID1", "ID2", "condition1"])
    .first()  # select first row in each group (now ordered by condition2)
    .reset_index()  # reset groupby columns
)
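The same selection can also be written without sorting the whole frame. The following is a minimal sketch of an alternative, not part of the original answer: it uses `idxmax` to find the index label of the maximum `condition2` row in each group and then looks those rows up with `loc`. The variable names `idx` and `result` are made up for illustration, and it assumes the default unique integer index of `df_test`. If several rows in a group tie for the maximum, `idxmax` keeps the first one.

```python
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A'] * 10,
                        'ID2': ['a', 'a', 'a', 'aa', 'aaa', 'aaa', 'b', 'b', 'b', 'b'],
                        'ID3': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10'],
                        'condition1': [1, 2, 1, 1, 1, 1, 1, 2, 1, 1],
                        'condition2': [80, 85, 88, 80, 70, 83, 85, 90, 90, 70]})

# Index label of the row with the largest condition2 in each group.
# A single-row group simply returns the label of its only row.
idx = df_test.groupby(['ID1', 'ID2', 'condition1'])['condition2'].idxmax()

# Look the selected rows up in the original frame (all columns are kept)
# and renumber them; rows come out ordered by the group keys.
result = df_test.loc[idx].reset_index(drop=True)
print(result)
```

Rows are ordered by the group keys (`ID1`, `ID2`, `condition1`), so the row order differs from `df_test_result` above, but the selected `ID3` values are the same.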