Python: Groupby with conditions in pandas dataframe?
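I have a dataframe as shown below.
I need to group by country and product, and the Value column should contain count(id) where status is closed. I also need the remaining columns returned. The expected output format is shown below.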
Sample input
id status ticket_time product country last_load_time metric_id name
1260057 open 2021-10-04 01:20:00 Broadband Grenada 2021-12-09 09:57:27 MTR013 repair
2998178 open 2021-10-02 00:00:00 Fixed Voice Bahamas 2021-12-09 09:57:27 MTR013 repair
3762949 closed 2021-10-01 00:00:00 Fixed Voice St Lucia 2021-12-09 09:57:27 MTR013 repair
3766608 closed 2021-10-04 00:00:00 Broadband St Lucia 2021-12-09 09:57:27 MTR013 repair
3767125 closed 2021-10-04 00:00:00 TV Antigua 2021-12-09 09:57:27 MTR013 repair
6050009 closed 2021-10-01 00:00:00 TV Jamaica 2021-12-09 09:57:27 MTR013 repair
6050608 open 2021-10-01 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6050972 open 2021-10-01 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6052253 closed 2021-10-02 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6053697 open 2021-10-03 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
**EXPECTED OUTPUT FORMAT** SAMPLE
country product load_time metric_id name ticket_time Value(count(id)with status closed)
Antigua TV 2021-12-09 09:57:27 MTR013 pending_repair 2021-10-01 1
.... ... .... ... ... ... 2
I tried the following code:
df = new_df[new_df['status'] == 'closed'].groupby(['country', 'product']).agg(Value = pd.NamedAgg(column='id', aggfunc="size"))
df.reset_index(inplace=True)
But it returns only three columns: country, product, and Value.
I need the remaining columns that I listed in the EXPECTED OUTPUT FORMAT above. I also tried:
df1 = new_df[new_df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')
df = df1.drop_duplicates(['country', 'product']).drop('status', axis=1)
Output
id ticket_time product country load_time metric_id name Value
3762949 2021-10-01 Fixed Voice St Lucia 2021-12-09 09:57:27 MTR013 pending_repair 23
3766608 2021-10-04 Broadband St Lucia 2021-12-09 09:57:27 MTR013 pending_repair 87
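The second approach, using transform, returns the id column, which I don't want. The Value column should be the count of id for rows with status closed. I tried both approaches above but could not get the expected output. Is there a way to handle this?
When you group, you are usually summarizing data by some category, so you do not keep all the individual records; you are left with only the columns you grouped by and the aggregated columns (count, mean, and so on). The transform function, however, will do what you want: it returns one value per original row, so the other columns are kept. I think this is what you are looking for, based on your expected output.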
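df_closed = df[df['status'] == 'closed'].copy()  # Keep only closed tickets; .copy() avoids SettingWithCopyWarning
df_closed = df_closed.reset_index(drop=True)  # Reset the index (reindex() with no arguments leaves it unchanged)
# Count closed tickets per (country, product), as the question asks, and broadcast the count to every row
df_closed['count_closed'] = df_closed.groupby(['country', 'product'])['status'].transform('size')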
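If one row per (country, product) is wanted, as in the expected output format, another option is named aggregation, which can carry the remaining columns along with the count. The sketch below is only an illustration, not the answerer's method: it rebuilds the sample input from the question, and it assumes that taking the first value of last_load_time, metric_id and name and the earliest ticket_time per group is acceptable, which the question does not specify.

```python
import pandas as pd

# The ten sample rows from the question.
new_df = pd.DataFrame({
    "id": [1260057, 2998178, 3762949, 3766608, 3767125,
           6050009, 6050608, 6050972, 6052253, 6053697],
    "status": ["open", "open", "closed", "closed", "closed",
               "closed", "open", "open", "closed", "open"],
    "ticket_time": pd.to_datetime([
        "2021-10-04 01:20:00", "2021-10-02 00:00:00", "2021-10-01 00:00:00",
        "2021-10-04 00:00:00", "2021-10-04 00:00:00", "2021-10-01 00:00:00",
        "2021-10-01 00:00:00", "2021-10-01 00:00:00", "2021-10-02 00:00:00",
        "2021-10-03 00:00:00",
    ]),
    "product": ["Broadband", "Fixed Voice", "Fixed Voice", "Broadband", "TV",
                "TV", "Broadband", "Broadband", "Broadband", "Broadband"],
    "country": ["Grenada", "Bahamas", "St Lucia", "St Lucia", "Antigua",
                "Jamaica", "Jamaica", "Jamaica", "Jamaica", "Jamaica"],
    "last_load_time": pd.to_datetime(["2021-12-09 09:57:27"] * 10),
    "metric_id": ["MTR013"] * 10,
    "name": ["repair"] * 10,
})

# Keep only closed tickets, then aggregate once per (country, product).
closed = new_df[new_df["status"] == "closed"]

result = closed.groupby(["country", "product"], as_index=False).agg(
    last_load_time=("last_load_time", "first"),  # assumption: constant within a group
    metric_id=("metric_id", "first"),            # assumption: constant within a group
    name=("name", "first"),                      # assumption: constant within a group
    ticket_time=("ticket_time", "min"),          # assumption: earliest ticket per group
    Value=("id", "count"),                       # number of closed ids in the group
)

print(result)
```

With this sample every closed ticket falls in its own (country, product) pair, so Value is 1 for each of the five groups. The id and status columns are left out simply because they are not named in the aggregation.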