
Python: Groupby with conditions in pandas dataframe?

I have a dataframe as shown below.

I need to group by country and product, and the Value column should contain count(id) where status is closed. I also need to return the remaining columns. The expected output format is shown below.

Sample input

id        status    ticket_time           product      country     last_load_time       metric_id   name
1260057   open      2021-10-04 01:20:00   Broadband    Grenada     2021-12-09 09:57:27  MTR013      repair
2998178   open      2021-10-02 00:00:00   Fixed Voice  Bahamas     2021-12-09 09:57:27  MTR013      repair
3762949   closed    2021-10-01 00:00:00   Fixed Voice  St Lucia    2021-12-09 09:57:27  MTR013      repair
3766608   closed    2021-10-04 00:00:00   Broadband    St Lucia    2021-12-09 09:57:27  MTR013      repair
3767125   closed    2021-10-04 00:00:00   TV           Antigua     2021-12-09 09:57:27  MTR013      repair
6050009   closed    2021-10-01 00:00:00   TV           Jamaica     2021-12-09 09:57:27  MTR013      repair
6050608   open      2021-10-01 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6050972   open      2021-10-01 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6052253   closed    2021-10-02 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6053697   open      2021-10-03 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair  
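To experiment with the approaches discussed here, the sample input can be rebuilt as a DataFrame. This is a minimal sketch: the variable name `new_df` follows the question's code, and parsing `ticket_time` and `last_load_time` as datetimes is an assumption about the real data's dtypes.

```python
import pandas as pd

# Sample input from the question: (id, status, ticket_time, product, country)
rows = [
    (1260057, "open",   "2021-10-04 01:20:00", "Broadband",   "Grenada"),
    (2998178, "open",   "2021-10-02 00:00:00", "Fixed Voice", "Bahamas"),
    (3762949, "closed", "2021-10-01 00:00:00", "Fixed Voice", "St Lucia"),
    (3766608, "closed", "2021-10-04 00:00:00", "Broadband",   "St Lucia"),
    (3767125, "closed", "2021-10-04 00:00:00", "TV",          "Antigua"),
    (6050009, "closed", "2021-10-01 00:00:00", "TV",          "Jamaica"),
    (6050608, "open",   "2021-10-01 00:00:00", "Broadband",   "Jamaica"),
    (6050972, "open",   "2021-10-01 00:00:00", "Broadband",   "Jamaica"),
    (6052253, "closed", "2021-10-02 00:00:00", "Broadband",   "Jamaica"),
    (6053697, "open",   "2021-10-03 00:00:00", "Broadband",   "Jamaica"),
]
new_df = pd.DataFrame(rows, columns=["id", "status", "ticket_time", "product", "country"])
new_df["ticket_time"] = pd.to_datetime(new_df["ticket_time"])

# These three columns hold the same value on every sample row,
# so a scalar is broadcast down the whole column.
new_df["last_load_time"] = pd.Timestamp("2021-12-09 09:57:27")
new_df["metric_id"] = "MTR013"
new_df["name"] = "repair"

print(new_df.dtypes)
```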

**EXPECTED OUTPUT FORMAT** SAMPLE

country  product    load_time          metric_id     name          ticket_time        Value (count(id) with status closed)
Antigua   TV      2021-12-09 09:57:27   MTR013     pending_repair   2021-10-01         1
....      ...     ....                  ...        ...              ...                2

I tried the following code:

import pandas as pd

df = new_df[new_df['status'] == 'closed'].groupby(['country', 'product']).agg(Value=pd.NamedAgg(column='id', aggfunc='size'))
df.reset_index(inplace=True)

But it only returns three columns: country, product and Value.
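`agg` keeps only the group keys plus the columns it aggregates, which is why the attempt above ends up with three columns. One way to carry the other columns along is to aggregate them too with named aggregation. This sketch assumes the extra columns are constant within each `(country, product)` group (as they are in the sample), so `'first'` is used for them; taking the earliest `ticket_time` with `'min'` is an arbitrary choice, and `last_load_time` is renamed to `load_time` only to match the expected header.

```python
import pandas as pd

# A subset of the sample input, for illustration only
new_df = pd.DataFrame({
    "id": [3762949, 3766608, 3767125, 6050009, 6050608, 6052253],
    "status": ["closed", "closed", "closed", "closed", "open", "closed"],
    "ticket_time": pd.to_datetime(["2021-10-01", "2021-10-04", "2021-10-04",
                                   "2021-10-01", "2021-10-01", "2021-10-02"]),
    "product": ["Fixed Voice", "Broadband", "TV", "TV", "Broadband", "Broadband"],
    "country": ["St Lucia", "St Lucia", "Antigua", "Jamaica", "Jamaica", "Jamaica"],
    "last_load_time": pd.to_datetime(["2021-12-09 09:57:27"] * 6),
    "metric_id": ["MTR013"] * 6,
    "name": ["repair"] * 6,
})

closed = new_df[new_df["status"] == "closed"]

# Named aggregation: each keyword becomes one output column,
# given as (source column, aggregation function).
result = closed.groupby(["country", "product"], as_index=False).agg(
    load_time=("last_load_time", "first"),  # assumed constant per group
    metric_id=("metric_id", "first"),       # assumed constant per group
    name=("name", "first"),                 # assumed constant per group
    ticket_time=("ticket_time", "min"),     # earliest ticket (arbitrary choice)
    Value=("id", "count"),                  # number of closed ids per group
)
print(result)
```

If a column can differ within a group, `'first'` silently picks one value, so it is worth choosing an aggregation that matches what that column should mean in the output.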

I need the remaining columns shown in the EXPECTED OUTPUT FORMAT above. I also tried:

df1 = new_df[new_df['status'] == 'closed'].copy()
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')

df = df1.drop_duplicates(['country', 'product']).drop('status', axis=1)

Output

id    ticket_time    product    country     load_time          metric_id    name        Value
3762949 2021-10-01  Fixed Voice St Lucia    2021-12-09 09:57:27 MTR013  pending_repair  23
3766608 2021-10-04  Broadband   St Lucia    2021-12-09 09:57:27 MTR013  pending_repair  87

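The second approach, using transform, also returns the id column, which I don't want. The Value column should be the count of id for tickets with closed status. I tried both approaches above but couldn't get the expected output. Is there a way to handle this?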

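When you group, you are usually summarizing data by some category, so you don't keep all the individual records; you are left with only the columns you grouped by and the columns of summarized data (count, mean, etc.). The transform function, however, will do what you want: instead of returning one row per group, it returns a result aligned with the original rows. I think this is what you are looking for, based on your expected OUTPUT.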

df_closed = df[df['status'] == 'closed']  # Filter to closed tickets

df_closed = df_closed.reset_index(drop=True)  # Reset the index (reindex() with no arguments does not)

df_closed['count_closed'] = df_closed.groupby(['country', 'product'])['status'].transform('size')
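A fuller sketch of the transform approach, run on a subset of the sample rows: it counts closed ids per `(country, product)`, keeps one row per group with `drop_duplicates`, and drops the `id` and `status` columns the question does not want. Counting with `'count'` (non-null ids) and the final column order are assumptions based on the expected output, not part of the answer above.

```python
import pandas as pd

# A subset of the sample input, for illustration only
df = pd.DataFrame({
    "id": [3762949, 3766608, 3767125, 6050009, 6050608, 6052253],
    "status": ["closed", "closed", "closed", "closed", "open", "closed"],
    "ticket_time": pd.to_datetime(["2021-10-01", "2021-10-04", "2021-10-04",
                                   "2021-10-01", "2021-10-01", "2021-10-02"]),
    "product": ["Fixed Voice", "Broadband", "TV", "TV", "Broadband", "Broadband"],
    "country": ["St Lucia", "St Lucia", "Antigua", "Jamaica", "Jamaica", "Jamaica"],
    "last_load_time": pd.to_datetime(["2021-12-09 09:57:27"] * 6),
    "metric_id": ["MTR013"] * 6,
    "name": ["repair"] * 6,
})

# .copy() makes df_closed an independent frame, so adding a column
# below does not trigger SettingWithCopyWarning.
df_closed = df[df["status"] == "closed"].copy()

# transform returns one value per row of df_closed (same index),
# so the group count can be stored directly as a new column.
df_closed["Value"] = df_closed.groupby(["country", "product"])["id"].transform("count")

# Keep the first row of each group and drop the per-ticket columns.
result = (
    df_closed.drop_duplicates(["country", "product"])
    .drop(columns=["id", "status"])
    .reset_index(drop=True)
)

# Reorder to match the expected output layout.
result = result[["country", "product", "last_load_time", "metric_id",
                 "name", "ticket_time", "Value"]]
print(result)
```

`drop_duplicates` keeps the first row it sees for each group, so columns such as `ticket_time` come from that row; sort `df_closed` first if a specific row should be kept.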


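Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.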