繁体   English   中英

Python:Groupby 条件在 pandas dataframe?

[英]Python: Groupby with conditions in pandas dataframe?

我有一个 dataframe 如下所示。

我需要做 groupby(country and product) 并且Value列应该包含count(id) ,其中 status 是关闭的,我需要返回剩余的列。 预期的 output 格式如下。

Sample input

id        status    ticket_time           product      country     last_load_time       metric_id   name
1260057   open      2021-10-04 01:20:00   Broadband    Grenada     2021-12-09 09:57:27  MTR013      repair
2998178   open      2021-10-02 00:00:00   Fixed Voice  Bahamas     2021-12-09 09:57:27  MTR013      repair
3762949   closed    2021-10-01 00:00:00   Fixed Voice  St Lucia    2021-12-09 09:57:27  MTR013      repair
3766608   closed    2021-10-04 00:00:00   Broadband    St Lucia    2021-12-09 09:57:27  MTR013      repair
3767125   closed    2021-10-04 00:00:00   TV           Antigua     2021-12-09 09:57:27  MTR013      repair
6050009   closed    2021-10-01 00:00:00   TV           Jamaica     2021-12-09 09:57:27  MTR013      repair
6050608   open      2021-10-01 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6050972   open      2021-10-01 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6052253   closed    2021-10-02 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6053697   open      2021-10-03 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair  

**EXPECTED OUTPUT FORMAT** SAMPLE

country  product    load_time          metric_id     name          ticket_time        Value(count(id)with status closed)
Antigua   TV      2021-12-09 09:57:27   MTR013     pending_repair   2021-10-01         1
....      ...     ....                  ...        ...              ...                2

我尝试了以下代码:

df = new_df[new_df['status'] == 'closed'].groupby(['country', 'product']).agg(Value = pd.NamedAgg(column='id', aggfunc="size"))
df.reset_index(inplace=True)

但它只返回三列国家、产品和价值

我需要我在上面的 EXPECTED OUTPUT FORMAT 中提到的其余列。 另外,我试过

df1 = new_df[new_df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')

df = df1.drop_duplicates(['country', 'product']).drop('status', axis=1)

Output

id    ticket_time    product    country     load_time          metric_id    name        Value
3762949 2021-10-01  Fixed Voice St Lucia    2021-12-09 09:57:27 MTR013  pending_repair  23
3766608 2021-10-04  Broadband   St Lucia    2021-12-09 09:57:27 MTR013  pending_repair  87

带有转换返回 id 列的第二个逻辑,这是我不想要的。 值列基于关闭状态的计数(id)。 我尝试了上述两种方法,但无法得到预期的 output。 有没有办法处理这个?

当您分组时,通常是根据某个类别汇总数据,因此您不会保留所有单独的记录,而只会留下您分组的列和列汇总数据(计数、平均值等)。 然而,变换 function 会做你想做的事。 我认为这就是您根据您的预期 OUTPUT 寻找的东西。

df_closed = df[df['status']=='closed']  # Filters data

df_closed = df_closed.reindex()  # Resets index

df_closed['count_closed'] = df_closed.groupby('country')['status'].transform(len)

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM