Grouping and counting multiple columns in Pandas

I have the df below, and I'm trying to group it by JOB_STREAM_NAME and JOB_NAME and count how many INs occurred for each pair.

df

    JOB_STREAM_NAME         JOB_NAME            IN          Start_Time              Description                                         
0   P26_NEXT_NBA_DES    PP_NEXT_NBA_AS01A0001_D NaN         NaT                     NaN                                                 
1   P26_NEXT_NBA_TMP    PP_NEXT_NBA_AS01A0001_D NaN         NaT                     NaN                                                 
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7395593   2022-08-13 12:38:39     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
3   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7420940   2022-08-19 14:33:32     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
4   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7444113   2022-08-25 08:06:10     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
5   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7395596   2022-08-13 12:39:06     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...

My desired output is this:

df_2
    JOB_STREAM_NAME         JOB_NAME            IN          Qt_INs  Start_Time              Description                                         
0   P26_NEXT_NBA_DES    PP_NEXT_NBA_AS01A0001_D NaN         0       NaT                     NaN                                                 
1   P26_NEXT_NBA_TMP    PP_NEXT_NBA_AS01A0001_D NaN         0       NaT                     NaN                                                 
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7395593   2       2022-08-13 12:38:39     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
3   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7420940   2       2022-08-19 14:33:32     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
4   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7444113   2       2022-08-25 08:06:10     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
5   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7395596   2       2022-08-13 12:39:06     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...

I tried things like:

df["Qt_INs"] =  df.groupby(["JOB_STREAM_NAME","JOB_NAME"]).count()
df["Qt_INs"] =  df.groupby(["JOB_STREAM_NAME","JOB_NAME"])["IN"].nunique()

Neither worked as intended.

Could you help me?

You need to use transform if your output is supposed to look like the one above. Unlike an aggregation, it returns one value per original row, so the result can be assigned directly as a new column:

df["Qt_INs"] =  df.groupby(["JOB_STREAM_NAME","JOB_NAME"])["IN"].transform("count")
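To see why the aggregating attempts don't line up with df, here is a minimal sketch on a trimmed-down copy of the question's data. Only the three relevant columns are rebuilt by hand, and the names `keys` and `per_group` are introduced purely for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the question's df (only the columns needed here).
df = pd.DataFrame({
    "JOB_STREAM_NAME": [
        "P26_NEXT_NBA_DES", "P26_NEXT_NBA_TMP",
        "P26_NEXT_NBA_TOD", "P26_NEXT_NBA_TOD",
        "P26_AAAR_006_TSA", "P26_AAAR_006_TSA",
    ],
    "JOB_NAME": ["PP_NEXT_NBA_AS01A0001_D"] * 4 + ["PP_AAAR_4898_DAVMOV_D"] * 2,
    "IN": [np.nan, np.nan, "IN7395593", "IN7420940", "IN7444113", "IN7395596"],
})

keys = ["JOB_STREAM_NAME", "JOB_NAME"]

# Aggregation: one row per group, indexed by the group keys,
# so it does not align with df's 0..5 row index.
per_group = df.groupby(keys)["IN"].count()
print(len(per_group), len(df))  # 4 groups vs. 6 rows

# transform: same length and index as df, so it can be assigned as a column.
# "count" ignores NaN, which is why the all-NaN groups get 0.
df["Qt_INs"] = df.groupby(keys)["IN"].transform("count")
print(df["Qt_INs"].tolist())  # [0, 0, 2, 2, 2, 2]
```

If repeated IN values should be counted only once per pair, `transform("nunique")` can be used in the same way.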
