
Pandas: find the maximum in each group among the rows that satisfy a condition

Sorry if this has been asked before; I could not find an exact match.

I am looking for the most efficient way in Pandas to do the following operation:

Let's say we have the following table:

    ID  SUB_ID  COND
1  101       1     1
2  101       2     1
3  101       3     1
4  102       1     1
5  102       2     0
6  103       1     0
7  103       2     0
8  103       3     0
9  103       4     0
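
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the name `df` and the 1-based index are assumptions chosen to match the row labels shown.

```python
import pandas as pd

# Rebuild the example table; the index 1..9 mirrors the row labels shown above.
df = pd.DataFrame(
    {
        "ID": [101, 101, 101, 102, 102, 103, 103, 103, 103],
        "SUB_ID": [1, 2, 3, 1, 2, 1, 2, 3, 4],
        "COND": [1, 1, 1, 1, 0, 0, 0, 0, 0],
    },
    index=range(1, 10),
)
print(df)
```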

Basically, for each "ID" we want the largest "SUB_ID" among the rows where COND is 1. Ideally, this value would be added to every row of that ID as a new column. If no row of that ID fulfills the condition, we would like a 0 there (instead of null).

Resulting dataframe would be:

    ID  SUB_ID  COND  MAX_SUB_ID
1  101       1     1           3
2  101       2     1           3
3  101       3     1           3
4  102       1     1           1
5  102       2     0           1
6  103       1     0           0
7  103       2     0           0
8  103       3     0           0
9  103       4     0           0

The best way I can come up with right now is to select only the rows where COND=1, do a groupby on that subset to get the max SUB_ID per ID, and then join the result back to the main dataframe. After this I can change the nulls back to 0.

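import pandas as pd

# Keep only the rows that satisfy the condition, then take the max SUB_ID per ID
df_true = df[df['COND'] == 1]
max_subid_true = df_true['SUB_ID'].groupby(df_true['ID']).max()

# Left-join on ID; the joined max lands in 'SUB_ID_y' because both sides have a 'SUB_ID' column
joined_df = df.merge(pd.DataFrame(max_subid_true), how='left', left_on='ID', right_index=True)
# IDs with no COND == 1 row come back as null, so set those to 0
joined_df.loc[pd.isnull(joined_df['SUB_ID_y']), 'SUB_ID_y'] = 0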

Any ideas on doing this differently?

df.assign(MAX_SUB_ID=df.SUB_ID.mul(df.COND).groupby(df.ID).transform('max'))

    ID  SUB_ID  COND  MAX_SUB_ID
1  101       1     1           3
2  101       2     1           3
3  101       3     1           3
4  102       1     1           1
5  102       2     0           1
6  103       1     0           0
7  103       2     0           0
8  103       3     0           0
9  103       4     0           0

caveats

  • assumes SUB_ID is always positive
  • assumes COND is always 1 or 0
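
To make the two caveats concrete, here is a small sketch on made-up data (the IDs 201 and 202 and their values are hypothetical, not from the question). It compares the multiplication trick with a masking variant built on `Series.where`, which is a swapped-in technique and not part of the answer above.

```python
import pandas as pd

# Hypothetical data: negative SUB_ID values and a COND value other than 0 or 1.
df = pd.DataFrame({
    "ID": [201, 201, 201, 202, 202],
    "SUB_ID": [-5, -2, -1, 4, 7],
    "COND": [1, 1, 0, 0, 2],
})

# Multiplication trick: a COND == 0 row turns into 0, which beats the negative
# values in ID 201, and COND == 2 doubles the SUB_ID in ID 202.
via_mul = df.SUB_ID.mul(df.COND).groupby(df.ID).transform("max")

# Masking variant: rows that fail COND == 1 become NaN, which max skips.
# A group with no qualifying row stays NaN and is then filled with 0.
masked = df.SUB_ID.where(df.COND.eq(1))
via_where = masked.groupby(df.ID).transform("max").fillna(0).astype(int)

print(via_mul.tolist())    # [0, 0, 0, 14, 14]
print(via_where.tolist())  # [-2, -2, -2, 0, 0]
```

Masking costs a `fillna` and a cast back to integers, but it does not rely on the sign of SUB_ID or on COND being exactly 0 or 1.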

alternative (with fewer caveats)
but less fun

df.assign(
    MAX_SUB_ID=df.ID.map(df.query('COND == 1').groupby('ID').SUB_ID.max())
    .fillna(0).astype(int)
)
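
It may help to see the same one-liner split into its intermediate steps. The sketch below rebuilds the question's table; the names `max_by_id`, `mapped` and `result` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [101, 101, 101, 102, 102, 103, 103, 103, 103],
        "SUB_ID": [1, 2, 3, 1, 2, 1, 2, 3, 4],
        "COND": [1, 1, 1, 1, 0, 0, 0, 0, 0],
    },
    index=range(1, 10),
)

# Step 1: max SUB_ID per ID, using only rows where COND == 1.
# ID 103 has no such row, so it does not appear in this Series.
max_by_id = df.query("COND == 1").groupby("ID").SUB_ID.max()

# Step 2: look up each row's ID in that Series; IDs missing from it map to NaN.
mapped = df.ID.map(max_by_id)

# Step 3: replace NaN with 0 and go back to an integer dtype.
result = df.assign(MAX_SUB_ID=mapped.fillna(0).astype(int))
print(result)
```

The NaN from step 2 is what forces the column to float, which is why the final `astype(int)` is there.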


 