简体   繁体   中英

Pandas Adding new column from result of groupby

Have the below dataframe and would like to add a new column indicating the value associated with max from the groupby operation. So my DF would have many such records and I just want to loop through each group by similar to df.groupby('B')['C'].mean().idxmax() but go through entire DF and then create a new col with result as below:

           A     B   C    D
    0   05641  KMPV  45  056
    1   05649  KMPV  45  056
    2   05650  KMPV  45  056
    3   05651  KMPV  45  056
    4   05653  KMPV  45  056
    5   05678  KMPV  45  056
    6   05654  KMPV  45  056
    7   05609  KMPV  45  056
    8   05620  KMPV  45  056
    9   05633  KMPV  45  056
    10  05601  KMPV  45  056
    11  05602  KMPV  45  056
    12  05603  KMPV  45  056
    13  05662  KMVL  23  056
    14  05665  KMVL  23  056
    15  05672  KMVL  23  056
    16  05677  KMVL  23  056
    17  05648  KMVL  23  056
    18  05652  KMVL  23  056
    19  05680  KMVL  23  056
    20  05655  KMVL  23  056
    21  05656  KMVL  23  056
    22  05657  KMVL  23  056
    23  05661  KMVL  23  056
    24  05681  KMVL  23  056
    25  05682  KMVL  23  056
    26  05647  K1V4  30  056
    27  05658  K1V4  30  056

New DataFrame with column indicating that KMPV has highest value.

            A     B   C    D newval
    0   05641  KMPV  45  056   KMPV
    1   05649  KMPV  45  056   KMPV
    2   05650  KMPV  45  056   KMPV
    3   05651  KMPV  45  056   KMPV
    4   05653  KMPV  45  056   KMPV
    5   05678  KMPV  45  056   KMPV
    6   05654  KMPV  45  056   KMPV
    7   05609  KMPV  45  056   KMPV
    8   05620  KMPV  45  056   KMPV
    9   05633  KMPV  45  056   KMPV
    10  05601  KMPV  45  056   KMPV
    11  05602  KMPV  45  056   KMPV
    12  05603  KMPV  45  056   KMPV
    13  05662  KMVL  23  056   KMPV
    14  05665  KMVL  23  056   KMPV
    15  05672  KMVL  23  056   KMPV
    16  05677  KMVL  23  056   KMPV
    17  05648  KMVL  23  056   KMPV
    18  05652  KMVL  23  056   KMPV
    19  05680  KMVL  23  056   KMPV
    20  05655  KMVL  23  056   KMPV
    21  05656  KMVL  23  056   KMPV
    22  05657  KMVL  23  056   KMPV
    23  05661  KMVL  23  056   KMPV
    24  05681  KMVL  23  056   KMPV
    25  05682  KMVL  23  056   KMPV
    26  05647  K1V4  30  056   KMPV
    27  05658  K1V4  30  056   KMPV

Putting it more simply. How do I select the max from the below groupby? In other words I want to filter to 056 -> KMPV and 100 -> KJRB since they have the highest numbers and so forth.

     A               B
    056            K1V4        30
                   KMPV        45
                   KMVL        23
    100            KJRB       130
                   KNYC        84
                   KTEB        80

Quick and dirty.

Data:

{'A': {0: 5641, 1: 5649, 2: 5650, 3: 5651, 4: 5653, 5: 5678, 6: 5654, 7: 5609, 8: 5620, 9: 5633, 10: 5601, 11: 5602, 12: 5603, 13: 5662, 14: 5665, 15: 5672, 16: 5677, 17: 5648, 18: 5652, 19: 5680, 20: 10155, 21: 10156, 22: 10157, 23: 10161, 24: 10181, 25: 10182, 26: 10147, 27: 10158}, 'C': {0: 45, 1: 45, 2: 45, 3: 45, 4: 45, 5: 45, 6: 45, 7: 45, 8: 45, 9: 45, 10: 45, 11: 45, 12: 45, 13: 23, 14: 23, 15: 23, 16: 23, 17: 23, 18: 23, 19: 23, 20: 23, 21: 23, 22: 23, 23: 23, 24: 23, 25: 23, 26: 30, 27: 30}, 'B': {0: 'KMPV', 1: 'KMPV', 2: 'KMPV', 3: 'KMPV', 4: 'KMPV', 5: 'KMPV', 6: 'KMPV', 7: 'KMPV', 8: 'KMPV', 9: 'KMPV', 10: 'KMPV', 11: 'KMPV', 12: 'KMPV', 13: 'KMVL', 14: 'KMVL', 15: 'KMVL', 16: 'KMVL', 17: 'KMVL', 18: 'KMVL', 19: 'KMVL', 20: 'KMVL', 21: 'KMVL', 22: 'KMVL', 23: 'KMVL', 24: 'KMVL', 25: 'KMVL', 26: 'K1V4', 27: 'K1V4'}, 'D': {0: 56, 1: 56, 2: 56, 3: 56, 4: 56, 5: 56, 6: 56, 7: 56, 8: 56, 9: 56, 10: 56, 11: 56, 12: 56, 13: 56, 14: 56, 15: 56, 16: 56, 17: 56, 18: 56, 19: 56, 20: 101, 21: 101, 22: 101, 23: 101, 24: 101, 25: 101, 26: 101, 27: 101}}

Code:

import pandas as pd

df = pd.read_csv('so.csv')
df_ = df.groupby(['D'])
# Result of above line is a tuple.
for g, v in df_:
    # Get max C of each group. Then get column B.
    x = v[v['C'] == v['C'].max()]['B'].iat[0]
    df.loc[df['D'] == g,'newval'] = x
print df

Result:

        A     B   C    D newval
0    5641  KMPV  45   56   KMPV
...
13   5662  KMVL  23   56   KMPV
...
20  10155  KMVL  23  101   K1V4
...
26  10147  K1V4  30  101   K1V4
27  10158  K1V4  30  101   K1V4

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM