简体   繁体   中英

Add a new column with lowest value within groups

I have a dataframe such as:

Groups Value Element 
G1     1     A
G1     4     B
G1     6     C
G2     2     D
G2     1     E
G3     7     F
G3     4     G
G3     2     H
G3     2     I 

And I would like to add a newcolumn called first_Element which would be for each Groups the Element with the lowest Value , if there are ex aequo, then take the first one.

I should then get:

Groups Value Element first_Element
G1     1     A       A
G1     4     B       A
G1     6     C       A
G2     2     D       E
G2     1     E       E
G3     7     F       H
G3     4     G       H
G3     2     H       H
G3     2     I       H

Does some one have an idea please?

df = df.merge(pd.DataFrame(df.groupby('Groups').apply(lambda x: x['Element'][x['Value'].idxmin()]), columns=['first_Element']).reset_index(), on='Groups')

Output:

>>> df
  Groups  Value Element first_Element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H

User groupby().transform with idxmin , then loc accesss:

min_loc = df.groupby('Groups')['Value'].transform('idxmin')

df['first_element'] = df.loc[min_loc, 'Element'].to_numpy()

Output:

  Groups  Value Element first_element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H

Here is a way using map:

(df.assign(first_Element = df['Groups'].map(df.loc[df.groupby('Groups')['Value'].idxmin()]
                                            .set_index('Groups')['Element'])))

One option is to sort the values, group, then select the first values per group:

first = (df.sort_values(['Groups', 'Value'])
           .groupby('Groups', sort = False)
           .Element
           .transform('first')
         )
df.assign(first_Element = first)
 
  Groups  Value Element first_Element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H

Another option is to sort the values, drop duplicates and merge back to the original dataframe; this avoids a groupby, and might be more efficient (just an assumption):

trimmed = (df.sort_values(['Groups', 'Value'])
             .drop(columns='Value')
             .drop_duplicates(subset='Groups')
             .rename(columns={'Element':'first_Element'})
           )

df.merge(trimmed, on='Groups')
 
  Groups  Value Element first_Element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM