Add a new column with lowest value within groups

I have a dataframe such as:

Groups Value Element 
G1     1     A
G1     4     B
G1     6     C
G2     2     D
G2     1     E
G3     7     F
G3     4     G
G3     2     H
G3     2     I 

I would like to add a new column called first_Element which holds, for each group in Groups, the Element with the lowest Value. If there are ties (ex aequo), take the first one.

I should then get:

Groups Value Element first_Element
G1     1     A       A
G1     4     B       A
G1     6     C       A
G2     2     D       E
G2     1     E       E
G3     7     F       H
G3     4     G       H
G3     2     H       H
G3     2     I       H
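To try the answers below, the sample input can be rebuilt as a DataFrame. This is a minimal sketch assuming pandas is installed; the variable name `df` matches the one the answers use, and the default integer index is assumed to match the row labels shown in their outputs.

```python
import pandas as pd

# Sample input from the question; the default RangeIndex (0..8) gives the row labels seen in the answers
df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G3', 'G3'],
    'Value': [1, 4, 6, 2, 1, 7, 4, 2, 2],
    'Element': list('ABCDEFGHI'),
})
print(df)
```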

Does anyone have an idea?

df = df.merge(pd.DataFrame(df.groupby('Groups').apply(lambda x: x['Element'][x['Value'].idxmin()]), columns=['first_Element']).reset_index(), on='Groups')

Output:

>>> df
  Groups  Value Element first_Element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H
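The lambda above resolves ties through `idxmin`: when several rows share the minimum, pandas returns the label of the first one, which is why H (not I) is picked for G3. A small sketch using only the G3 values with their original row labels:

```python
import pandas as pd

# Values of group G3 keyed by their original row labels; H (label 7) and I (label 8) tie at 2
g3 = pd.Series([7, 4, 2, 2], index=[5, 6, 7, 8])

# idxmin returns the label of the FIRST occurrence of the minimum
print(g3.idxmin())  # 7
```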

Use groupby().transform with idxmin, then loc access:

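# Row label of each group's minimum Value (first one on ties), repeated for every row of the group
min_loc = df.groupby('Groups')['Value'].transform('idxmin')

# Look up Element at those labels; .to_numpy() drops the resulting index so values are assigned by position
df['first_element'] = df.loc[min_loc, 'Element'].to_numpy()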

Output:

  Groups  Value Element first_element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H
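A sketch of the intermediate values for the sample data, assuming the `df` from the question. It shows what `transform('idxmin')` returns and why the looked-up values go through `.to_numpy()`: the Series returned by `df.loc[min_loc, 'Element']` is indexed by the repeated labels, not by `df`'s own index, and column assignment aligns a Series on its index.

```python
import pandas as pd

df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G3', 'G3'],
    'Value': [1, 4, 6, 2, 1, 7, 4, 2, 2],
    'Element': list('ABCDEFGHI'),
})

# Row label of each group's minimum, broadcast to every row of that group
min_loc = df.groupby('Groups')['Value'].transform('idxmin')
print(min_loc.tolist())  # [0, 0, 0, 4, 4, 7, 7, 7, 7]

# The looked-up Series carries those repeated labels as its index
picked = df.loc[min_loc, 'Element']
print(picked.index.tolist())  # [0, 0, 0, 4, 4, 7, 7, 7, 7]

# Dropping the index assigns the values row by row instead of aligning on labels
df['first_element'] = picked.to_numpy()
print(df)
```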

Here is a way using map:

(df.assign(first_Element = df['Groups'].map(df.loc[df.groupby('Groups')['Value'].idxmin()]
                                            .set_index('Groups')['Element'])))
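The map answer shows no output, so here is a sketch that splits it into steps on the sample data (the intermediate names `min_rows`, `lookup` and `out` are only for illustration). The lookup Series maps each group label to the Element of its first minimum row, and `map` then replaces every Groups value with that Element.

```python
import pandas as pd

df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G3', 'G3'],
    'Value': [1, 4, 6, 2, 1, 7, 4, 2, 2],
    'Element': list('ABCDEFGHI'),
})

# One row per group: the row holding that group's first minimum Value
min_rows = df.loc[df.groupby('Groups')['Value'].idxmin()]

# Lookup Series: group label -> Element of its minimum row
lookup = min_rows.set_index('Groups')['Element']
print(lookup.to_dict())  # {'G1': 'A', 'G2': 'E', 'G3': 'H'}

# map turns each Groups value into the looked-up Element
out = df.assign(first_Element=df['Groups'].map(lookup))
print(out)
```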

One option is to sort the values, group, then select the first value per group:

first = (df.sort_values(['Groups', 'Value'])
           .groupby('Groups', sort = False)
           .Element
           .transform('first')
         )
df.assign(first_Element = first)
 
  Groups  Value Element first_Element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H
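This approach honours the "take the first one" rule only if the sort keeps tied rows (H and I in G3) in their original order. The sketch below checks that for the sample data rather than relying on it as a guarantee; note that the `kind` argument of `sort_values` (where `'stable'` and `'mergesort'` are the stable choices) is only applied when sorting on a single column.

```python
import pandas as pd

df = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G3', 'G3'],
    'Value': [1, 4, 6, 2, 1, 7, 4, 2, 2],
    'Element': list('ABCDEFGHI'),
})

# Sort by group, then by Value, so each group's lowest Value comes first within the group
ordered = df.sort_values(['Groups', 'Value'])
print(ordered)

# G3 after sorting: the tied H (label 7) and I (label 8) should keep their original order
g3_order = ordered.loc[ordered['Groups'] == 'G3', 'Element'].tolist()
print(g3_order)  # ['H', 'I', 'G', 'F']

# transform('first') is indexed like `ordered`; assign aligns it back onto df by row label
first = ordered.groupby('Groups', sort=False)['Element'].transform('first')
print(df.assign(first_Element=first)['first_Element'].tolist())
```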

Another option is to sort the values, drop duplicates, and merge back to the original dataframe; this avoids a groupby and might be more efficient (just an assumption):

trimmed = (df.sort_values(['Groups', 'Value'])
             .drop(columns='Value')
             .drop_duplicates(subset='Groups')
             .rename(columns={'Element':'first_Element'})
           )

df.merge(trimmed, on='Groups')
 
  Groups  Value Element first_Element
0     G1      1       A             A
1     G1      4       B             A
2     G1      6       C             A
3     G2      2       D             E
4     G2      1       E             E
5     G3      7       F             H
6     G3      4       G             H
7     G3      2       H             H
8     G3      2       I             H
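The efficiency remark above is explicitly an assumption. One way to test it on your own data is to time both sort-based answers; the sketch below uses a synthetic frame whose size and group count are arbitrary, and the helper names `with_groupby` and `with_drop_duplicates` are hypothetical wrappers around the two answers. No timing results are claimed here; they will depend on the data and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 100_000 rows spread over 1_000 groups (arbitrary sizes)
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({
    'Groups': rng.integers(0, 1_000, n).astype(str),
    'Value': rng.integers(0, 50, n),
    'Element': np.arange(n).astype(str),
})


def with_groupby(df):
    # Sort, then broadcast each group's first Element back with transform
    first = (df.sort_values(['Groups', 'Value'])
               .groupby('Groups', sort=False)['Element']
               .transform('first'))
    return df.assign(first_Element=first)


def with_drop_duplicates(df):
    # Sort, keep one row per group, then merge that row's Element back
    trimmed = (df.sort_values(['Groups', 'Value'])
                 .drop(columns='Value')
                 .drop_duplicates(subset='Groups')
                 .rename(columns={'Element': 'first_Element'}))
    return df.merge(trimmed, on='Groups')


for func in (with_groupby, with_drop_duplicates):
    seconds = timeit.timeit(lambda: func(big), number=5)
    print(f'{func.__name__}: {seconds / 5:.4f} s per call')
```

The merge result gets a new default index, so when comparing the two outputs it is safer to match rows by a key (here the unique Element) than by position.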
