Add a new column with lowest value within groups
I have a dataframe such as:
Groups Value Element
G1 1 A
G1 4 B
G1 6 C
G2 2 D
G2 1 E
G3 7 F
G3 4 G
G3 2 H
G3 2 I
And I would like to add a new column called first_Element which, for each Groups, holds the Element with the lowest Value; if there are ties, take the first one.
I should then get:
Groups Value Element first_Element
G1 1 A A
G1 4 B A
G1 6 C A
G2 2 D E
G2 1 E E
G3 7 F H
G3 4 G H
G3 2 H H
G3 2 I H
Does someone have an idea, please?
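For anyone who wants to try the answers below, the sample frame from the question can be rebuilt roughly like this. It is only a sketch: the values are copied from the table above, and the default integer index is an assumption.

```python
import pandas as pd

# Sample data copied from the question; a default RangeIndex is assumed.
df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G2", "G2", "G3", "G3", "G3", "G3"],
    "Value": [1, 4, 6, 2, 1, 7, 4, 2, 2],
    "Element": list("ABCDEFGHI"),
})
print(df)
```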
df = df.merge(pd.DataFrame(df.groupby('Groups').apply(lambda x: x['Element'][x['Value'].idxmin()]), columns=['first_Element']).reset_index(), on='Groups')
Output:
>>> df
Groups Value Element first_Element
0 G1 1 A A
1 G1 4 B A
2 G1 6 C A
3 G2 2 D E
4 G2 1 E E
5 G3 7 F H
6 G3 4 G H
7 G3 2 H H
8 G3 2 I H
Use groupby().transform with idxmin, then loc access:
min_loc = df.groupby('Groups')['Value'].transform('idxmin')
df['first_Element'] = df.loc[min_loc, 'Element'].to_numpy()
Output:
Groups Value Element first_Element
0 G1 1 A A
1 G1 4 B A
2 G1 6 C A
3 G2 2 D E
4 G2 1 E E
5 G3 7 F H
6 G3 4 G H
7 G3 2 H H
8 G3 2 I H
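A self-contained sketch of the same idea, with two points spelled out: idxmin returns index labels, so the loc lookup assumes those labels are unique, and on ties idxmin keeps the first occurrence. The reset_index call is an added precaution, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G2", "G2", "G3", "G3", "G3", "G3"],
    "Value": [1, 4, 6, 2, 1, 7, 4, 2, 2],
    "Element": list("ABCDEFGHI"),
})

# Precaution (assumption): make the index labels unique before using loc.
df = df.reset_index(drop=True)

# For every row, the index label of the minimum Value in that row's group.
min_loc = df.groupby("Groups")["Value"].transform("idxmin")

# Look up the Element at those labels; to_numpy() drops the looked-up
# index so the values are assigned by position.
df["first_Element"] = df.loc[min_loc, "Element"].to_numpy()

# G3 has two rows with Value == 2 (H and I); the first one, H, is kept.
print(df[df["Groups"] == "G3"])
```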
Here is a way using map:
(df.assign(first_Element = df['Groups'].map(df.loc[df.groupby('Groups')['Value'].idxmin()]
.set_index('Groups')['Element'])))
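The one-liner above can be unpacked into steps, which may make it easier to follow. This is a sketch only; the intermediate names idx, lookup and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G2", "G2", "G3", "G3", "G3", "G3"],
    "Value": [1, 4, 6, 2, 1, 7, 4, 2, 2],
    "Element": list("ABCDEFGHI"),
})

# 1. Index label of the minimum Value in each group (first one on ties).
idx = df.groupby("Groups")["Value"].idxmin()

# 2. Lookup Series: group name -> Element found at that label.
lookup = df.loc[idx].set_index("Groups")["Element"]

# 3. Map each row's group onto the lookup; assign returns a new frame.
result = df.assign(first_Element=df["Groups"].map(lookup))
print(result)
```

Note that assign does not modify df in place, so the result has to be kept in a variable.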
One option is to sort the values, group, then select the first value per group:
first = (df.sort_values(['Groups', 'Value'])
.groupby('Groups', sort = False)
.Element
.transform('first')
)
df.assign(first_Element = first)
Groups Value Element first_Element
0 G1 1 A A
1 G1 4 B A
2 G1 6 C A
3 G2 2 D E
4 G2 1 E E
5 G3 7 F H
6 G3 4 G H
7 G3 2 H H
8 G3 2 I H
Another option is to sort the values, drop duplicates and merge back to the original dataframe; this avoids a groupby, and might be more efficient (just an assumption):
trimmed = (df.sort_values(['Groups', 'Value'])
.drop(columns='Value')
.drop_duplicates(subset='Groups')
.rename(columns={'Element':'first_Element'})
)
df.merge(trimmed, on='Groups')
Groups Value Element first_Element
0 G1 1 A A
1 G1 4 B A
2 G1 6 C A
3 G2 2 D E
4 G2 1 E E
5 G3 7 F H
6 G3 4 G H
7 G3 2 H H
8 G3 2 I H