Find the minimum value in a Pandas dataframe and add a label on new column
What improvements can I make to my Python pandas code to make it more efficient? In my case, I have this DataFrame:
In [1]: df = pd.DataFrame({'PersonID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'Name': ["Jan", "Jan", "Jan", "Don", "Don", "Don", "Joe", "Joe", "Joe"],
'Label': ["REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL"],
'RuleID': [55, 55, 55, 3, 3, 3, 10, 10, 10],
'RuleNumber': [3, 4, 5, 1, 2, 3, 234, 567, 999]})
Which gives this result:
In [2]: df
Out[2]:
PersonID Name Label RuleID RuleNumber
0 1 Jan REL 55 3
1 1 Jan REL 55 4
2 1 Jan REL 55 5
3 2 Don REL 3 1
4 2 Don REL 3 2
5 2 Don REL 3 3
6 3 Joe REL 10 234
7 3 Joe REL 10 567
8 3 Joe REL 10 999
What I need to accomplish here is to update the Label column to MAIN for the row with the lowest RuleNumber within each RuleID applied to a given PersonID and Name. Therefore, the results need to look like this:
In [3]: df
Out[3]:
PersonID Name Label RuleID RuleNumber
0 1 Jan MAIN 55 3
1 1 Jan REL 55 4
2 1 Jan REL 55 5
3 2 Don MAIN 3 1
4 2 Don REL 3 2
5 2 Don REL 3 3
6 3 Joe MAIN 10 234
7 3 Joe REL 10 567
8 3 Joe REL 10 999
This is the code that I wrote to accomplish this:
In [4]:
df['Label'] = np.where(
df['RuleNumber'] ==
df.groupby(['PersonID', 'Name', 'RuleID'])['RuleNumber'].transform('min'),
"MAIN", df.Label)
Is there a better way to update the values under the Label column? I feel like I'm brute-forcing my way through, and this may not be the most efficient way to do it.
I used the following SO threads to arrive at my result:
Replace column values within a groupby and condition
Replace values within a groupby based on multiple conditions
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmin.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html
Using Pandas to Find Minimum Values of Grouped Rows
Any advice would be appreciated.
Thank you.
It seems like you can filter by the grouped idxmin regardless of sorted order and update Label based on that. You can use loc, np.where, mask, or where as follows:
df.loc[df.groupby(['PersonID', 'Name', 'RuleID'])['RuleNumber'].idxmin(), 'Label'] = 'MAIN'
OR with np.where, as you were trying:
df['Label'] = (np.where((df.index == df.groupby(['PersonID', 'Name', 'RuleID'])
['RuleNumber'].transform('idxmin')), 'MAIN', 'REL'))
df
Out[1]:
PersonID Name Label RuleID RuleNumber
0 1 Jan MAIN 55 3
1 1 Jan REL 55 4
2 1 Jan REL 55 5
3 2 Don MAIN 3 1
4 2 Don REL 3 2
5 2 Don REL 3 3
6 3 Joe MAIN 10 234
7 3 Joe REL 10 567
8 3 Joe REL 10 999
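One behavioral difference between the question's transform('min') comparison and the idxmin-based approaches is how they treat ties. The sketch below uses a small hypothetical DataFrame (not the data from the question) in which one group has two rows tied at the minimum. It assumes only one row per group should become MAIN; if every tied row should be marked, the transform('min') version is the one to keep.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the group for PersonID 1 has two rows tied at RuleNumber 3.
df = pd.DataFrame({
    'PersonID': [1, 1, 1, 2, 2],
    'RuleID': [55, 55, 55, 3, 3],
    'RuleNumber': [3, 3, 5, 2, 1],
    'Label': ['REL'] * 5,
})
keys = ['PersonID', 'RuleID']

# Comparing each row against its group minimum marks every tied row.
by_min = np.where(
    df['RuleNumber'] == df.groupby(keys)['RuleNumber'].transform('min'),
    'MAIN', df['Label'])

# idxmin returns one index label per group: the first occurrence of the minimum.
by_idxmin = df.copy()
by_idxmin.loc[df.groupby(keys)['RuleNumber'].idxmin(), 'Label'] = 'MAIN'

print(by_min.tolist())
print(by_idxmin['Label'].tolist())
```

With this data, the first approach labels both tied rows of PersonID 1 as MAIN, while the idxmin approach labels only the first of them.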
Using mask or its inverse where would also work:
df['Label'] = (df['Label'].mask((df.index == df.groupby(['PersonID', 'Name', 'RuleID'])
['RuleNumber'].transform('idxmin')), 'MAIN'))
OR
df['Label'] = (df['Label'].where((df.index != df.groupby(['PersonID', 'Name', 'RuleID'])
['RuleNumber'].transform('idxmin')), 'MAIN'))
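As a quick sanity check, the four variants above can be run side by side on rows that are not sorted within each group. This is only a sketch: the RuleNumber ordering below is a hypothetical reshuffle of the question's data, and the names keys, first_min and is_min are introduced here for illustration. The boolean mask is built from NumPy arrays to keep the comparison explicit.

```python
import numpy as np
import pandas as pd

# Same people and rules as in the question, but RuleNumber is reshuffled
# within each group (hypothetical ordering) so the minimum is never first.
df = pd.DataFrame({
    'PersonID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Name': ['Jan'] * 3 + ['Don'] * 3 + ['Joe'] * 3,
    'Label': ['REL'] * 9,
    'RuleID': [55, 55, 55, 3, 3, 3, 10, 10, 10],
    'RuleNumber': [5, 3, 4, 2, 3, 1, 567, 999, 234],
})
keys = ['PersonID', 'Name', 'RuleID']

# Index label of the group minimum, broadcast back to every row of the group.
first_min = df.groupby(keys)['RuleNumber'].transform('idxmin')
is_min = first_min.to_numpy() == df.index.to_numpy()

# 1) loc with the per-group idxmin labels
via_loc = df.copy()
via_loc.loc[df.groupby(keys)['RuleNumber'].idxmin(), 'Label'] = 'MAIN'

# 2) np.where, 3) mask, 4) where with the inverted condition
via_np_where = np.where(is_min, 'MAIN', df['Label'])
via_mask = df['Label'].mask(is_min, 'MAIN')
via_where = df['Label'].where(~is_min, 'MAIN')

print(via_loc['Label'].tolist())
```

All four should produce the same labels here, with MAIN landing on the row holding the smallest RuleNumber in each group rather than on the first row.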
import pandas as pd
df = pd.DataFrame({'PersonID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'Name': ["Jan", "Jan", "Jan", "Don", "Don", "Don", "Joe", "Joe", "Joe"],
'Label': ["REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL"],
'RuleID': [55, 55, 55, 3, 3, 3, 10, 10, 10],
'RuleNumber': [3, 4, 5, 1, 2, 3, 234, 567, 999]})
df.loc[df.groupby('Name')['RuleNumber'].idxmin(), 'Label'] = 'MAIN'
Use duplicated on PersonID:
df.loc[~df['PersonID'].duplicated(),'Label'] = 'MAIN'
print(df)
Output:
PersonID Name Label RuleID RuleNumber
0 1 Jan MAIN 55 3
1 1 Jan REL 55 4
2 1 Jan REL 55 5
3 2 Don MAIN 3 1
4 2 Don REL 3 2
5 2 Don REL 3 3
6 3 Joe MAIN 10 234
7 3 Joe REL 10 567
8 3 Joe REL 10 999
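Note that duplicated marks the first row seen for each PersonID, so it matches the desired result only when that first row also holds the lowest RuleNumber, as it does in the sample data. A minimal sketch of one way to remove that dependency, assuming it is acceptable to sort a temporary copy first (the small DataFrame and the variable names are hypothetical):

```python
import pandas as pd

# Hypothetical data where the minimum is NOT the first row of each person.
df = pd.DataFrame({
    'PersonID': [1, 1, 2, 2],
    'Label': ['REL'] * 4,
    'RuleNumber': [4, 3, 2, 1],
})

# Using duplicated directly marks the first row per PersonID (rows 0 and 2),
# which are not the minima in this ordering.
naive = df.copy()
naive.loc[~naive['PersonID'].duplicated(), 'Label'] = 'MAIN'

# Sorting by PersonID and RuleNumber first makes the first row of each
# person the minimum; drop_duplicates then keeps exactly those rows.
keep = df.sort_values(['PersonID', 'RuleNumber']).drop_duplicates('PersonID').index
fixed = df.copy()
fixed.loc[keep, 'Label'] = 'MAIN'

print(naive['Label'].tolist())
print(fixed['Label'].tolist())
```

The sort adds work that the groupby and idxmin approach avoids, so this is mainly useful when the data is already sorted or needs to be sorted anyway.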