Find the minimum value in a Pandas dataframe and add a label on new column

What improvements can I make to my Python pandas code to make it more efficient? In my case, I have this dataframe:

In [1]: df = pd.DataFrame({'PersonID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                           'Name': ["Jan", "Jan", "Jan", "Don", "Don", "Don", "Joe", "Joe", "Joe"],
                           'Label': ["REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL"],
                           'RuleID': [55, 55, 55, 3, 3, 3, 10, 10, 10],
                           'RuleNumber': [3, 4, 5, 1, 2, 3, 234, 567, 999]})

Which gives this result:

In [2]: df
Out[2]: 
   PersonID Name Label  RuleID  RuleNumber
0         1  Jan   REL      55          3
1         1  Jan   REL      55          4
2         1  Jan   REL      55          5
3         2  Don   REL       3          1
4         2  Don   REL       3          2
5         2  Don   REL       3          3
6         3  Joe   REL      10        234
7         3  Joe   REL      10        567
8         3  Joe   REL      10        999

What I need to accomplish here is to update the Label column to MAIN for the row with the lowest RuleNumber within each RuleID that is applied to a PersonID and Name. The results therefore need to look like this:

In [3]: df
Out[3]:
   PersonID Name Label  RuleID  RuleNumber
0         1  Jan  MAIN      55           3
1         1  Jan   REL      55           4
2         1  Jan   REL      55           5
3         2  Don  MAIN       3           1
4         2  Don   REL       3           2
5         2  Don   REL       3           3
6         3  Joe  MAIN      10         234
7         3  Joe   REL      10         567
8         3  Joe   REL      10         999

This is the code that I wrote to accomplish this:

In [4]:

df['Label'] = np.where(
        df['RuleNumber'] ==
        df.groupby(['PersonID', 'Name', 'RuleID'])['RuleNumber'].transform('min'),
        "MAIN", df.Label)

Is there a better way to update the values under the Label column? I feel like I'm brute-forcing my way through, and this may not be the most efficient way to do it.

I used the following SO threads and documentation pages to arrive at my result:

Replace column values within a groupby and condition

Replace values within a groupby based on multiple conditions

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmin.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html

Using Pandas to Find Minimum Values of Grouped Rows

Any advice would be appreciated.

Thank you.

It seems like you can filter by the grouped idxmin, regardless of sort order, and update Label based on that. You can use loc, np.where, mask, or where as follows:

df.loc[df.groupby(['PersonID', 'Name', 'RuleID'])['RuleNumber'].idxmin(), 'Label'] = 'MAIN'
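
To illustrate the point about sort order, here is a minimal sketch that shuffles the question's dataframe before applying the loc / idxmin assignment. The shuffle via sample(frac=1, random_state=0) and the names shuffled, main_idx and main_rows are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'PersonID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Name': ["Jan", "Jan", "Jan", "Don", "Don", "Don", "Joe", "Joe", "Joe"],
    'Label': ["REL"] * 9,
    'RuleID': [55, 55, 55, 3, 3, 3, 10, 10, 10],
    'RuleNumber': [3, 4, 5, 1, 2, 3, 234, 567, 999],
})

# Shuffle the rows (illustrative only) so the minimum is not reliably first in its group.
shuffled = df.sample(frac=1, random_state=0)

# idxmin returns the index label of the smallest RuleNumber in each group,
# so the assignment below does not depend on row order.
main_idx = shuffled.groupby(['PersonID', 'Name', 'RuleID'])['RuleNumber'].idxmin()
shuffled.loc[main_idx, 'Label'] = 'MAIN'

main_rows = shuffled[shuffled['Label'] == 'MAIN'].sort_index()
print(main_rows[['Name', 'RuleNumber']])
```

Because the lookup goes through index labels, the rows with RuleNumber 3, 1 and 234 are the ones marked, whatever order the shuffle produces.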

Or with np.where, as you were trying:

df['Label'] = (np.where((df.index == df.groupby(['PersonID', 'Name', 'RuleID'])
                         ['RuleNumber'].transform('idxmin')), 'MAIN', 'REL'))
df
Out[1]: 
   PersonID Name Label  RuleID  RuleNumber
0         1  Jan  MAIN      55           3
1         1  Jan   REL      55           4
2         1  Jan   REL      55           5
3         2  Don  MAIN       3           1
4         2  Don   REL       3           2
5         2  Don   REL       3           3
6         3  Joe  MAIN      10         234
7         3  Joe   REL      10         567
8         3  Joe   REL      10         999

Using mask, or its inverse where, would also work:

df['Label'] = (df['Label'].mask((df.index == df.groupby(['PersonID', 'Name', 'RuleID'])
                         ['RuleNumber'].transform('idxmin')), 'MAIN'))

OR

df['Label'] = (df['Label'].where((df.index != df.groupby(['PersonID', 'Name', 'RuleID'])
                         ['RuleNumber'].transform('idxmin')), 'MAIN'))
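
As a quick sanity check, the sketch below builds the same boolean condition once and confirms that the np.where, mask and where variants produce identical labels on the question's data. The helper names keys, min_idx, is_main and the via_* variables are assumptions introduced here, and the index comparison goes through to_numpy() purely to keep the condition a plain boolean array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'PersonID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Name': ["Jan", "Jan", "Jan", "Don", "Don", "Don", "Joe", "Joe", "Joe"],
    'Label': ["REL"] * 9,
    'RuleID': [55, 55, 55, 3, 3, 3, 10, 10, 10],
    'RuleNumber': [3, 4, 5, 1, 2, 3, 234, 567, 999],
})

keys = ['PersonID', 'Name', 'RuleID']

# Index label of each group's minimum RuleNumber, broadcast to every row of the group.
min_idx = df.groupby(keys)['RuleNumber'].transform('idxmin')

# True only on the row whose own index label is its group's idxmin.
is_main = df.index.to_numpy() == min_idx.to_numpy()

via_np_where = pd.Series(np.where(is_main, 'MAIN', df['Label']), index=df.index)
via_mask = df['Label'].mask(is_main, 'MAIN')      # replace where the condition is True
via_where = df['Label'].where(~is_main, 'MAIN')   # keep where the condition is True

print(via_mask.tolist())
```

mask replaces values where the condition holds, while where keeps them, which is why the where variant needs the negated condition.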

# Alternative: mark the row holding the minimum RuleNumber for each Name.
import pandas as pd

df = pd.DataFrame({'PersonID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'Name': ["Jan", "Jan", "Jan", "Don", "Don", "Don", "Joe", "Joe", "Joe"],
                   'Label': ["REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL", "REL"],
                   'RuleID': [55, 55, 55, 3, 3, 3, 10, 10, 10],
                   'RuleNumber': [3, 4, 5, 1, 2, 3, 234, 567, 999]})

df.loc[df.groupby('Name')['RuleNumber'].idxmin(), 'Label'] = 'MAIN'
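
Grouping by Name alone gives the same result as grouping by all three keys on the question's data, because each name there belongs to exactly one person. The following sketch uses hypothetical data (two different PersonIDs that share the name "Jan", which does not occur in the original question) to show where the two groupings might diverge.

```python
import pandas as pd

# Hypothetical data: PersonID 1 and PersonID 2 are different people who share a name.
df = pd.DataFrame({
    'PersonID': [1, 1, 2, 2],
    'Name': ["Jan", "Jan", "Jan", "Jan"],
    'RuleID': [55, 55, 3, 3],
    'RuleNumber': [3, 4, 1, 2],
})

# Grouping by Name only treats both people as a single group.
by_name = df.groupby('Name')['RuleNumber'].idxmin()

# Grouping by all three keys keeps the two people separate.
by_all_keys = df.groupby(['PersonID', 'Name', 'RuleID'])['RuleNumber'].idxmin()

print(by_name.tolist())
print(by_all_keys.tolist())
```

If names are not guaranteed to be unique per person, grouping on PersonID (or on all three keys, as in the question) is probably the safer choice.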

Use duplicated on PersonID. This relies on the rows already being sorted so that each person's lowest RuleNumber comes first, as in the sample data:

df.loc[~df['PersonID'].duplicated(),'Label'] = 'MAIN'
print(df)

Output:

   PersonID Name Label  RuleID  RuleNumber
0         1  Jan  MAIN      55           3
1         1  Jan   REL      55           4
2         1  Jan   REL      55           5
3         2  Don  MAIN       3           1
4         2  Don   REL       3           2
5         2  Don   REL       3           3
6         3  Joe  MAIN      10         234
7         3  Joe   REL      10         567
8         3  Joe   REL      10         999
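
The duplicated approach marks the first row seen for each PersonID, so it only finds the minimum when the data is ordered accordingly. The sketch below uses deliberately unordered rows (an assumption for illustration, not data from the question) and sorts before calling duplicated; the names ordered and first_rows are hypothetical. It also assumes one RuleID per person, as in the question.

```python
import pandas as pd

# Rows are deliberately out of order within each PersonID.
df = pd.DataFrame({
    'PersonID': [1, 1, 1, 2, 2, 2],
    'Name': ["Jan", "Jan", "Jan", "Don", "Don", "Don"],
    'Label': ["REL"] * 6,
    'RuleID': [55, 55, 55, 3, 3, 3],
    'RuleNumber': [5, 3, 4, 2, 3, 1],
})

# Sort so the smallest RuleNumber comes first within each PersonID.
ordered = df.sort_values(['PersonID', 'RuleNumber'])

# duplicated() is False for the first occurrence of each PersonID,
# which after sorting is the row with the lowest RuleNumber.
first_rows = ordered.loc[~ordered['PersonID'].duplicated()].index

# Assign by index label so the original row order of df is preserved.
df.loc[first_rows, 'Label'] = 'MAIN'
print(df)
```

Without the sort_values step, the rows with RuleNumber 5 and 2 would have been labelled MAIN instead, since they appear first for their PersonID.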
