

How to apply a function to a pandas DataFrame with groupby

I have this function I found on GitHub:

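def std_div(data, threshold=3):
    """Flag values whose ratio to the standard deviation exceeds threshold."""
    std = data.std()    # sample standard deviation (pandas default ddof=1)
    mean = data.mean()  # computed but never used below; a conventional
                        # z-score test would use abs(val - mean) / std
    isOutlier = []
    for val in data:
        if val / std > threshold:
            isOutlier.append(True)
        else:
            isOutlier.append(False)
    return isOutlier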
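For reference, the loop inside std_div can also be written as a single vectorized expression. This is a sketch: the name std_div_vectorized is hypothetical, and it deliberately keeps the original's test (value divided by the standard deviation) rather than a mean-centred z-score. It returns a boolean Series instead of a list.

```python
import pandas as pd


def std_div_vectorized(data: pd.Series, threshold: float = 3) -> pd.Series:
    """Hypothetical vectorized equivalent of std_div (same ratio test)."""
    # data.std() uses pandas' default ddof=1, just like the loop version;
    # "/" binds tighter than ">", so this is (data / std) > threshold
    return data / data.std() > threshold


salaries = pd.Series([5000, 6000])
print(std_div_vectorized(salaries).tolist())  # [True, True]
```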

I want to apply this to my DataFrame separately for each group (dept):

employee_id  dept   Salary
1            sales  10000
2            sales  110000
3            sales  120000
4            hr     5000
5            hr     6000
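To make the example reproducible, the sample frame above can be built directly. The column names and values are copied from the table; nothing else is assumed.

```python
import pandas as pd

# Sample data from the question, rebuilt as a DataFrame
df = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5],
        "dept": ["sales", "sales", "sales", "hr", "hr"],
        "Salary": [10000, 110000, 120000, 5000, 6000],
    }
)
print(df)
```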

This works, but it calculates the standard deviation over the entire DataFrame rather than per department:

# std_div receives the whole Salary column, so one std is used for every dept
df["std_div"] = std_div(df.Salary)
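The difference can be seen by comparing the single standard deviation of the whole Salary column with the per-department values from groupby(...).std(). This is a small illustrative check on the sample data; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dept": ["sales", "sales", "sales", "hr", "hr"],
        "Salary": [10000, 110000, 120000, 5000, 6000],
    }
)

# One standard deviation for the whole column (what std_div(df.Salary) uses)
overall_std = df["Salary"].std()

# One standard deviation per department (what a per-group test needs)
per_dept_std = df.groupby("dept")["Salary"].std()

print(round(overall_std, 1))
print(per_dept_std.round(1))
```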

You could do something along the lines of the following: group by the column of interest, then use a for loop to run the function on that column for each group.

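for name, group in df.groupby('dept'):
    # std_div only sees this department's salaries; the flags are
    # written back to the matching rows of the original frame
    df.loc[group.index, 'outlier'] = std_div(group.Salary)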

df
employee_id dept    Salary  outlier
1           sales   10000   False
2           sales   110000  False
3           sales   120000  False
4           hr      5000    True
5           hr      6000    True
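A common alternative to the explicit loop is groupby(...).transform, which runs a function per group and returns a result aligned to the original index, so it can be assigned straight to a new column. This sketch assumes the same ratio test as std_div is wanted; THRESHOLD is a hypothetical constant mirroring std_div's default.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5],
        "dept": ["sales", "sales", "sales", "hr", "hr"],
        "Salary": [10000, 110000, 120000, 5000, 6000],
    }
)

THRESHOLD = 3  # hypothetical constant, same value as std_div's default

# The lambda receives one department's salaries at a time, so s.std()
# is the per-group standard deviation; the result keeps df's index.
df["outlier"] = df.groupby("dept")["Salary"].transform(
    lambda s: s / s.std() > THRESHOLD
)
print(df)
```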

Depending on what you want the output to look like, you can assign the return values back to the original DataFrame.
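If you prefer to keep a list-returning std_div unchanged, another option is to wrap each group's result in a Series that carries the group's index and let pandas align the rows on assignment. The std_div below is a condensed restatement of the question's function with the same logic; the variable name parts is only illustrative.

```python
import pandas as pd


def std_div(data, threshold=3):
    # Condensed version of the question's function (same ratio test)
    std = data.std()
    return [val / std > threshold for val in data]


df = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5],
        "dept": ["sales", "sales", "sales", "hr", "hr"],
        "Salary": [10000, 110000, 120000, 5000, 6000],
    }
)

# Give each group's list of flags the group's own index, then concatenate.
# Assigning a Series to a column aligns on index, so each flag lands on the
# correct row regardless of the order in which groupby yields the groups.
parts = [
    pd.Series(std_div(group["Salary"]), index=group.index)
    for _, group in df.groupby("dept")
]
df["outlier"] = pd.concat(parts)
print(df)
```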

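Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.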

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM