Defining a function that takes another function as an argument

Question

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'col_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'col_3': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']})

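The real data I work with has hundreds of columns, and I want to apply different functions to them, such as min and max, as well as custom functions, for example: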

def dist(x):
    return max(x) - min(x)
def HHI(x):
    ss = sum([s**2 for s in x])
    return ss

What I want is something like this:

def myfunc(cols,fun):
    return df.groupby('col_3')[[cols]].transform(lambda x: fun)
# which would allow me to do something like:

df[['min_' + s for s in cols]] = myfunc(cols, min)
df[['max_' + s for s in cols]] = myfunc(cols, max)
df[['dist_' + s for s in cols]] = myfunc(cols, dist)

Is this possible in Python (my guess is 'yes')?
If so, how?

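Edit ====== regarding custom function names =======
Following jpp's solution, what I asked for is possible, at least for built-in functions; custom functions need a bit more work.

A workable solution: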

temp = df.copy()
for func in ['HHI', 'dist']:  # each string must match the name of a defined function
    print(func)
    temp[[func + s for s in cols]] = df.pipe(myfunc, cols, eval(func))

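The key here is to use the eval function to convert the string expression into a function. There may well be a better way to do this, though, and I look forward to seeing one.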
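One possible alternative to eval, sketched here under assumptions, is to keep the function objects in a dictionary and look them up by label. The FUNCS registry and the underscore in the column names are illustrative choices, not part of the original post; myfunc follows the three-argument form from jpp's answer.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col_3': ['A'] * 5 + ['B'] * 5,
})
cols = ['col_1', 'col_2']

def dist(x):
    return max(x) - min(x)

def HHI(x):
    return sum(s ** 2 for s in x)

def myfunc(df, cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

# Hypothetical registry: maps a label to the function object itself,
# so no string ever has to be evaluated as code.
FUNCS = {'HHI': HHI, 'dist': dist}

temp = df.copy()
for name, func in FUNCS.items():
    temp[[f'{name}_{s}' for s in cols]] = df.pipe(myfunc, cols, func)

print(temp[['HHI_col_1', 'dist_col_1']])
```

A dictionary lookup raises a plain KeyError for an unknown label, whereas eval would execute whatever expression the string happens to contain.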

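Edit ====== following jpp's comment on custom function names =======

jpp's comment, that the function can be passed to myfunc directly, works according to my tests. However, a new column name built from func itself would then look like <function HHI at 0x00000194460019D8>, which is unreadable. The fix is temp[[str(func.__name__) + s for s in cols]]. I hope this helps anyone who runs into this problem later.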
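The naming issue can be seen without pandas at all. The following is a small sketch; the names dictionary and the underscore separator are only illustrative.

```python
def dist(x):
    return max(x) - min(x)

def HHI(x):
    return sum(s ** 2 for s in x)

cols = ['col_1', 'col_2']

names = {}
for func in (HHI, dist):
    # str(func) gives something like '<function HHI at 0x...>',
    # while func.__name__ is the plain name used in the def statement.
    names[func.__name__] = [f'{func.__name__}_{s}' for s in cols]

print(names)

# Caveat: an anonymous function has no useful name of its own.
anonymous = lambda x: x
print(anonymous.__name__)  # <lambda>
```

Because every lambda reports the name <lambda>, this approach suits functions defined with def; lambdas would need an explicit label.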

Answer 1

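Here is one way using pd.DataFrame.pipe.

In Python, everything is an object and can be passed around without type checking. The philosophy is "don't check whether it works, just try it...". So you can pass either a string or a function to myfunc, and on to transform, without any harmful side effects.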

def myfunc(df, cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

cols = ['col_1', 'col_2']

df[[f'min_{s}' for s in cols]] = df.pipe(myfunc, cols, 'min')
df[[f'max_{s}' for s in cols]] = df.pipe(myfunc, cols, 'max')
df[[f'dist_{s}' for s in cols]] = df.pipe(myfunc, cols, lambda x: x.max() - x.min())

Result:

print(df)

   col_1  col_2 col_3  min_col_1  min_col_2  max_col_1  max_col_2  dist_col_1  \
0      1      1     A          1          1          5          5           4   
1      2      2     A          1          1          5          5           4   
2      3      3     A          1          1          5          5           4   
3      4      4     A          1          1          5          5           4   
4      5      5     A          1          1          5          5           4   
5      6      6     B          6          6         10         10           4   
6      7      7     B          6          6         10         10           4   
7      8      8     B          6          6         10         10           4   
8      9      9     B          6          6         10         10           4   
9     10     10     B          6          6         10         10           4   

   dist_col_2  
0           4  
1           4  
2           4  
3           4  
4           4  
5           4  
6           4  
7           4  
8           4  
9           4
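As a quick check of the claim that a string and a callable are interchangeable here, the sketch below rebuilds the sample frame and compares both forms. The variable names by_name and by_callable are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col_3': ['A'] * 5 + ['B'] * 5,
})
cols = ['col_1', 'col_2']

def myfunc(df, cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

# The same third argument slot accepts a pandas method name or a callable.
by_name = df.pipe(myfunc, cols, 'min')
by_callable = df.pipe(myfunc, cols, lambda x: x.min())

print(by_name['col_1'].tolist())      # [1, 1, 1, 1, 1, 6, 6, 6, 6, 6]
print(by_callable['col_1'].tolist())  # [1, 1, 1, 1, 1, 6, 6, 6, 6, 6]
```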

Answer 2

Yes, you are very close:

def myfunc(cols,fun):
    return df.groupby('col_3')[cols].transform(lambda x: fun(x))

Or:

def myfunc(cols,fun):
    return df.groupby('col_3')[cols].transform(fun)
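For readers wondering why the version in the question falls short, a plain-Python sketch of its two slips may help. The names returns_function and calls_function are illustrative.

```python
cols = ['col_1', 'col_2']
values = [1, 2, 3, 4, 5]
fun = max

# Slip 1: "lambda x: fun" ignores x and hands back the function object;
# "lambda x: fun(x)" actually applies fun to x.
returns_function = lambda x: fun
calls_function = lambda x: fun(x)

print(returns_function(values) is max)  # True
print(calls_function(values))           # 5

# Slip 2: cols is already a list, so [cols] is a list containing a list.
# Writing [[cols]] therefore indexes with a nested list instead of
# a flat list of column names.
print([cols])  # [['col_1', 'col_2']]
```

Since wrapping fun in a lambda that only forwards its argument adds nothing, passing fun directly, as in the second version above, is the simpler of the two.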


Answer 1: score 4, accepted, 2018-10-06 19:34:11

Answer 2: score 3, 2018-10-06 19:34:27
