Applying a function across columns of a Pandas DataFrame to create a temporary column for sorting

Question

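I want to use the .assign() method to apply a function such as log() to a DataFrame, creating a temporary column that I can use as a sort key. However, I cannot pass an axis argument the way I can with the .apply() method.

Here is some sample code: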

from math import log

import numpy as np
import pandas as pd
from numpy.random import randint

np.random.seed(0)
df = pd.DataFrame({'value': [randint(1, 10) for i in range(0, 10)], 'reading': [randint(1, 10) for i in range(0, 10)]})

   value  reading
0      8        6
1      5        9
2      3        7
3      8        2
4      6        1
5      4        9
6      6        2
7      3        5
8      2        2
9      8        8

I cannot use the .assign() method like this:

df.assign(log = log(df.value/df.reading))

    raise TypeError("cannot convert the series to " "{0}".format(str(converter)))
TypeError: cannot convert the series to <class 'float'>

or

df.assign(log = lambda x: log(x.value/x.reading))

    raise TypeError("cannot convert the series to " "{0}".format(str(converter)))
TypeError: cannot convert the series to <class 'float'>

But it works with the .apply() method:

df.apply(lambda x: log(x.value/x.reading), axis=1)

0    0.287682
1   -0.587787
2   -0.847298
3    1.386294
4    1.791759
5   -0.810930
6    1.098612
7   -0.510826
8    0.000000
9    0.000000
dtype: float64

Is there a workaround, using assign or a different method, that lets me use this as a temporary column for sorting?

Answer 1

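You should use vectorized functions as much as possible and keep apply(..., axis=1) as a last resort, for when you really must perform an operation row by row.

Your problem can be solved with np.log, which is vectorized: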

df.assign(log=lambda x: np.log(x['value'] / x['reading']))
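
To finish the sorting step the question asks about, the temporary column can be chained straight into sort_values and dropped afterwards. This is only a minimal sketch: the data is copied from the table in the question, and the sort_values / drop chain is an assumption about the intended use rather than something the question spells out.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'value':   [8, 5, 3, 8, 6, 4, 6, 3, 2, 8],
    'reading': [6, 9, 7, 2, 1, 9, 2, 5, 2, 8],
})

# Build the temporary column, sort on it, then drop it again.
# df itself is never modified; assign returns a new DataFrame.
result = (
    df.assign(log=lambda x: np.log(x['value'] / x['reading']))
      .sort_values('log')
      .drop(columns='log')
)
print(result)
```

Because assign returns a copy, the original df keeps only its two columns, so nothing needs cleaning up afterwards.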

If you have a custom function, it is best to rewrite it using vectorized functions from numpy or scipy. As a last resort, you can use np.vectorize:

import math
import numpy as np

def my_custom_func(x):
    # Scalar-only function: math.log does not accept a Series
    return math.log(x)

f = np.vectorize(my_custom_func)
df.assign(log2=lambda x: f(x['value'] / x['reading']))
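
Note that np.vectorize is a convenience wrapper: it still calls the Python function once per element, so it should not be expected to match the speed of a real ufunc such as np.log. The sketch below, which again uses the sample data from the question, only checks that the two approaches agree on the values; the variable names are illustrative.

```python
import math

import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'value':   [8, 5, 3, 8, 6, 4, 6, 3, 2, 8],
    'reading': [6, 9, 7, 2, 1, 9, 2, 5, 2, 8],
})

def my_custom_func(x):
    # Scalar-only function standing in for arbitrary custom logic
    return math.log(x)

f = np.vectorize(my_custom_func)
ratio = df['value'] / df['reading']

via_vectorize = f(ratio)     # calls my_custom_func once per element
via_numpy = np.log(ratio)    # a single vectorized ufunc call

# Both routes should produce the same numbers
same = np.allclose(via_vectorize, via_numpy)
print(same)
```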
