
How to apply a function on every row of a DataFrame?

I am new to Python and I am not sure how to solve the following problem.

I have a function:

import math

def EOQ(D, p, ck, ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q

Say I have the DataFrame:

import pandas as pd

df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})

    D   p
0   10  20
1   20  30
2   30  10

ch=0.2
ck=5

ch and ck are scalar numbers. Now I want to apply the formula to every row of the DataFrame and store the result in an extra column 'Q'. An example (that does not work) would be:

df['Q']= map(lambda p, D: EOQ(D,p,ck,ch),df['p'], df['D']) 

(This only returns map objects.)
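For reference, here is a minimal fix to the attempt above. It assumes Python 3, where map returns a lazy iterator rather than a list. The fix materialises the iterator with list() before assigning it as a column:

```python
import math

import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# map() yields values lazily; list() forces evaluation so pandas
# receives one number per row.
df["Q"] = list(map(lambda p, D: EOQ(D, p, ck, ch), df["p"], df["D"]))
print(df)
```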

I will need this type of processing more often in my project, so I hope to find an approach that works.

The following should work:

import math

def EOQ(D, p, ck, ch):
    Q = math.sqrt((2*D*ck)/(ch*p))
    return Q

ch = 0.2
ck = 5
# axis=1 passes each row (as a Series) to the lambda
df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
df

There are a few more ways to apply a function to every row of a DataFrame.

(1) You could modify EOQ slightly so that it accepts a row (a Series object) as its argument and accesses the relevant elements by column name inside the function. You can also pass the extra arguments, e.g. ck and ch, to apply as keyword arguments:

def EOQ1(row, ck, ch):
    Q = math.sqrt((2*row['D']*ck)/(ch*row['p']))
    return Q

df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
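DataFrame.apply also accepts extra positional arguments through its args tuple. Here that is equivalent to the keyword form above. The following is a sketch; Q1b is just an illustrative column name:

```python
import math

import pandas as pd


def EOQ1(row, ck, ch):
    # row is a Series indexed by the column names
    return math.sqrt((2 * row["D"] * ck) / (ch * row["p"]))


df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# args=(ck, ch) is passed after the row, i.e. EOQ1(row, ck, ch)
df["Q1b"] = df.apply(EOQ1, axis=1, args=(ck, ch))
print(df)
```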

(2) It turns out that apply is often slower than a list comprehension (in the benchmark below, it is about 20x slower). To use a list comprehension, you could modify EOQ further so that it accesses elements by position. Then call the function in a list comprehension over the rows of df, converted to lists:

def EOQ2(row, ck, ch):
    Q = math.sqrt((2*row[0]*ck)/(ch*row[1]))
    return Q

df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
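You may prefer to keep the original EOQ signature unchanged. In that case, a sketch of the same list-comprehension idea is to iterate over the two columns in parallel with zip. This variant is not part of the benchmark below, so its speed is untested here:

```python
import math

import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# zip pairs up the D and p values row by row; no Series is built per row
df["Q_zip"] = [EOQ(D, p, ck, ch) for D, p in zip(df["D"], df["p"])]
print(df)
```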

(3) If the goal is simply to call a function on each element, map can be a bit faster than a list comprehension (in the benchmark below, roughly 15% faster). So you could convert df into a list of rows, map the function over it, and then unpack the result into a list:

df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
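Instead of building the helper lists [ck]*len(df) and [ch]*len(df), you could bind the constants with functools.partial. This is a sketch and was not included in the timings below:

```python
import math
from functools import partial

import pandas as pd


def EOQ2(row, ck, ch):
    # row is a plain list [D, p], so elements are accessed by position
    return math.sqrt((2 * row[0] * ck) / (ch * row[1]))


df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# eoq_fixed(row) calls EOQ2(row, ck=ck, ch=ch)
eoq_fixed = partial(EOQ2, ck=ck, ch=ch)
df["Q2c"] = list(map(eoq_fixed, df[["D", "p"]].to_numpy().tolist()))
print(df)
```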

(4) As @EdChum notes, it is better to use vectorized methods whenever possible instead of applying a function row by row. Pandas offers vectorized methods that rival NumPy's. For EOQ, for example, you could replace math.sqrt with NumPy's np.sqrt applied to whole columns. Alternatively, use pandas' arithmetic methods together with pow. In the benchmark below, the pandas vectorized methods are ~20% faster than NumPy:

df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
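The output and timings below include two vectorized variants: Q_np with NumPy and Q_pd with pandas methods. Here is a self-contained sketch of both. Each operates on whole columns at once, so no Python-level loop runs per row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
ch = 0.2
ck = 5

# NumPy ufunc applied to whole Series
df["Q_np"] = np.sqrt((2 * df["D"] * ck) / (ch * df["p"]))

# The same formula with pandas arithmetic methods; pow(0.5) is the square root
df["Q_pd"] = df["D"].mul(2 * ck).div(ch * df["p"]).pow(0.5)

# Both columns should agree up to floating-point rounding
assert np.allclose(df["Q_np"], df["Q_pd"])
print(df)
```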

Output:

    D   p          Q       Q_np         Q1        Q2a        Q2b       Q_pd
0  10  20   5.000000   5.000000   5.000000   5.000000   5.000000   5.000000
1  20  30   5.773503   5.773503   5.773503   5.773503   5.773503   5.773503
2  30  10  12.247449  12.247449  12.247449  12.247449  12.247449  12.247449

Timings:

df = pd.DataFrame({"D": [10,20,30], "p": [20, 30, 10]})
df = pd.concat([df]*10000)

>>> %timeit df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
623 ms ± 22.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
615 ms ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
31.3 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
26.9 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit df['Q_np'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))
1.19 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
966 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
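The timings above come from IPython's %timeit. Outside IPython, a rough equivalent is the standard-library timeit module. The sketch below compares the row-wise apply against the pandas vectorized version. Note that benchmark is a hypothetical helper name, and absolute numbers will vary by machine and pandas version:

```python
import math
import timeit

import pandas as pd


def EOQ(D, p, ck, ch):
    return math.sqrt((2 * D * ck) / (ch * p))


def benchmark(copies=10000, number=3):
    """Return average seconds per run for the apply and vectorized approaches."""
    df = pd.DataFrame({"D": [10, 20, 30], "p": [20, 30, 10]})
    df = pd.concat([df] * copies, ignore_index=True)
    ch, ck = 0.2, 5

    def with_apply():
        return df.apply(lambda row: EOQ(row["D"], row["p"], ck, ch), axis=1)

    def vectorized():
        return df["D"].mul(2 * ck).div(ch * df["p"]).pow(0.5)

    # timeit.timeit accepts a zero-argument callable as the statement
    return {
        "apply": timeit.timeit(with_apply, number=number) / number,
        "vectorized": timeit.timeit(vectorized, number=number) / number,
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1000:.2f} ms per run")
```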
