
Pandas Apply with Lambda Function

I've tried to simplify my problem to the bare bones in the example below. I am attempting to apply a function to a pandas DataFrame (much more complex than the one below), but the function contains an if statement that raises a ValueError:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I pass a Series to this lambda function without triggering this error?

def shot_test(make, att):
  if att > 75:
    return make / att
  else:
    return 0

f = lambda x: np.where(x.total > 30, shot_test(x.make, x.att), 0)
df['P'] = df.apply(f, axis=1)

I used some made-up data, but if I understand correctly, this should get you what you are looking for.

import pandas as pd

def shot_test(make, att):
    if att > 75:
        return make / att
    else:
        return 0

# Made-up data for illustration
trips = {'Column1': [0, 2, 19, 15, 0, 23, 0, 0, 10, 0],
         'Column2': [1, 2, 15, 1, 4, 22, 1, 0, 143, 5],
         'Column3': [2, 1, 54, 543, 34, 243, 7, 0, 213, 5]}
df = pd.DataFrame(trips)
# Call shot_test only for rows where Column1 >= 10; every other row gets 0
df['Lambda_Test'] = df.apply(lambda x: shot_test(x['Column2'], x['Column3']) if x['Column1'] >= 10 else 0, axis=1)
df

This lets you pass multiple columns as arguments to shot_test while also testing whether a separate column meets a certain threshold.
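For simple per-row arithmetic like this, a column-wise version can avoid apply altogether. The sketch below is one possible alternative, not the approach used in this answer. It reuses the same made-up data and assumes every column holds plain numbers; the column name Vectorized_Test is invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [0, 2, 19, 15, 0, 23, 0, 0, 10, 0],
    'Column2': [1, 2, 15, 1, 4, 22, 1, 0, 143, 5],
    'Column3': [2, 1, 54, 543, 34, 243, 7, 0, 213, 5],
})

# Rows that pass both thresholds: Column1 >= 10 and Column3 > 75.
mask = (df['Column1'] >= 10) & (df['Column3'] > 75)

# Keep the denominator only where the mask holds (NaN elsewhere),
# so the division is only meaningful on qualifying rows.
ratio = df['Column2'] / df['Column3'].where(mask)

# Every row that failed a threshold becomes 0.
df['Vectorized_Test'] = ratio.fillna(0.0)
print(df)
```

Because the comparisons run on whole columns, no Python-level if statement ever sees a Series, so the ambiguity error cannot arise here.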

Here are two example uses of your code, one that works and one that generates your error:

import pandas as pd
import numpy as np

def shot_test(make, att):
  if att > 75:
    return make / att
  else:
    return 0

f = lambda x: np.where(x.total > 30, shot_test(x.make, x.att), 0)

print("\nTest #1:")
df = pd.DataFrame({'total':[25,50,60], 'make':[300,500,1000], 'att':[100,100,50]})
print(df)
df['P'] = df.apply(f, axis=1)
print(df)


print("\nTest #2:")
df = pd.DataFrame({'total':[25,50,60], 'make':[300,500,1000], 'att':[pd.Series([100,50]),pd.Series([100,50]),pd.Series([50,100])]})
print(df)
df['P'] = df.apply(f, axis=1)
print(df)

Output:


Test #1:
   total  make  att
0     25   300  100
1     50   500  100
2     60  1000   50
   total  make  att    P
0     25   300  100  0.0
1     50   500  100  5.0
2     60  1000   50    0

Test #2:
   total  make                             att
0     25   300  0    100
1     50
dtype: int64
1     50   500  0    100
1     50
dtype: int64
2     60  1000  0     50
1    100
dtype: int64
Traceback (most recent call last):
  File "XXX.py", line 23, in <module>
    df['P'] = df.apply(f, axis=1)
  File "YYY\Python\Python310\lib\site-packages\pandas\core\frame.py", line 8833, in apply
    return op.apply().__finalize__(self, method="apply")
  File "YYY\Python\Python310\lib\site-packages\pandas\core\apply.py", line 727, in apply
    return self.apply_standard()
  File "YYY\Python\Python310\lib\site-packages\pandas\core\apply.py", line 851, in apply_standard
    results, res_index = self.apply_series_generator()
  File "YYY\Python\Python310\lib\site-packages\pandas\core\apply.py", line 867, in apply_series_generator
    results[i] = self.f(v)
  File "XXX.py", line 11, in <lambda>
    f = lambda x: np.where(x.total > 30, shot_test(x.make, x.att), 0)
  File "XXX.py", line 6, in shot_test
    if att > 75:
  File "YYY\Python\Python310\lib\site-packages\pandas\core\generic.py", line 1535, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

As you can see, in the second example each value in the att column of the DataFrame is itself a pandas Series, and this triggers the error on this line:

  if att > 75:

Assuming the data in the att column can easily be transformed from Series to scalar, you can do that and modify the line of code above so that the comparison is unambiguous. However, if att is indeed supposed to be a Series (or other array-like structure) with multiple values, you may need to rethink the logic of your code.
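As a rough sketch of the first option, the snippet below collapses each Series in att to a single number before the comparison runs. Reducing with max() is purely an assumption made for illustration; the right reduction (first element, mean, or something else) depends on what the data actually means. The helper name to_scalar is hypothetical, and a plain conditional expression stands in for the original np.where call.

```python
import pandas as pd

def shot_test(make, att):
    if att > 75:
        return make / att
    return 0

def to_scalar(value):
    # Hypothetical helper: collapse a Series to one number.
    # Using max() is an assumption; pick the reduction that fits your data.
    if isinstance(value, pd.Series):
        return value.max()
    return value

df = pd.DataFrame({
    'total': [25, 50, 60],
    'make': [300, 500, 1000],
    'att': [pd.Series([100, 50]), pd.Series([100, 50]), pd.Series([50, 100])],
})

# Reduce every cell of att to a scalar so that `att > 75` is unambiguous.
df['att'] = df['att'].map(to_scalar)

# Only call shot_test when total exceeds 30; otherwise the result is 0.
df['P'] = df.apply(
    lambda x: shot_test(x['make'], x['att']) if x['total'] > 30 else 0,
    axis=1,
)
print(df)
```

With these inputs every att cell reduces to 100, so the if statement in shot_test only ever compares scalars.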


 