Python/Pandas: If statement inside a function explained
I have the following example and I cannot understand why it doesn't work.
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

def balh(a, b):
    z = a + b
    if z.any() > 1:
        return z + 1
    else:
        return z

df['col3'] = balh(df.col1, df.col2)
The output has 4 and 6 in col3. My expected output would be 5 and 7 instead, since 4 and 6 are greater than 1 and my intention is to add 1 if a + b is greater than 1.
The any method evaluates whether any element of the pandas.Series or pandas.DataFrame is True. A non-zero integer is evaluated as True. So essentially, with if z.any() > 1 you are comparing the True returned by the method with the integer 1.
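To see this step by step, here is a minimal sketch using the same values as in the question (the variable names any_result, wrong_check and right_check are only for illustration):

```python
import pandas as pd

# Same values as col1 + col2 in the question: 4 and 6
z = pd.Series([1, 2]) + pd.Series([3, 4])

any_result = z.any()         # True, because at least one element is non-zero
wrong_check = z.any() > 1    # compares True with 1; True equals 1, so this is False
right_check = (z > 1).any()  # compares element-wise first, then reduces to one boolean

print(any_result, wrong_check, right_check)
```

The second value is the one the original if statement tests, which is why the else branch always runs.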
You need to apply the condition directly to the pandas.Series, which returns a boolean pandas.Series on which you can safely call the any method. The same applies to the all method.
def balh(a, b):
    z = a + b
    if (z > 1).any():
        return z + 1
    else:
        return z
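Note that with any, 1 is added to every element as soon as a single sum is greater than 1. If the intention is instead to decide row by row, one possible sketch uses Series.where. The function name balh_elementwise and the sample values below are assumptions made for illustration and are not part of the question:

```python
import pandas as pd

def balh_elementwise(a, b):  # hypothetical name, not from the question
    z = a + b
    # Keep z where z <= 1; everywhere else use z + 1
    return z.where(z <= 1, z + 1)

# Sample data chosen so that one row is above 1 and one is not
df = pd.DataFrame({'col1': [1, -2], 'col2': [3, 1]})
df['col3'] = balh_elementwise(df.col1, df.col2)
print(df)
```

Here the first row sums to 4 and gets 1 added, while the second row sums to -1 and is left unchanged.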
As @arhr clearly explained, the issue was the incorrect call to z.any(), which returns True when there is at least one non-zero element in z. This resulted in the comparison True > 1, which evaluates to False because True is treated as the integer 1.
A one-line alternative that avoids the if statement and the custom function call would be the following:
df['col3'] = df.iloc[:, :2].sum(1).transform(lambda x: x + int(x > 1))
This takes the first two columns of the dataframe, sums the elements along each row, and transforms the resulting column according to the lambda function.
The iloc can also be omitted because the dataframe is instantiated with only two columns, col1 and col2, so the line can be refactored to:
df['col3'] = df.sum(1).transform(lambda x: x + int(x > 1))
Example output:
   col1  col2  col3
0     1     3     5
1     2     4     7
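The same result can probably be obtained without a lambda by adding the boolean mask directly, which keeps the whole operation vectorized. This is only a sketch; the name row_sum is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Sum col1 and col2 along each row
row_sum = df[['col1', 'col2']].sum(axis=1)

# (row_sum > 1) is a boolean Series; cast to int it becomes 1 where True, 0 where False
df['col3'] = row_sum + (row_sum > 1).astype(int)
print(df)
```

Selecting the two columns by name also keeps the line safe to re-run after col3 has been added to the dataframe.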