Python/Pandas: If statement inside a function explained

I have the following example and I cannot understand why it doesn't work.

import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

def balh(a, b):
    z = a + b
    if z.any() > 1:
        return z + 1
    else:
        return z

df['col3'] = balh(df.col1, df.col2)

Output:

   col1  col2  col3
0     1     3     4
1     2     4     6

My expected output would be 5 and 7, not 4 and 6, in col3, since 4 and 6 are greater than 1 and my intention is to add 1 if a + b is greater than 1.

The any method evaluates whether any element of a pandas.Series or pandas.DataFrame is truthy, and a non-zero integer counts as True. So z.any() returns a single True here, and if z.any() > 1 compares that True with the integer 1. Because True is equal to 1 in Python, True > 1 is False and the else branch always runs.
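A minimal sketch of that evaluation, using the same sums as in the question (the variable names any_result and comparison are only for illustration):

```python
import pandas as pd

# Same sums as col1 + col2 in the question: 4 and 6
z = pd.Series([1, 2]) + pd.Series([3, 4])

any_result = z.any()         # single boolean: is at least one element non-zero?
comparison = any_result > 1  # True behaves like the integer 1, so this is 1 > 1

print(any_result)   # True
print(comparison)   # False
```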

You need to apply the condition directly to the pandas.Series. That returns a boolean pandas.Series, on which you can safely call the any method.

The same applies to the all method.

def balh(a, b):
    z = a + b
    if (z > 1).any():
        return z + 1
    else:
        return z
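Note that (z > 1).any() still produces a single boolean for the whole Series, so this version adds 1 to every element as soon as any sum exceeds 1. For the data in the question that gives 5 and 7. If the intent is to decide row by row instead, one possible sketch uses Series.where; the function name balh_elementwise and the sample values below are made up for illustration:

```python
import pandas as pd

def balh_elementwise(a, b):
    z = a + b
    # Series.where keeps z where the condition is True (z <= 1)
    # and takes the value from z + 1 everywhere else.
    return z.where(z <= 1, z + 1)

# Hypothetical data where one sum is not greater than 1
a = pd.Series([0, 1, 2])
b = pd.Series([0, 3, 4])
result = balh_elementwise(a, b)
print(result.tolist())  # [0, 5, 7]
```

With the any-based version the first element would become 1 as well, because the check is made once for the whole Series.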

As @arhr clearly explained, the issue was the incorrect call to z.any(), which returns True when there is at least one non-zero element in z. This resulted in True > 1, which evaluates to False.

A one-line alternative that avoids the if statement and the custom function call would be the following:

df['col3'] = df.iloc[:, :2].sum(1).transform(lambda x: x + int(x > 1))

This selects the first two columns of the dataframe, sums the elements along each row, and then transforms the resulting Series element by element with the lambda function.

The iloc can also be omitted because the dataframe is instantiated with only two columns, col1 and col2. This only holds as long as df contains no other columns, for example before col3 has been assigned. The line can then be refactored to:

df['col3'] = df.sum(1).transform(lambda x: x + int(x > 1))

Example output:

   col1  col2  col3
0     1     3     5
1     2     4     7
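The same row-wise result can probably also be reached without a lambda, since a boolean Series converts to 0 and 1. A minimal sketch, assuming the two columns are selected by name (the intermediate name s is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Row-wise sum of the two original columns
s = df['col1'] + df['col2']

# (s > 1) is a boolean Series; casting it to int adds 1 only where the sum exceeds 1
df['col3'] = s + (s > 1).astype(int)

print(df['col3'].tolist())  # [5, 7]
```

Selecting the columns by name also keeps the result stable if the line is run again after col3 already exists.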
