I have the following example and I cannot understand why it doesn't work.
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

def balh(a, b):
    z = a + b
    if z.any() > 1:
        return z + 1
    else:
        return z

df['col3'] = balh(df.col1, df.col2)
Output:
My expected output in col3 would be 5 and 7, not 4 and 6, since 4 and 6 are greater than 1 and my intention is to add 1 if a + b is greater than 1.
The any method evaluates whether any element of the pandas.Series or pandas.DataFrame is True. A non-zero integer is evaluated as True. So with if z.any() > 1 you are essentially comparing the True returned by the method with the integer 1. Since True > 1 is False, the else branch always runs.
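The two steps described above can be checked directly. This is a minimal sketch using the same sums as the question; the names `truthy` and `comparison` are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Same sums as in the question: 1 + 3 = 4 and 2 + 4 = 6
z = pd.Series([1, 2]) + pd.Series([3, 4])

truthy = z.any()         # True, because at least one element is non-zero
comparison = truthy > 1  # True compares like the integer 1, and 1 > 1 is False

print(truthy, comparison)
```

Because `comparison` is False for this data, the `if` branch in the original function is never taken.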
You need to apply the condition directly to the pandas.Series, which returns a boolean pandas.Series on which you can safely call the any method.
The same applies to the all method.
def balh(a, b):
    z = a + b
    if (z > 1).any():
        return z + 1
    else:
        return z
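Note that `(z > 1).any()` is a single True/False value for the whole Series, so the function above adds 1 to every row as soon as one sum exceeds 1. That gives the expected result for the example data. If the intent is to decide row by row instead, one possible element-wise variant is sketched below; `balh_elementwise` is a hypothetical name, and the third row of the data is added here only to show a sum that is not incremented.

```python
import pandas as pd

def balh_elementwise(a, b):
    z = a + b
    # (z > 1) is a boolean Series; cast to int it is 1 where the
    # condition holds and 0 elsewhere, so only those rows get +1.
    return z + (z > 1).astype(int)

# Extra third row (0 + 1 = 1) is an assumption added for illustration
df = pd.DataFrame({'col1': [1, 2, 0], 'col2': [3, 4, 1]})
df['col3'] = balh_elementwise(df.col1, df.col2)
print(df)
```

With this variant the rows summing to 4 and 6 become 5 and 7, while the row summing to 1 is left unchanged.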
As @arhr clearly explained, the issue was the incorrect call to z.any(), which returns True when there is at least one non-zero element in z. It resulted in True > 1, which evaluates to False.
A one-line alternative that avoids the if statement and the custom function call would be the following:
df['col3'] = df.iloc[:, :2].sum(1).transform(lambda x: x + int(x > 1))
This selects the first two columns of the dataframe, sums the elements along each row, and transforms the resulting Series element by element with the lambda function.
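To make the chain easier to follow, the same one-liner can be split into its three steps. This is only a sketch; the intermediate names `first_two`, `row_sums` and `col3` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

first_two = df.iloc[:, :2]        # all rows, first two columns (col1, col2)
row_sums = first_two.sum(axis=1)  # axis=1 sums across columns: one value per row
# The lambda receives one sum at a time and adds 1 when it exceeds 1
col3 = row_sums.transform(lambda x: x + int(x > 1))

print(row_sums.tolist(), col3.tolist())
```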
The iloc can also be omitted because the dataframe is instantiated with only two columns, col1 and col2, so the line can be refactored to:
df['col3'] = df.sum(1).transform(lambda x: x + int(x > 1))
Example output:
   col1  col2  col3
0     1     3     5
1     2     4     7