If-else function with several conditions - Python

Question

I am trying to calculate the final revenue for my dataset. The dataset has several revenue streams, but under certain conditions (explained below) a client's final revenue is calculated differently.

I am not very comfortable writing functions yet, so I'm not sure where I am making mistakes.

Example DataFrame:

ClientId   Sector    Class    Rev1    Rev2    Rev3
1          Sect_1    B        5       1       0 
2          Sect_2    A        5.5     2       0
3          Sect_3    B        6       1.5     1
4          Sect_4    A        5       1       1.5
5          Sect_5    B        5       2       1

I want to create a 7th column, 'Final_Rev', based on the following conditions:

- If 'Sector' is Sect_3 or Sect_4: 'Final_Rev' = Rev2 + Rev3
- Or, if 'Class' is "A": 'Final_Rev' = Rev2 + Rev3
- Otherwise 'Final_Rev' = Rev1

The expected output is:

ClientId   Sector    Class    Rev1    Rev2    Rev3    Final_Rev
1          Sect_1    B        5       1       0       5
2          Sect_2    A        5.5     2       0       2
3          Sect_3    B        6       1.5     1       2.5
4          Sect_4    A        5       1       1.5     2.5
5          Sect_5    B        5       2       1       5

I tried to write the following function, but I'm not sure what I'm doing wrong:

def Final_Rev():
    
    if Sector in ['Sect_3','Sect_4'] or Class == 'A':
        return df['Rev2'] + df['Rev3']
    else: 
        return df['Rev1']

df['Final_Rev'] = df.apply(Final_Rev, axis=1)

I found an R solution that does what I want, but I don't know how to convert it to Python:

Final_Rev := ifelse(test = (Sector %in% c("Sect_3","Sect_4")|Class == "A"),
             yes = Rev2 + Rev3,
             no = Rev1)

If someone could help me solve this, it would be much appreciated. Thanks.

Answer 1

You can use np.where:

import numpy as np

# Rev2 + Rev3 where either condition holds, otherwise Rev1
df['Final_Rev'] = np.where(df['Sector'].isin(['Sect_3','Sect_4']) | (df['Class'] == 'A'), 
                           df['Rev2'] + df['Rev3'], 
                           df['Rev1'])

Output:

   ClientId  Sector Class  Rev1  Rev2  Rev3  Final_Rev
0         1  Sect_1     B   5.0   1.0   0.0        5.0
1         2  Sect_2     A   5.5   2.0   0.0        2.0
2         3  Sect_3     B   6.0   1.5   1.0        2.5
3         4  Sect_4     A   5.0   1.0   1.5        2.5
4         5  Sect_5     B   5.0   2.0   1.0        5.0
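Since the question lists several conditions, it may help to keep each one separate. NumPy's np.select takes a list of conditions and a matching list of choices, uses the choice of the first condition that is True for each row, and falls back to `default` otherwise. A minimal sketch, assuming the example data from the question; the names `conditions` and `choices` are just illustrative:

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'ClientId': [1, 2, 3, 4, 5],
    'Sector': ['Sect_1', 'Sect_2', 'Sect_3', 'Sect_4', 'Sect_5'],
    'Class': ['B', 'A', 'B', 'A', 'B'],
    'Rev1': [5, 5.5, 6, 5, 5],
    'Rev2': [1, 2, 1.5, 1, 2],
    'Rev3': [0, 0, 1, 1.5, 1],
})

# One condition per rule in the question; the first True condition wins
conditions = [
    df['Sector'].isin(['Sect_3', 'Sect_4']),  # rule 1
    df['Class'] == 'A',                       # rule 2
]
choices = [
    df['Rev2'] + df['Rev3'],  # result for rule 1
    df['Rev2'] + df['Rev3'],  # result for rule 2
]

# Rows matching no rule fall back to Rev1
df['Final_Rev'] = np.select(conditions, choices, default=df['Rev1'])
print(df)
```

Here both rules produce the same result, so np.where with `|` is simpler; np.select becomes worthwhile once different conditions need different results.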

Answer 2

apply takes a function as its first argument, and that function receives each column or row as a pandas.Series, so your function needs to accept it as an argument. For example:

import pandas as pd

def foo(ds):
    if ds['A'] == 1:
        return 26
    elif ds['B'] == 4:
        return 27
    else:
        return 2*ds['A'] + 3*ds['B']

df = pd.DataFrame(columns=['A', 'B'], data = [[1,2],[3,4],[5,6]])
df['C'] = df.apply(foo, axis=1)

    A   B   C
0   1   2   26
1   3   4   27
2   5   6   28
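Applied to the function from the question, the fix is to accept the row as a parameter and read the values from it instead of from `df`. A sketch, assuming the example data from the question; the parameter name `row` is arbitrary:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'ClientId': [1, 2, 3, 4, 5],
    'Sector': ['Sect_1', 'Sect_2', 'Sect_3', 'Sect_4', 'Sect_5'],
    'Class': ['B', 'A', 'B', 'A', 'B'],
    'Rev1': [5, 5.5, 6, 5, 5],
    'Rev2': [1, 2, 1.5, 1, 2],
    'Rev3': [0, 0, 1, 1.5, 1],
})


def Final_Rev(row):
    # With axis=1, row is a pandas.Series holding one row of df
    if row['Sector'] in ['Sect_3', 'Sect_4'] or row['Class'] == 'A':
        return row['Rev2'] + row['Rev3']
    return row['Rev1']


df['Final_Rev'] = df.apply(Final_Rev, axis=1)
print(df)
```

The original version fails because apply passes each row as an argument that `Final_Rev()` does not accept. Even with a parameter, `Sector` and `Class` would be undefined names inside the function, and `df['Rev2'] + df['Rev3']` refers to whole columns rather than to the current row.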

Answer 3

You can get the desired column with a single expression:

df['Final_Rev'] = df['Rev1'].where(
    ~(df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')),
    df['Rev2'] + df['Rev3'])

This is the fastest approach because it doesn't require apply at all.
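pandas also has Series.mask, the inverse of Series.where: it replaces values where the condition is True, so the negation `~` can be dropped. A minimal sketch on the question's example data; `cond` is just an illustrative name:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'ClientId': [1, 2, 3, 4, 5],
    'Sector': ['Sect_1', 'Sect_2', 'Sect_3', 'Sect_4', 'Sect_5'],
    'Class': ['B', 'A', 'B', 'A', 'B'],
    'Rev1': [5, 5.5, 6, 5, 5],
    'Rev2': [1, 2, 1.5, 1, 2],
    'Rev3': [0, 0, 1, 1.5, 1],
})

cond = df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')

# mask() keeps Rev1 where cond is False and takes Rev2 + Rev3 where it is True
df['Final_Rev'] = df['Rev1'].mask(cond, df['Rev2'] + df['Rev3'])
print(df)
```

This is equivalent to `.where(~cond, ...)`; which one reads better depends on whether you think of the condition as "keep" or "replace".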

Generally speaking, and for readability, I would recommend making masks corresponding to each sub-clause of your condition. In your case, there are two possible results (either Rev1 or Rev2 + Rev3). The first is a default value; the second depends on a single condition: Sector in {'Sect_3', 'Sect_4'} or Class == 'A'. Therefore:

mask = df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')
df['Final_Rev'] = df['Rev1']  # default value
df.loc[mask, 'Final_Rev'] = df.loc[mask, 'Rev2'] + df.loc[mask, 'Rev3']
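Taking the one-mask-per-sub-clause advice literally, each part of the condition can get its own named mask before they are combined with `|`. A sketch, assuming the question's example data; the mask names are illustrative:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'ClientId': [1, 2, 3, 4, 5],
    'Sector': ['Sect_1', 'Sect_2', 'Sect_3', 'Sect_4', 'Sect_5'],
    'Class': ['B', 'A', 'B', 'A', 'B'],
    'Rev1': [5, 5.5, 6, 5, 5],
    'Rev2': [1, 2, 1.5, 1, 2],
    'Rev3': [0, 0, 1, 1.5, 1],
})

# One boolean mask per sub-clause of the condition
in_sect_3_or_4 = df['Sector'].isin({'Sect_3', 'Sect_4'})
is_class_a = df['Class'] == 'A'
mask = in_sect_3_or_4 | is_class_a

df['Final_Rev'] = df['Rev1']  # default value
df.loc[mask, 'Final_Rev'] = df.loc[mask, 'Rev2'] + df.loc[mask, 'Rev3']
print(df[['ClientId', 'Final_Rev']])
```

Named masks also make it easy to inspect which rows matched which clause, e.g. by printing `in_sect_3_or_4` on its own.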

If you insist on calling a Python function on every row, you can do that too, but it will be much slower:

def myfunc(r):
    if r.Sector == 'Sect_3' or r.Sector == 'Sect_4' or r.Class == 'A':
        return r.Rev2 + r.Rev3
    return r.Rev1

df.apply(myfunc, axis=1)

# out:
0    5.0
1    2.0
2    2.5
3    2.5
4    5.0

Performance:

Why do I say the .where() form is the fastest? Because it is fully vectorized, so there are no repeated calls into a Python function.

Here is a test:

import numpy as np
import pandas as pd

n = int(1e5)
df = pd.DataFrame({
    'ClientId': np.arange(n),
    'Sector': np.random.choice([f'Sect_{k}' for k in range(1, 8)], size=n),
    'Class': np.random.choice(list('ABCDEF'), size=n),
    'Rev1': np.random.randint(0, 20, size=n) * 0.5,
    'Rev2': np.random.randint(0, 20, size=n) * 0.5,
    'Rev3': np.random.randint(0, 20, size=n) * 0.5,
})
%timeit df['Rev1'].where(~(df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')), df['Rev2'] + df['Rev3'])
10.9 ms ± 417 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.apply(myfunc, axis=1)
2.34 s ± 8.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
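`%timeit` is an IPython/Jupyter magic. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module; the sketch also checks that both forms produce the same column first. The row count, seed, and helper names `vectorized` and `row_apply` are illustrative assumptions, and timings will vary by machine and pandas version, so no numbers are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative setup: n and the seed are arbitrary choices
n = 5_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'ClientId': np.arange(n),
    'Sector': rng.choice([f'Sect_{k}' for k in range(1, 8)], size=n),
    'Class': rng.choice(list('ABCDEF'), size=n),
    'Rev1': rng.integers(0, 20, size=n) * 0.5,
    'Rev2': rng.integers(0, 20, size=n) * 0.5,
    'Rev3': rng.integers(0, 20, size=n) * 0.5,
})


def vectorized(df):
    # The .where() form from this answer
    cond = df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')
    return df['Rev1'].where(~cond, df['Rev2'] + df['Rev3'])


def myfunc(r):
    # Row-wise version from this answer
    if r.Sector == 'Sect_3' or r.Sector == 'Sect_4' or r.Class == 'A':
        return r.Rev2 + r.Rev3
    return r.Rev1


def row_apply(df):
    return df.apply(myfunc, axis=1)


# Check that both forms agree before comparing speed
assert np.allclose(vectorized(df), row_apply(df))

for name, fn in [('where', vectorized), ('apply', row_apply)]:
    # Best of 3 single runs, reported in milliseconds
    best = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f'{name}: {best * 1000:.1f} ms')
```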

The first form is over 200x faster!

Problem description

3 solutions

Solution 1
2 (accepted) 2020-12-17 18:13:09

Solution 2
2 2020-12-17 18:15:54

Solution 3
0 2020-12-17 18:13:08
