If else Function with several conditions - Python

I am trying to calculate the final revenue for my dataset. The dataset has several revenue streams, but under certain conditions (explained below) the revenue per client is calculated differently for the final revenue.

I am not very comfortable writing functions yet, so I'm not sure where I am making mistakes.

Example DataFrame:

ClientId   Sector    Class    Rev1    Rev2    Rev3
1          Sect_1    B        5       1       0 
2          Sect_2    A        5.5     2       0
3          Sect_3    B        6       1.5     1
4          Sect_4    A        5       1       1.5
5          Sect_5    B        5       2       1

I want to create a 7th column 'Final_Rev' given the following conditions:

- If 'Sector' = (Sect_3 or Sect_4) : 'Final_Rev' = Rev2 + Rev3
- OR if 'Class' = ("A") : 'Final_Rev' = Rev2 + Rev3
- Otherwise 'Final_Rev' = Rev1

The expected output is the following:

ClientId   Sector    Class    Rev1    Rev2    Rev3    Final_Rev
1          Sect_1    B        5       1       0       5
2          Sect_2    A        5.5     2       0       2
3          Sect_3    B        6       1.5     1       2.5
4          Sect_4    A        5       1       1.5     2.5
5          Sect_5    B        5       2       1       5

I have tried to create the following function, but I'm not sure what I'm doing wrong:

def Final_Rev():
    
    if Sector in ['Sect_3','Sect_4'] or Class == 'A':
        return df['Rev2'] + df['Rev3']
    else: 
        return df['Rev1']

df['Final_Rev'] = df.apply(Final_Rev, axis=1)

I have found an R solution that does what I want, but I don't know how to convert it to Python:

Final_Rev := ifelse(test = (Sector %in% c("Sect_3","Sect_4")|Class == "A"),
             yes = Rev2 + Rev3,
             no = Rev1)

If someone could help me solve this, it would be really appreciated. Thanks.

You can use np.where:

import numpy as np

df['Final_Rev'] = np.where(df['Sector'].isin(['Sect_3','Sect_4']) | (df['Class'] == 'A'),
                           df['Rev2'] + df['Rev3'],
                           df['Rev1'])

Output:

   ClientId  Sector Class  Rev1  Rev2  Rev3  Final_Rev
0         1  Sect_1     B   5.0   1.0   0.0        5.0
1         2  Sect_2     A   5.5   2.0   0.0        2.0
2         3  Sect_3     B   6.0   1.5   1.0        2.5
3         4  Sect_4     A   5.0   1.0   1.5        2.5
4         5  Sect_5     B   5.0   2.0   1.0        5.0
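
For a self-contained check, the sketch below rebuilds the five-row example from the question and applies the same np.where expression. The DataFrame construction and the variable name `use_rev23` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's example table.
df = pd.DataFrame({
    'ClientId': [1, 2, 3, 4, 5],
    'Sector': ['Sect_1', 'Sect_2', 'Sect_3', 'Sect_4', 'Sect_5'],
    'Class': ['B', 'A', 'B', 'A', 'B'],
    'Rev1': [5.0, 5.5, 6.0, 5.0, 5.0],
    'Rev2': [1.0, 2.0, 1.5, 1.0, 2.0],
    'Rev3': [0.0, 0.0, 1.0, 1.5, 1.0],
})

# Boolean Series: True on rows where Rev2 + Rev3 should be used.
# (`use_rev23` is a name chosen for this sketch.)
use_rev23 = df['Sector'].isin(['Sect_3', 'Sect_4']) | (df['Class'] == 'A')

# np.where picks element-wise: second argument where True, third where False.
df['Final_Rev'] = np.where(use_rev23, df['Rev2'] + df['Rev3'], df['Rev1'])

print(df['Final_Rev'].tolist())  # [5.0, 2.0, 2.5, 2.5, 5.0]
```

Note that the parentheses around `df['Class'] == 'A'` matter: `|` binds more tightly than `==` in Python, so leaving them out changes the meaning of the expression.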

apply takes a function as its first argument and calls it with each row (axis=1) or column (axis=0) as a pandas.Series, so your function needs to accept that Series as a parameter.

import pandas as pd

def foo(ds):
    if ds['A'] == 1:
        return 26
    elif ds['B'] == 4:
        return 27
    else:
        return 2*ds['A'] + 3*ds['B']

df = pd.DataFrame(columns=['A', 'B'], data = [[1,2],[3,4],[5,6]])
df['C'] = df.apply(foo, axis=1)

    A   B   C
0   1   2   26
1   3   4   27
2   5   6   28
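
To make the point about the function's parameter concrete, here is a small sketch that records what apply passes in and shows what happens when the function takes no parameter, as in the question. The names `inspect_row`, `no_args`, and `seen_types` are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3], 'B': [2, 4]})

seen_types = []  # type names of the objects apply hands to the function

def inspect_row(row):
    # With axis=1, each row arrives as a pandas.Series
    # whose index holds the column names.
    seen_types.append(type(row).__name__)
    return row['A'] + row['B']

sums = df.apply(inspect_row, axis=1)

def no_args():
    # Mirrors the question's signature: no parameter for the row.
    return 0

try:
    df.apply(no_args, axis=1)
    error_name = None
except TypeError as exc:
    # apply calls no_args(row), which fails because no_args accepts nothing.
    error_name = type(exc).__name__

print(set(seen_types), sums.tolist(), error_name)
```

This suggests the question's attempt fails for two reasons: `Final_Rev()` accepts no row, and its body refers to bare names (`Sector`, `Class`) and to whole columns (`df['Rev2']`) rather than to fields of the row it should receive.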

You can get your desired column with a single expression:

df['Final_Rev'] = df['Rev1'].where(
    ~(df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')),
    df['Rev2'] + df['Rev3'])

This is the fastest approach because it doesn't require apply at all.

Generally speaking, and for readability, I would recommend making masks that correspond to each sub-clause of your condition. In your case, there are two possible results (either Rev1 or Rev2 + Rev3). The first is a default value; the second depends on a single condition: Sector in {'Sect_3', 'Sect_4'} or Class == 'A'. Therefore:

mask = df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')
df['Final_Rev'] = df['Rev1']  # default value
df.loc[mask, 'Final_Rev'] = df.loc[mask, 'Rev2'] + df.loc[mask, 'Rev3']
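
The snippet above folds both sub-clauses into one mask. A minimal sketch with one named mask per sub-clause might look like the following; the sample data is copied from the question, and the names `in_special_sector` and `is_class_a` are assumptions made for this example.

```python
import pandas as pd

# Sample data copied from the question's example table.
df = pd.DataFrame({
    'ClientId': [1, 2, 3, 4, 5],
    'Sector': ['Sect_1', 'Sect_2', 'Sect_3', 'Sect_4', 'Sect_5'],
    'Class': ['B', 'A', 'B', 'A', 'B'],
    'Rev1': [5.0, 5.5, 6.0, 5.0, 5.0],
    'Rev2': [1.0, 2.0, 1.5, 1.0, 2.0],
    'Rev3': [0.0, 0.0, 1.0, 1.5, 1.0],
})

# One boolean mask per sub-clause of the condition.
in_special_sector = df['Sector'].isin(['Sect_3', 'Sect_4'])
is_class_a = df['Class'] == 'A'

# Combine the sub-clauses with element-wise OR.
mask = in_special_sector | is_class_a

df['Final_Rev'] = df['Rev1']  # default value
df.loc[mask, 'Final_Rev'] = df.loc[mask, 'Rev2'] + df.loc[mask, 'Rev3']

print(df['Final_Rev'].tolist())  # [5.0, 2.0, 2.5, 2.5, 5.0]
```

Naming each mask makes it easier to inspect a sub-clause on its own (for example, `in_special_sector.sum()` counts the matching rows) and to add further conditions later.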

If you insist on calling a Python function on every row, you can also do that, but it will be way slower:

def myfunc(r):
    if r.Sector == 'Sect_3' or r.Sector == 'Sect_4' or r.Class == 'A':
        return r.Rev2 + r.Rev3
    return r.Rev1

df.apply(myfunc, axis=1)

# out:
0    5.0
1    2.0
2    2.5
3    2.5
4    5.0

Performance:

Why do I say the .where() form is the fastest? Because it is all vectorized, and there is no need for repeated calls into a Python function.

Here is a test:

import numpy as np
import pandas as pd

n = int(1e5)
df = pd.DataFrame({
    'ClientId': np.arange(n),
    'Sector': np.random.choice([f'Sect_{k}' for k in range(1, 8)], size=n),
    'Class': np.random.choice(list('ABCDEF'), size=n),
    'Rev1': np.random.randint(0, 20, size=n) * 0.5,
    'Rev2': np.random.randint(0, 20, size=n) * 0.5,
    'Rev3': np.random.randint(0, 20, size=n) * 0.5,
})
%timeit df['Rev1'].where(~(df['Sector'].isin({'Sect_3', 'Sect_4'}) | (df['Class'] == 'A')), df['Rev2'] + df['Rev3'])
10.9 ms ± 417 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.apply(myfunc, axis=1)
2.34 s ± 8.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The first form is over 200x faster!
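
The %timeit lines above need IPython. As a rough equivalent for a plain Python script, the comparison could be sketched with the standard timeit module. The helper names (`make_df`, `vectorized`, `row_wise`), the seed, and the smaller row count are assumptions for this example, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

def myfunc(r):
    # Row-wise version from the answer above.
    if r.Sector == 'Sect_3' or r.Sector == 'Sect_4' or r.Class == 'A':
        return r.Rev2 + r.Rev3
    return r.Rev1

def make_df(n, seed=0):
    # Random test data shaped like the answer's benchmark frame.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'ClientId': np.arange(n),
        'Sector': rng.choice([f'Sect_{k}' for k in range(1, 8)], size=n),
        'Class': rng.choice(list('ABCDEF'), size=n),
        'Rev1': rng.integers(0, 20, size=n) * 0.5,
        'Rev2': rng.integers(0, 20, size=n) * 0.5,
        'Rev3': rng.integers(0, 20, size=n) * 0.5,
    })

def vectorized(df):
    cond = df['Sector'].isin(['Sect_3', 'Sect_4']) | (df['Class'] == 'A')
    # Keep Rev1 where the condition is False, otherwise use Rev2 + Rev3.
    return df['Rev1'].where(~cond, df['Rev2'] + df['Rev3'])

def row_wise(df):
    return df.apply(myfunc, axis=1)

df = make_df(10_000)  # smaller than the answer's 1e5 to keep the run short

# Average seconds per call for each approach.
t_vec = timeit.timeit(lambda: vectorized(df), number=5) / 5
t_row = timeit.timeit(lambda: row_wise(df), number=1)

print(f'vectorized: {t_vec * 1e3:.2f} ms, apply: {t_row * 1e3:.2f} ms')
```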
