
Using conditional if/else logic with pandas dataframe columns

My dataframe, called pw2, looks something like this. It has two columns, pw1 and pw2, which are win probabilities. I'd like to apply some conditional logic to create another column called WINNER based on pw1 and pw2.

|          Name1          |     pw1     |   Name2   |     pw2     |
| Seaking                 | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn              | 0.172510623 | Quagsire  | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy                 | 0.28681284  | NaN       | NaN         |

I want to do this conditionally in a function but I'm having some trouble.

  • if pw1 > pw2, populate with Name1
  • if pw2 > pw1, populate with Name2
  • if pw1 is populated but pw2 isn't, populate with Name1
  • if pw2 is populated but pw1 isn't, populate with Name2

But my function isn't working - for some reason checking if a value is null isn't working.

def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)

Do not use apply, which is very slow. Use np.where instead:

import numpy as np

pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)

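Since NaNs always lose, you can simply fillna() pw2 with -np.inf to get the same logic: any populated pw1 beats -np.inf, and a NaN pw1 makes the comparison False, so Name2 is chosen.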
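As a minimal sketch, here is that approach run on the sample rows from the question. The variable name pw2_filled is introduced here only for illustration, and the DataFrame is rebuilt by hand from the table above.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the last row has no opponent.
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# A missing pw2 becomes -inf, so any populated pw1 beats it.
pw2_filled = df["pw2"].fillna(-np.inf)

# A missing pw1 makes the comparison False, so Name2 is picked in that case.
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

print(df["winner"].tolist())
# ['Seaking', 'Quagsire', 'Thundurus Therian Forme', 'Flaaffy']
```

The whole column is computed in one vectorized comparison, with no Python-level loop over rows.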

Looking at your code, we can point out several problems. First, you wrote df['pw1'] = None, which is an assignment rather than a comparison and is a syntax error inside an if statement. Comparisons normally use the == operator. For None, however, the recommended check is the is operator, as in if variable is None: (...). But you are in a pandas/numpy environment, where there are actually several kinds of null value (None, NaN, NaT, etc.).

So it is preferable to check for null values using pd.isnull() or df.isnull().
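To illustrate why the identity and equality checks fall short, here is a small sketch comparing them with pd.isnull(). The names missing_values and mask are only illustrative.

```python
import numpy as np
import pandas as pd

missing_values = [None, np.nan, pd.NaT]

for value in missing_values:
    # Only None passes the identity check; NaN and NaT are different objects.
    print(repr(value), "| is None:", value is None, "| pd.isnull:", pd.isnull(value))

# NaN never compares equal to anything, itself included,
# so an equality test cannot detect it either.
print(np.nan == np.nan)  # False

# On a whole column, isnull() returns a boolean mask.
pw2 = pd.Series([0.189236181, np.nan, None])
mask = pw2.isnull()
print(mask.tolist())  # [False, True, True]
```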

Just to illustrate, this is what your code should look like:

def final_winner(df):
    # pw1 missing, pw2 populated: Pokemon 2 wins
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']
    # pw2 missing, pw1 populated: Pokemon 1 wins
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    # Otherwise compare the probabilities
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)

But again, definitely use np.where.
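As a quick sanity check, the sketch below runs a row-wise function and the np.where version side by side. The test rows (A1, B2, and so on) are hypothetical, chosen to cover a win for each side and a missing value on each side. The function here returns Name2 when pw1 is missing, which is what the rules in the question ask for.

```python
import numpy as np
import pandas as pd

def final_winner(row):
    # Row-wise version of the four rules listed in the question.
    if pd.isnull(row["pw1"]) and not pd.isnull(row["pw2"]):
        return row["Name2"]
    elif pd.isnull(row["pw2"]) and not pd.isnull(row["pw1"]):
        return row["Name1"]
    elif row["pw2"] > row["pw1"]:
        return row["Name2"]
    else:
        return row["Name1"]

# Hypothetical rows: a win for each side, then a missing value on each side.
df = pd.DataFrame({
    "Name1": ["A1", "B1", "C1", np.nan],
    "pw1": [0.6, 0.2, 0.3, np.nan],
    "Name2": ["A2", "B2", np.nan, "D2"],
    "pw2": [0.4, 0.7, np.nan, 0.5],
})

by_apply = df.apply(final_winner, axis=1)
by_where = np.where(df["pw1"] > df["pw2"].fillna(-np.inf), df["Name1"], df["Name2"])

print(by_apply.tolist())  # ['A1', 'B2', 'C1', 'D2']
print(list(by_where))     # ['A1', 'B2', 'C1', 'D2']
```

The two versions agree on every case the question lists. They differ on cases the question does not cover: on an exact tie, or when both probabilities are missing, the row-wise function falls through to Name1, while np.where picks Name2.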

