简体   繁体   中英

Assigning values to new column based on multiple string conditions

What I have:

|    ID   |   Possible_Size    |   Actual_Size     |
|:------: |:------------------:|:-----------------:|  
|   1234  |         BIG        |        BIG        |
|   5678  |       MEDIUM       |        BIG        |
|   9876  |        SMALL       |       SMALL       |       
|   1092  |       MEDIUM       |       MEDIUM      |

What I would like to create:

|    ID   |   Possible_Size    |   Actual_Size     |       Big       |
|:------: |:------------------:|:-----------------:|:---------------:|  
|   1234  |         BIG        |        BIG        |  True Positive  |
|   5678  |       MEDIUM       |        BIG        |  False Negative |  
|   9876  |        BIG         |       SMALL       |  False Positive |   
|   1092  |       MEDIUM       |       MEDIUM      |                 |

What I have tried:

    def sizes(row):
                        
        if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
            df['Big'] = 'True Positive'
        elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
            df['Big'] = 'False Negative'
        elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
            df['Big'] = 'False Positive'  
        else:
            df['Big'] = ''
                        
    df.apply(sizes, axis=1)

Currently I am getting a blank 'Big' column

For this multiple if/elif statements you could use np.select :

choices = ['True Positive','False Negative','False Positive']
conditions = [
       ((df['Actual_Size'].isin(['BIG']))&(df['Possible_Size'].isin(['BIG']))), 
       ((df['Actual_Size'].isin(['BIG']))&(df['Possible_Size'].isin(['MEDIUM', 'SMALL']))),
       ((df['Actual_Size'].isin(['MEDIUM', 'SMALL']))&(df['Possible_Size'].isin(['BIG'])))]
import numpy as np
df['Big'] = np.select(conditions, choices, default='')

If you want to keep your original solution, the problem was that you were returning nothing when applying the function row by row, so you could try this:

def sizes(row):

    if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
        return'True Positive'
    elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
        return 'False Negative'
    elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
        return 'False Positive'  
    else:
        return ''

df['Big']=df.apply(sizes, axis=1)

Both outputs:

df
     ID Possible_Size Actual_Size             Big
0  1234           BIG         BIG   True Positive
1  5678        MEDIUM         BIG  False Negative
2  9876           BIG       SMALL  False Positive
3  1092        MEDIUM      MEDIUM                

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM