
If statement across multiple columns in Pandas

This is the data I have:

| total |  big  |  med  | small| big_perc | med_perc | sml_perc |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:|:--------:| 
|   5   |   4   |   0   |   1  |   0.8    |   0.0    |   0.2    |
|   6   |   0   |   3   |   3  |   0.0    |   0.5    |   0.5    | 
|   5   |   2   |   3   |   0  |   0.4    |   0.6    |   0.0    |

This is what I would like to create:

| total |  big  |  med  | small| big_perc | med_perc | sml_perc | condition |   size  |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:|:--------:|:---------:|:-------:|
|   5   |   4   |   0   |   1  |   0.8    |   0.0    |   0.2    |    YES    |   big   |
|   6   |   0   |   3   |   3  |   0.0    |   0.5    |   0.5    |    NO     |         |
|   5   |   2   |   3   |   0  |   0.4    |   0.6    |   0.0    |    YES    |   med   |

For the condition column, I'd like it to say YES if big_perc, med_perc, or sml_perc is greater than or equal to 0.6, and NO (or blank) if that condition is not met.

For the size column, I'd like it to name whichever column is greater than or equal to 0.6, or else also be blank.

Here is what I've tried:

for (df['condition'] in len(df):

    if df['big_perc'] >= 0.60:
        df['condition'] = 'YES'
    elif df['med_perc'] >= 0.60:
        df['condition'] = 'YES'
    elif df['sml_perc'] >= 0.60:
        df['condition'] = 'YES'
    else: 
        df['condition'] = ''

I tried the same for/if statement for the size column.

For the condition column, np.where is enough, because there is only a single yes/no condition. The size column has several conditions, one per percentage column, so np.select is a better fit:

import numpy as np

perc = df.filter(like="perc")

# Single yes/no test: does any percentage column reach 0.6?
df["condition"] = np.where(perc.ge(0.6).any(axis=1), "YES", "NO")

# One condition per percentage column, checked in order
condlist = [perc["big_perc"].ge(0.6), perc["med_perc"].ge(0.6), perc["sml_perc"].ge(0.6)]
choicelist = ["big", "med", "sml"]

# default="" leaves the label blank when no column reaches 0.6
df["size"] = np.select(condlist, choicelist, default="")


   total  big  med  small  big_perc  med_perc  sml_perc condition size
0      5    4    0      1       0.8       0.0       0.2       YES  big
1      6    0    3      3       0.0       0.5       0.5        NO
2      5    2    3      0       0.4       0.6       0.0       YES  med
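
As an alternative to np.select, the size label can also be derived with idxmax. The following is a minimal sketch, not part of the original answer: it rebuilds the sample data from the question, and the names `perc` and `hit` are introduced here only for illustration. It assumes that the column reaching the threshold is also the row's largest share, which holds whenever the three shares sum to 1.

```python
import numpy as np
import pandas as pd

# Sample data from the question (column names as in the first table).
df = pd.DataFrame({
    "total": [5, 6, 5],
    "big": [4, 0, 2],
    "med": [0, 3, 3],
    "small": [1, 3, 0],
    "big_perc": [0.8, 0.0, 0.4],
    "med_perc": [0.0, 0.5, 0.6],
    "sml_perc": [0.2, 0.5, 0.0],
})

# Hypothetical helper names, used only in this sketch.
perc = df[["big_perc", "med_perc", "sml_perc"]]

# True for rows where at least one share reaches the 0.6 threshold.
hit = perc.ge(0.6).any(axis=1)

df["condition"] = np.where(hit, "YES", "NO")

# idxmax returns the name of the largest share in each row; strip the
# "_perc" suffix and blank out rows that never reach the threshold.
labels = perc.idxmax(axis=1).str.replace("_perc", "", regex=False)
df["size"] = labels.where(hit, "")

print(df[["condition", "size"]])
```

This avoids listing one condition per column, at the cost of relying on the assumption above about the largest share.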

You can try this on the dataframe above:

df['size'] = (df.iloc[:, 4:] >= .6).dot(df.columns[4:]).str.split('_').str[0]
df['condition'] = np.where(df['size']!='', 'YES', 'NO')

Output:

   total  big  med  small  big_perc  med_perc  sml_perc size condition
1    5.0  4.0  0.0    1.0       0.8       0.0       0.2  big       YES
2    6.0  0.0  3.0    3.0       0.0       0.5       0.5             NO
3    5.0  2.0  3.0    0.0       0.4       0.6       0.0  med       YES

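Slice your dataframe to select only the percentage columns, then create a boolean matrix marking the values greater than or equal to 0.6, then use dot with the column names to capture the name of each column that is True. Finally, use string manipulation to reduce that name to big, med, or sml. Note that iloc[:, 4:] assumes the percentage columns are the only columns from position 4 onward.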
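
The steps described above can be broken apart to see each intermediate result. This is only an illustrative sketch on the question's sample data; the names `perc`, `mask`, `hits`, and `size` are introduced here and are not from the original answer, and the percentage columns are selected by name rather than by position.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "total": [5, 6, 5],
    "big": [4, 0, 2],
    "med": [0, 3, 3],
    "small": [1, 3, 0],
    "big_perc": [0.8, 0.0, 0.4],
    "med_perc": [0.0, 0.5, 0.6],
    "sml_perc": [0.2, 0.5, 0.0],
})

# Step 1: keep only the percentage columns.
perc = df[["big_perc", "med_perc", "sml_perc"]]

# Step 2: boolean matrix, True where a share is at least 0.6.
mask = perc >= 0.6

# Step 3: dot product with the column names. True * "name" gives "name",
# False * "name" gives "", and the per-row sum concatenates the pieces.
hits = mask.dot(mask.columns)

# Step 4: keep the part of the name before the underscore.
size = hits.str.split("_").str[0]

print(mask)
print(size.tolist())
```

If more than one column in a row could reach the threshold, the dot product would concatenate several names, so this approach relies on at most one match per row.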


 