This is the data I have:
| total | big | med | small| big_perc | med_perc | sml_perc |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:|:--------:|
| 5 | 4 | 0 | 1 | 0.8 | 0.0 | 0.2 |
| 6 | 0 | 3 | 3 | 0.0 | 0.5 | 0.5 |
| 5 | 2 | 3 | 0 | 0.4 | 0.6 | 0.0 |
This is what I would like to create:
| total | big | med | sml | big_perc | med_perc | sml_perc | condition | size |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:|:--------:|:---------:|:--------:|
| 5 | 4 | 0 | 1 | 0.8 | 0.0 | 0.2 | YES | big |
| 6 | 0 | 3 | 3 | 0.0 | 0.5 | 0.5 | NO | |
| 5 | 2 | 3 | 0 | 0.4 | 0.6 | 0.0 | YES | med |
For the condition column, I'd like it to say YES if big_perc, med_perc, or sml_perc is at least 0.6, and be blank if that condition is not met.
For the size column, I'd like it to name whichever column is at least 0.6, or else also be blank.
Here is what I've tried:
for (df['condition'] in len(df):
if df['big_perc'] >= 0.60:
df['condition'] = 'YES'
elif df['med_perc'] >= 0.60:
df['condition'] = 'YES'
elif df['sml_perc'] >= 0.60:
df['condition'] = 'YES'
else:
df['condition'] = ''
I tried the same for/if statement for the size column
For the condition column, np.where suffices, since it is just a single condition. The size column has multiple conditions, so np.select is a better fit:
import numpy as np
perc = df.filter(like="perc")
df["condition"] = np.where(perc.ge(0.6).any(axis=1), "YES", "NO")
# One condition per percentage column, so the label matches the column that reached 0.6
condlist = [perc["big_perc"].ge(0.6), perc["med_perc"].ge(0.6), perc["sml_perc"].ge(0.6)]
choicelist = ["big", "med", "sml"]
# default="" leaves size blank when no column reaches 0.6
df["size"] = np.select(condlist, choicelist, default="")
total big med small big_perc med_perc sml_perc condition size
0 5 4 0 1 0.8 0.0 0.2 YES big
1 6 0 3 3 0.0 0.5 0.5 NO
2 5 2 3 0 0.4 0.6 0.0 YES med
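For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies np.where and np.select with one condition per percentage column. The helper name `perc` and the way the labels are derived from the column names are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "total": [5, 6, 5],
    "big": [4, 0, 2],
    "med": [0, 3, 3],
    "small": [1, 3, 0],
    "big_perc": [0.8, 0.0, 0.4],
    "med_perc": [0.0, 0.5, 0.6],
    "sml_perc": [0.2, 0.5, 0.0],
})

# Only the percentage columns (big_perc, med_perc, sml_perc)
perc = df.filter(like="perc")

# Single condition: does any percentage reach 0.6 in this row?
df["condition"] = np.where(perc.ge(0.6).any(axis=1), "YES", "NO")

# One condition per column; the label is the column name before "_perc"
condlist = [perc[col].ge(0.6) for col in perc.columns]
choicelist = [col.split("_")[0] for col in perc.columns]

# default="" leaves size blank when no column reaches 0.6
df["size"] = np.select(condlist, choicelist, default="")

print(df)
```

Building the conditions from the column names means a row where med_perc or sml_perc is well above 0.6 still gets the matching label, rather than depending on the order of the checks.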
You can try this for your above dataframe:
df['size'] = (df.iloc[:, 4:] >= .6).dot(df.columns[4:]).str.split('_').str[0]
df['condition'] = np.where(df['size']!='', 'YES', 'NO')
Output:
total big med small big_perc med_perc sml_perc size condition
1 5.0 4.0 0.0 1.0 0.8 0.0 0.2 big YES
2 6.0 0.0 3.0 3.0 0.0 0.5 0.5 NO
3 5.0 2.0 3.0 0.0 0.4 0.6 0.0 med YES
Slice your dataframe to select only the percentage columns, then build a boolean matrix for values greater than or equal to 0.6, then use dot to capture the column name for the True values. Use string manipulation to get big, med, or sml.
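The steps above can be sketched one at a time, which makes the dot trick easier to follow. This assumes, as in the question, that the percentage columns start at position 4; the intermediate names `perc_cols`, `mask`, and `hit` are hypothetical and only there for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "total": [5, 6, 5],
    "big": [4, 0, 2],
    "med": [0, 3, 3],
    "small": [1, 3, 0],
    "big_perc": [0.8, 0.0, 0.4],
    "med_perc": [0.0, 0.5, 0.6],
    "sml_perc": [0.2, 0.5, 0.0],
})

# Step 1: names of the percentage columns (assumed to start at position 4)
perc_cols = df.columns[4:]

# Step 2: boolean matrix, True where the percentage is at least 0.6
mask = df[perc_cols] >= 0.6

# Step 3: dot with the column names. True * "name" gives "name",
# False * "name" gives "", and the row-wise sum concatenates them.
hit = mask.dot(perc_cols)

# Step 4: keep the part before the underscore ("big_perc" -> "big")
df["size"] = hit.str.split("_").str[0]

# Step 5: condition is YES wherever a size label was found
df["condition"] = df["size"].ne("").map({True: "YES", False: "NO"})

print(df)
```

Because the three percentages in a row sum to 1, at most one of them can reach 0.6, so the concatenation in step 3 never joins two column names.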