
Conditional formatting in a DataFrame using Python

I have a pandas data frame and I need to classify it based on the specified condition. The threshold is fixed and it has to be classified based on 8 different combinations of the threshold.

Threshold (A >= 7, B = 3 or 4, C between 22 and 27)

I tried using pandas with conditional operations to classify the data but it produces misleading results.

Conditions are:

1. class1=f[(f['A']>7.0)&((f['B']==3.0)|(f['B']==4.0))& ((f['C']>=22.0)&(f['C']<=27.0))]
2. class2=f[(f['A']>7.0)&((f['B']==3.0)|(f['B']==4.0))& ((f['C']<=22.0)&(f['C']>=27.0))]
3. class3=f[(f['A']<7.0)&((f['B']==3.0)|(f['B']==4.0))& ((f['C']>=22.0)&(f['C']<=27.0))]
4. class4=f[(f['A']>7.0)&((f['B']!=3.0)&(f['B']!=4.0))& ((f['C']>=22.0)&(f['C']<=27.0))]
5. class5=f[(f['A']>7.0)&((f['B']!=3.0)&(f['B']!=4.0))& ((f['C']<=22.0)&(f['C']>=27.0))]
6. class6=f[(f['A']<7.0)&((f['B']==3.0)|(f['B']==4.0))& ((f['C']<=22.0)&(f['C']>=27.0))]
7. class7=f[(f['A']<7.0)&((f['B']!=3.0)&(f['B']!=4.0))& ((f['C']>=22.0)&(f['C']<=27.0))]
8. class8=f[(f['A']<7.0)|((f['B']!=3.0)&(f['B']!=4.0))| ((f['C']<=22.0)&(f['C']>=27.0))]

I need all rows in the data frame to be classified based on the conditions.

Your situation: your dataframe is called f and contains 3 columns with numeric values. The columns are called 'A', 'B' and 'C'.

I recommend creating boolean masks and combining them to match your classes. There are probably other, more elegant ways to do this, but I think this one is about as simple as they get. Essentially, there are three conditions that can be met:

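check_a = f['A'] >= 7
check_b = (f['B'] == 3) | (f['B'] == 4)
# between() is inclusive on both ends by default; a chained
# comparison like 22 <= f['C'] <= 27 raises a ValueError on a Series
check_c = f['C'].between(22, 27)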

Combining these 3 checks constructs your 8 cases (~ negates the booleans, flipping their values):

f['class_1'] =  check_a &  check_b &  check_c
f['class_2'] =  check_a &  check_b & ~check_c
f['class_3'] = ~check_a &  check_b &  check_c
f['class_4'] =  check_a & ~check_b &  check_c
f['class_5'] =  check_a & ~check_b & ~check_c
f['class_6'] = ~check_a &  check_b & ~check_c
f['class_7'] = ~check_a & ~check_b &  check_c
f['class_8'] = ~check_a & ~check_b & ~check_c
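If a single label column is more convenient than eight boolean columns, the same three checks can be mapped to a class number. The following is a minimal sketch, not the only way to do it: the sample values, the class_of lookup table and the 'class' column name are assumptions made for illustration, and the range check uses Series.between, which is inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical sample data; only the column names A, B, C come from the question.
f = pd.DataFrame({
    "A": [8.0, 9.0, 5.0, 7.5, 8.0, 6.0, 3.0, 1.0],
    "B": [3.0, 4.0, 3.0, 1.0, 5.0, 4.0, 2.0, 9.0],
    "C": [25.0, 30.0, 22.0, 27.0, 10.0, 28.0, 23.0, 40.0],
})

check_a = f["A"] >= 7
check_b = f["B"].isin([3, 4])
check_c = f["C"].between(22, 27)  # inclusive on both ends by default

# Map each (check_a, check_b, check_c) combination to its class number,
# using the same numbering as the eight boolean columns.
class_of = {
    (True, True, True): 1,
    (True, True, False): 2,
    (False, True, True): 3,
    (True, False, True): 4,
    (True, False, False): 5,
    (False, True, False): 6,
    (False, False, True): 7,
    (False, False, False): 8,
}

# tolist() converts to plain Python bools so the tuples match the dict keys.
keys = zip(check_a.tolist(), check_b.tolist(), check_c.tolist())
f["class"] = [class_of[key] for key in keys]

print(f)
```

Because the eight combinations are exhaustive and mutually exclusive, every row gets exactly one class.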

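One reason your code doesn't work is that you check whether the values in column 'C' are both less than or equal to 22 AND greater than or equal to 27. This can never be true, so classes 2, 5 and 6 are always empty. Note also that class8 joins its conditions with | instead of &, and that rows where A is exactly 7 match neither A > 7 nor A < 7.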
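The problem with the 'C' condition can be checked on a small example. This is a sketch with made-up values; the Series c and the variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical values covering below, inside and above the 22-27 range.
c = pd.Series([10.0, 22.0, 25.0, 27.0, 30.0])

# The condition from the question: no number is both <= 22 and >= 27.
impossible = (c <= 22.0) & (c >= 27.0)

# "Outside the range" needs OR, or equivalently the negation of between().
outside = (c < 22.0) | (c > 27.0)
negated_between = ~c.between(22, 27)

print(impossible.any())
print(outside.tolist())
print((outside == negated_between).all())
```

The first mask never selects a row, while the other two agree on which values fall outside the range.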

