I need to drop the rows in my dataframe where the sum of the probability values (floats with up to 10 digits) is greater than 1, but pandas gives me wrong results.
My code:
# drop wrong probability row
data.at[data[data.p1 + data.p2 + data.p3 > 1.001].index, 'h1'] = 'dropped by pandas'
The results:
re                | p1          | p2          | p3          | sump
correct result    | 0.743088844 | 0.24208727  | 0.014823886 | 1           << correct
correct result    | 0.647239626 | 0.346835025 | 0.00592535  | 1           << correct
correct result    | 0.65043824  | 0.34372226  | 0.0058395   | 1           << correct
correct result    | 0.75111312  | 0.221604341 | 0.027282539 | 1           << correct
dropped by pandas | 0.670277591 | 0.324265434 | 0.005456975 | 1           << wrong
dropped by pandas | 0.672221755 | 0.322438072 | 0.005340173 | 1           << wrong
dropped by pandas | 0.670053332 | 0.324742569 | 0.005204099 | 1           << wrong
dropped by pandas | 0.667690433 | 0.327033634 | 0.005275932 | 1           << wrong
dropped by pandas | 0.237037933 | 0.823248091 | 0.05335034  | 1.113636364 << correct
dropped by pandas | 0.242720919 | 0.818282268 | 0.052633177 | 1.113636364 << correct
It seems to work for some rows but not for others, which drives me crazy...
(I tried setting the precision to 16, but I found that only affects how the numbers are displayed.)
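The point about precision can be checked with a small sketch. It uses two rows copied from the table above and assumes the precision setting in question is the `display.precision` option; that option changes only how floats are printed, while the comparison always runs on the stored values.

```python
import pandas as pd

# Two rows copied from the table above; the column names follow the post.
data = pd.DataFrame({
    "p1": [0.670277591, 0.237037933],
    "p2": [0.324265434, 0.823248091],
    "p3": [0.005456975, 0.05335034],
})

total = data.p1 + data.p2 + data.p3

# display.precision changes only how floats are printed, not what is stored.
with pd.option_context("display.precision", 16):
    print(total)

# The comparison always uses the stored values.
mask = total > 1.001
print(mask.tolist())  # [False, True]
```

On these two rows the condition itself behaves as expected: the row summing to 1 is not flagged and the row summing to 1.113636364 is.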
You're adding all the values and then comparing the sum with 1, without dividing it by 3 or comparing it with 3. Simply change
data.at[data[data.p1 + data.p2 + data.p3 > 1.001].index, 'h1'] = 'dropped by pandas'
to
data.at[data[data.p1 + data.p2 + data.p3 > 3].index, 'h1'] = 'dropped by pandas'
or
data.at[data[(data.p1 + data.p2 + data.p3) / 3 > 1].index, 'h1'] = 'dropped by pandas'
Also, you don't need to compare with 1.001; you can compare with 1, because > means strictly greater than, unlike >=, which means greater than or equal.
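A separate thing worth ruling out is the index itself. The post does not show it, so this is only one possible explanation: if the dataframe has repeated index labels, selecting rows, taking their `.index`, and assigning by label also overwrites every other row that shares a flagged label. The sketch below uses hypothetical data with a deliberately duplicated label, keeps the 1.001 threshold from the question, and uses `.loc` for both assignments.

```python
import pandas as pd

# Hypothetical data: the index deliberately repeats label 0.
data = pd.DataFrame(
    {
        "p1": [0.743088844, 0.670277591, 0.237037933],
        "p2": [0.24208727, 0.324265434, 0.823248091],
        "p3": [0.014823886, 0.005456975, 0.05335034],
        "h1": ["", "", ""],
    },
    index=[1, 0, 0],
)

# Only the last row sums to more than 1.001.
mask = data.p1 + data.p2 + data.p3 > 1.001

# Label-based round trip: every row sharing a flagged label is overwritten.
by_label = data.copy()
by_label.loc[by_label[mask].index, "h1"] = "dropped by pandas"

# Boolean mask: only rows where the condition is True are overwritten.
by_mask = data.copy()
by_mask.loc[mask, "h1"] = "dropped by pandas"

print(by_label["h1"].tolist())
print(by_mask["h1"].tolist())
```

If `data.index.is_unique` is `False` on the real dataframe, passing the boolean mask straight to `.loc` (or calling `reset_index(drop=True)` first) would avoid marking rows whose sum is actually 1.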