简体   繁体   中英

is there a way, to count all rows which contain at least one '1' in a dataframe checking multiple named columns?

I have a dataset filled with Medicare beneficiaries. The question is: 'What proportion of patients have at least one of the chronic conditions described in the independent variables alzheimers, arthritis, cancer, copd, depression, diabetes, heart.failure, ihd, kidney, osteoporosis, and stroke?'

I tried creating a subset and using isnull() & any(), but i can´t get a proper solution.. also tried df.loc but it only lets me name one column..

i am attaching the dataset for better understanding!

https://drive.google.com/file/d/1R--YEsBCDHMXjqNzAumT2zzUAYvM1bWA/view?usp=sharing

Thanks!!

My Try´s:

claimss.loc[:, ["alzheimers","diabetes","arthritis"] == 1]

(Wanted to try it first with 3 columns, doesn´t work in the first place..)

Try with Subset:

filtered_df = df.loc[raw_df] == 1]

(Created a Subset where only index and independet variables(diseases) appear and tried to look for null-rows)

If need filter only some columns names use subset for filter columns names, compare by 1 by DataFrame.eq and last test at least one True by DataFrame.any :

claimss[claimss[["alzheimers","diabetes","arthritis"]].eq(1).any(axis=1)]

If need percentage use mean with boolean mask:

out = claimss[["alzheimers","diabetes","arthritis"]].eq(1).any(axis=1).mean()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM