
Can you filter a pandas dataframe based on a sum or count of multiple variables?

I'm trying to filter a pandas dataframe based on a set of "or" conditions. They're all very similar, so I'm wondering if there's a more concise way to write this.

Specifically, I want to include rows from the dataframe (df) where any of a set of variables is 1:

df.query("Q50r5==1 or Q50r6==1 or Q50r7==1 or Q50r8==1 or Q50r9==1 or Q50r10==1 or Q50r11==1")

This filters correctly to rows where any of these variables is 1.

However, I expect many more situations where I need to filter my dataframe in a similar way, e.g.:

df.query("Q20r1==1 or Q20r2==1 or Q20r3==1")
df.query("Q23r2==1 or Q23r5==1 or Q23r7==1 or Q23r8==1")

The pandas documentation on .query() only says that you can use "and" and "or" as you can elsewhere in Python, so this may be the only way to write such a query. But is there some kind of sum or count I could do across these columns within the query? Something like "any(1,Q20r1,Q20r2,Q20r3)" or "sum(Q20r1,Q20r2,Q20r3)>0"?
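The "sum(...) > 0" idea can be written outside .query() by comparing the columns to 1 and counting the matches per row. The sketch below uses made-up values (the column names follow the question; the data is hypothetical). Comparing to 1 before summing matters: summing the raw values would also count entries such as 2, which "Q20r1==1" would not.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "ID":    [1, 2, 3, 4],
    "Q20r1": [1, 0, 0, 0],
    "Q20r2": [0, 1, 0, 1],
    "Q20r3": [1, 0, 0, 0],
})

cols = ["Q20r1", "Q20r2", "Q20r3"]

# Count, per row, how many of the listed columns equal 1.
n_ones = df[cols].eq(1).sum(axis=1)

# Keep rows where that count is at least 1 (the "sum(...) > 0" idea).
filtered = df[n_ones > 0]
print(filtered["ID"].tolist())  # [1, 2, 4]
```

A count also makes thresholds easy to express, e.g. `df[n_ones >= 2]` for rows with at least two 1s, which would be long to spell out as chained "or" conditions.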

EDIT: For example, using this small dataframe: (example dataframe, shown as an image in the original post and not preserved here)

I would want to retrieve IDs 1, 2, 4, 5, and 7 and exclude IDs 3 and 6, because rows 3 and 6 have no 1 in any of the columns I'm referring to.
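If you prefer to keep using .query(), one option is to build the long "or" string programmatically instead of typing it. This is a sketch: `any_equals_query` is a hypothetical helper, and since the original example image is not available, the data below is invented so that IDs 3 and 6 have no 1 in the chosen columns. Column names are wrapped in backticks, which recent pandas versions accept in query strings and which also handles names containing spaces.

```python
import pandas as pd

# Hypothetical data (not the original image): IDs 3 and 6 have no 1s.
df = pd.DataFrame({
    "ID":    [1, 2, 3, 4, 5, 6, 7],
    "Q23r2": [1, 0, 0, 0, 1, 0, 0],
    "Q23r5": [0, 1, 0, 0, 0, 0, 1],
    "Q23r7": [0, 0, 0, 1, 0, 0, 0],
    "Q23r8": [1, 0, 0, 0, 0, 0, 1],
})


def any_equals_query(columns, value=1):
    """Hypothetical helper: build "`a` == 1 or `b` == 1 or ..." for DataFrame.query."""
    return " or ".join(f"`{c}` == {value}" for c in columns)


expr = any_equals_query(["Q23r2", "Q23r5", "Q23r7", "Q23r8"])
print(expr)

result = df.query(expr)
print(result["ID"].tolist())  # [1, 2, 4, 5, 7]
```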

Answer: Compare the columns to 1, then use any with axis=1 to check whether at least one value in each row is True.

For example, you can run

df[(df[["Q20r1", "Q20r2", "Q20r3"]] == 1).any(axis=1)]
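Since the question expects many such filters, the same idiom can be wrapped in a small function and reused with different column lists. In this sketch `rows_with_any_one` is a hypothetical helper and the data is invented; the contiguous run Q50r5 to Q50r11 is generated with a list comprehension rather than typed out.

```python
import pandas as pd


def rows_with_any_one(df, columns):
    """Hypothetical helper: return rows of df where at least one of `columns` equals 1."""
    mask = df[list(columns)].eq(1).any(axis=1)
    return df[mask]


# Generate Q50r5 ... Q50r11 instead of typing each name.
q50_cols = [f"Q50r{i}" for i in range(5, 12)]

# Hypothetical all-zero data with two 1s set by hand.
df = pd.DataFrame(0, index=range(4), columns=["ID"] + q50_cols)
df["ID"] = [1, 2, 3, 4]
df.loc[0, "Q50r5"] = 1
df.loc[2, "Q50r11"] = 1

print(rows_with_any_one(df, q50_cols)["ID"].tolist())  # [1, 3]
```

Non-contiguous groups such as Q23r2, Q23r5, Q23r7, Q23r8 can simply be passed as an explicit list.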
