
Python pandas: groupby, then count rows satisfying a condition

I am trying to group by the id column so that I can show, for each id, the number of rows where col1 is equal to 1.

df:

id col1 col2 col3
a   1     1    1
a   0     1    1
a   1     1    1
b   1     0    1

My code:

df.groupby(['id'])['col1'].count()[1]

The output I got was 2. It didn't show the values for other ids such as b.

I want:

id col1
a   2    
b   1 

If possible, can the total number of rows per id also be displayed as a new column?

Example:

id col1 total
a   2    3
b   1    1

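Assuming col1 contains only 1 and 0, you can use agg with named aggregation (pandas 0.25 or later). The older form, which passed a renaming dict to agg on a single selected column, was deprecated in pandas 0.20 and removed in 1.0:

df.groupby('id', as_index=False).agg(col1=('col1', 'sum'), total=('col1', 'count'))

#  id  col1  total
#0  a     2      3
#1  b     1      1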

It's because count() counts every non-null row in each group, whatever its value. For id 'a' that gives 3, not the number of rows where col1 equals 1. The trailing [1] then picks out a single element of that result, so you see one number instead of one row per id.
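To make the difference concrete, here is a minimal sketch using the data from the question. It is one possible approach, not necessarily what the answer above had in mind: filter the rows first, then count what is left in each group. The variable names all_rows and ones are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id':   ['a', 'a', 'a', 'b'],
    'col1': [1, 0, 1, 1],
    'col2': [1, 1, 1, 0],
    'col3': [1, 1, 1, 1],
})

# count() counts every non-null row per id, whatever its value.
all_rows = df.groupby('id')['col1'].count()

# Filtering first keeps only the rows where col1 == 1,
# so size() counts just those rows per id.
ones = df[df['col1'] == 1].groupby('id').size()

print(all_rows)
print(ones)
```

Here all_rows holds 3 for a and 1 for b, while ones holds 2 for a and 1 for b. Note that an id with no matching rows would be missing from ones entirely, because its rows are filtered out before grouping.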

Yes, you can add the total to your output. Just compute a count of all rows per id and include it as another column next to the conditional count.
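As a rough sketch of that idea (my own reading of it, with illustrative variable names), you could compute the two counts separately and place them side by side. This assumes the question's data and column names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id':   ['a', 'a', 'a', 'b'],
    'col1': [1, 0, 1, 1],
    'col2': [1, 1, 1, 0],
    'col3': [1, 1, 1, 1],
})

# Rows per id where col1 equals 1: sum a boolean mask within each group.
ones = df['col1'].eq(1).groupby(df['id']).sum()

# All rows per id.
total = df.groupby('id').size()

# Align the two Series on id and turn id back into a regular column.
result = pd.concat(
    [ones.rename('col1'), total.rename('total')], axis=1
).reset_index()

print(result)
```

Because the mask is summed rather than filtered, an id with no rows equal to 1 still appears, with a count of 0.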

If you want to generalize the solution to handle values in col1 other than 0 and 1, you can first test for equality with 1 and then aggregate the resulting boolean column, as follows. This also orders the columns correctly.

df.set_index('id')['col1'].eq(1).groupby(level=0).agg([('col1', 'sum'), ('total', 'count')]).reset_index()

  id  col1  total
0  a   2.0      3
1  b   1.0      1

Using a tuple in the agg method where the first value is the column name and the second the aggregating function is new to me. I was just experimenting and it seemed to work. I don't remember seeing it in the documentation so use with caution.
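If you would rather stay with documented behaviour, one alternative is named aggregation on a Series groupby (available since pandas 0.25), where each keyword becomes an output column name. This is a sketch against the question's data, not a replacement that the answer above proposed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id':   ['a', 'a', 'a', 'b'],
    'col1': [1, 0, 1, 1],
    'col2': [1, 1, 1, 0],
    'col3': [1, 1, 1, 1],
})

result = (
    df['col1'].eq(1)                  # boolean mask: True where col1 == 1
      .groupby(df['id'])              # group the mask by the id column
      .agg(col1='sum', total='count') # keyword = output column, value = function
      .reset_index()
)

print(result)
```

The columns come out in the order the keywords are given, so id, col1, total matches the layout requested in the question.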
