简体   繁体   中英

How to count the values of a pandas groupby that meet a certain condition

Before I start, I want to make it clear that my question is different than Counting values that meet a condition and Want to count the number of values in a column that meet a condition .

Allow me to explain. Here is my df:

test = pd.DataFrame({'name':['joe','dan','betty','joe','dan','betty','joe','dan','betty','joe','dan','betty'],'points':[12,3,5,10,5,16,2,8,15,17,1,3]})
test


   name points
0   joe   12
1   dan    3
2   betty  5
3   joe   10
4   dan    5
5   betty 16
6   joe    2
7   dan    8
8   betty 15
9   joe   17
10  dan    1
11  betty  3

What I am aiming to do is count how many times each person had less than 10 points and create a new column that consists of that value. I tried the following and got really close:

test['<10'] = test[test['points'] < 10].groupby('name')['points'].transform('count')
test


    name    points  <10
0   joe       12    NaN
1   dan        3    4.0
2   betty      5    2.0
3   joe       10    NaN
4   dan        5    4.0
5   betty     16    NaN
6   joe        2    1.0
7   dan        8    4.0
8   betty     15    NaN
9   joe       17    NaN
10  dan        1    4.0
11  betty      3    2.0

I get the values I want, but since I subset the data frame to values <10, I am left with NaN's in the rows that were excluded. I almost have this figured out, but I would like to get those NaN values filled to show how many times each person had less than 10 points (ie joe should have 1, betty 2, and dan 4). Any help is appreciated, thanks!

Your code picks only rows where point < 10 . The logic is you need to sum directly on the mask of point < 10 . True is equivalent to 1 and False is 0 . So, sum on point < 10 will return the count of True . From this logic, you only need to groupby and do transform to populate the count of True to each group

test['<10'] = (test['points'] < 10).groupby(test['name']).transform('sum')

Out[84]:
     name  points  <10
0     joe      12  1.0
1     dan       3  4.0
2   betty       5  2.0
3     joe      10  1.0
4     dan       5  4.0
5   betty      16  2.0
6     joe       2  1.0
7     dan       8  4.0
8   betty      15  2.0
9     joe      17  1.0
10    dan       1  4.0
11  betty       3  2.0

Fix your code remove the transform and add reindex

test['<10']=test[test['points'] < 10].groupby('name')['points'].count().reindex(test.name).values
test
     name  points  <10
0     joe      12    1
1     dan       3    4
2   betty       5    2
3     joe      10    1
4     dan       5    4
5   betty      16    2
6     joe       2    1
7     dan       8    4
8   betty      15    2
9     joe      17    1
10    dan       1    4
11  betty       3    2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM