Pandas: how to calculate a groupby result from the size of each group and the count of a value in another column

I want to calculate the scoring rate of each zone using groupby in pandas, but I am not sure how to do it.

Suppose df has two columns:

Shot_type   Shot_zone
   Goal     Penalty_area
   Saved    Penalty_area
   Goal     Goal Box
   Saved    Goal Box

Here I want to group by Shot_zone and calculate the scoring rate as the number of Goal rows in Shot_type divided by the total number of rows (len()) in each Shot_zone. Each Shot_zone has 1 goal and 1 save, so the result should look like:

Penalty_area   50%
Goal Box       50%

Is there a straightforward way to do this in pandas? Thank you very much!

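Using pd.crosstab, normalized over each column so that every Shot_zone sums to 1; the Goal row is then the scoring rate per zone:

pd.crosstab(df.Shot_type,df.Shot_zone,normalize='columns')
Out[662]: 
Shot_zone  Goal Box  Penalty_area
Shot_type                       
Goal            0.5           0.5
Saved           0.5           0.5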
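Which axis crosstab normalizes along only becomes visible when the zones have different numbers of shots. The following is a minimal sketch on made-up shot data (the `shots` frame is an assumption, not the data from the question) comparing normalize='columns' with normalize='index':

```python
import pandas as pd

# Hypothetical, unbalanced data: 4 shots from the penalty area, 2 from the goal box
shots = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Saved", "Saved", "Goal", "Saved"],
    "Shot_zone": ["Penalty_area"] * 4 + ["Goal Box"] * 2,
})

# Each column (zone) sums to 1: the Goal row is goals / shots per zone
by_zone = pd.crosstab(shots.Shot_type, shots.Shot_zone, normalize="columns")

# Each row (shot type) sums to 1: the Goal row is the share of all goals per zone
by_type = pd.crosstab(shots.Shot_type, shots.Shot_zone, normalize="index")

print(by_zone.loc["Goal"])
print(by_type.loc["Goal"])
```

On this data the Goal row of by_zone holds the per-zone scoring rate, while the Goal row of by_type answers a different question: where the goals came from.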

One way is to binarize your Shot_type column, i.e. set it to True where it equals 'Goal', and then use GroupBy + mean. The mean of a boolean column is the fraction of True values, which here is goals / shots:

res = df.assign(Shot_type=df['Shot_type']=='Goal')\
        .groupby('Shot_zone')['Shot_type'].mean()

print(res)

Shot_zone
Goal Box        0.5
Penalty_area    0.5
Name: Shot_type, dtype: float64
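For reference, a self-contained sketch of the same boolean-mean idea, rebuilding the sample frame from the question and formatting the result as percentages. The variable names is_goal, scoring_rate and scoring_pct are illustrative only:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Goal", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Goal Box", "Goal Box"],
})

# Boolean Series: True where the shot was a goal
is_goal = df["Shot_type"].eq("Goal")

# The mean of booleans within each zone equals goals / shots for that zone
scoring_rate = is_goal.groupby(df["Shot_zone"]).mean()

# Optional: render the rates as percent strings for display
scoring_pct = scoring_rate.map("{:.0%}".format)
print(scoring_pct)
```

Keeping scoring_rate numeric and formatting only at the end makes it possible to sort or filter by rate before display.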

You can also use groupby and apply:

df.groupby('Shot_zone').Shot_type.apply(lambda s: '{}%'.format((s[s=='Goal']).size/(s.size) * 100))

Shot_zone
Goal Box        50.0%
Penalty_area    50.0%

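You can do the same by dividing the goal count per zone by the total shot count per zone (the whole expression is wrapped in parentheses so it can span two lines):

(df[df['Shot_type']=='Goal'].groupby('Shot_zone')['Shot_zone'].count()
 / df.groupby('Shot_zone')['Shot_zone'].count())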
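One caveat with dividing two counts: a zone without any goals does not appear in the filtered numerator, so the division gives NaN for that zone instead of 0. A minimal sketch, assuming a hypothetical extra zone 'Outside_box' that is not in the question's data:

```python
import pandas as pd

# Question's sample data plus one hypothetical zone with no goals
df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Goal", "Saved", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Goal Box", "Goal Box", "Outside_box"],
})

goals = df[df["Shot_type"] == "Goal"].groupby("Shot_zone").size()
shots = df.groupby("Shot_zone").size()

# Plain division aligns on the index; 'Outside_box' is missing from goals -> NaN
raw = goals / shots

# Reindex the goal counts to every zone, filling missing zones with 0 goals
rate = goals.reindex(shots.index, fill_value=0) / shots
print(rate)
```

The boolean-mean approach does not have this problem, because every row stays in the group whether or not it is a goal.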
