I want to calculate the scoring rate of each zone using groupby in pandas, but I'm not sure how to do it.
Suppose the df has two columns:
Shot_type  Shot_zone
Goal       Penalty_area
Saved      Penalty_area
Goal       Goal Box
Saved      Goal Box
I want to group by Shot_zone and calculate the scoring rate as the number of goals (rows where Shot_type is 'Goal') divided by the total number of shots in each Shot_zone. Here each Shot_zone has 1 goal and 1 saved shot, so the result should be:
Penalty_area 50%
Goal Box 50%
Is there an easy-to-understand way to do this in pandas? Thank you very much!
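For reference, here is a minimal sketch that builds the sample data from the question and applies the formula literally as described: goal count divided by the number of shots per zone. The names summary, goals, shots and scoring_rate are illustrative choices, not anything pandas requires; named aggregation in .agg() assumes pandas 0.25 or later.

```python
import pandas as pd

# Sample data from the question: one goal and one saved shot per zone
df = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Goal', 'Saved'],
    'Shot_zone': ['Penalty_area', 'Penalty_area', 'Goal Box', 'Goal Box'],
})

# Per zone: count the goals and the total shots (named aggregation)
summary = df.groupby('Shot_zone')['Shot_type'].agg(
    goals=lambda s: (s == 'Goal').sum(),  # number of rows equal to 'Goal'
    shots='size',                         # number of rows in the zone
)

# Scoring rate = goals / shots, i.e. "Goal counts / len()" from the question
summary['scoring_rate'] = summary['goals'] / summary['shots']
print(summary)
```

Keeping the intermediate goals and shots columns makes it easy to check the rate by eye.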
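You can use pd.crosstab with normalize='columns', so that each Shot_zone column sums to 1 and the Goal row holds the scoring rate of each zone:
pd.crosstab(df.Shot_type, df.Shot_zone, normalize='columns')
Shot_zone  Goal Box  Penalty_area
Shot_type
Goal            0.5           0.5
Saved           0.5           0.5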
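The choice of normalize matters. With the balanced sample data, normalize='index' and normalize='columns' happen to print the same numbers, so the sketch below uses a hypothetical, deliberately unbalanced dataset (one goal in three shots from the Goal Box) to show the difference. normalize='columns' gives goals / shots per zone, while normalize='index' gives each zone's share of all goals.

```python
import pandas as pd

# Hypothetical, unbalanced data: Goal Box has 1 goal in 3 shots,
# Penalty_area has 1 goal in 2 shots
df = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Goal', 'Saved', 'Saved'],
    'Shot_zone': ['Penalty_area', 'Penalty_area', 'Goal Box', 'Goal Box', 'Goal Box'],
})

# normalize='columns': each Shot_zone column sums to 1,
# so the 'Goal' row is the scoring rate per zone
scoring_rate = pd.crosstab(df['Shot_type'], df['Shot_zone'], normalize='columns').loc['Goal']

# normalize='index': each Shot_type row sums to 1,
# so the 'Goal' row is each zone's share of all goals (a different quantity)
share_of_goals = pd.crosstab(df['Shot_type'], df['Shot_zone'], normalize='index').loc['Goal']

print(scoring_rate)
print(share_of_goals)
```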
One way is to binarize your Shot_type column, i.e. set it to True if it equals 'Goal', and then use GroupBy + mean:
res = df.assign(Shot_type=df['Shot_type']=='Goal')\
.groupby('Shot_zone')['Shot_type'].mean()
print(res)
Shot_zone
Goal Box        0.5
Penalty_area 0.5
Name: Shot_type, dtype: float64
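A variant of the same boolean-mean idea, sketched below, leaves the original Shot_type column untouched and formats the result as percentage strings like the desired output in the question. It relies on the fact that the mean of a boolean Series is the fraction of True values; the variable names are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Goal', 'Saved'],
    'Shot_zone': ['Penalty_area', 'Penalty_area', 'Goal Box', 'Goal Box'],
})

# True where the shot was a goal; grouping this Series by the zone column
# and taking the mean gives the fraction of goals per zone
is_goal = df['Shot_type'].eq('Goal')
rate = is_goal.groupby(df['Shot_zone']).mean()

# Format as percentage strings, e.g. 0.5 -> '50.0%'
rate_pct = rate.mul(100).round(1).astype(str) + '%'
print(rate_pct)
```

Keep the numeric rate for further calculations and use the formatted strings only for display.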
You can also use groupby and apply:
df.groupby('Shot_zone').Shot_type.apply(lambda s: '{}%'.format((s[s=='Goal']).size/(s.size) * 100))
Shot_zone
Goal Box        50.0%
Penalty_area 50.0%
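Another groupby-based option, sketched here under the assumption that every zone name appears in Shot_zone, is SeriesGroupBy.value_counts with normalize=True, which gives the share of each Shot_type within each zone. unstack(fill_value=0) turns that into a zone-by-type table, so a zone with no goals shows a rate of 0 instead of dropping out. The Outside_box zone below is hypothetical, added only to show that case.

```python
import pandas as pd

# Sample data from the question plus a hypothetical zone with no goals
df = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Goal', 'Saved', 'Saved'],
    'Shot_zone': ['Penalty_area', 'Penalty_area', 'Goal Box', 'Goal Box', 'Outside_box'],
})

# Share of each Shot_type within each zone (MultiIndex: Shot_zone, Shot_type)
shares = df.groupby('Shot_zone')['Shot_type'].value_counts(normalize=True)

# One column per Shot_type; missing combinations (no goals) become 0
table = shares.unstack(fill_value=0)

# The 'Goal' column is the scoring rate per zone
rate = table['Goal']
print(rate)
```

Note that table['Goal'] raises a KeyError if no zone contains any goal at all.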
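You can also divide the number of goals per zone by the number of shots per zone. The parentheses let the expression span two lines, and fillna(0) gives a rate of 0 for zones that have no goals (they are missing from the numerator, so the division would otherwise produce NaN):
(df[df['Shot_type'] == 'Goal'].groupby('Shot_zone').size()
 / df.groupby('Shot_zone').size()).fillna(0)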
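The division approach relies on pandas aligning the two Series by their index. The sketch below shows why a zone without goals needs special handling: it is absent from the goal counts, so the aligned division yields NaN for it. The Outside_box zone is hypothetical, added only to trigger that case.

```python
import pandas as pd

# Sample data from the question plus a hypothetical zone with no goals
df = pd.DataFrame({
    'Shot_type': ['Goal', 'Saved', 'Goal', 'Saved', 'Saved'],
    'Shot_zone': ['Penalty_area', 'Penalty_area', 'Goal Box', 'Goal Box', 'Outside_box'],
})

goals = df[df['Shot_type'] == 'Goal'].groupby('Shot_zone').size()  # no Outside_box entry
shots = df.groupby('Shot_zone').size()                             # all three zones

# Division aligns on the index (union of both), so Outside_box becomes NaN
raw = goals / shots

# Treat "no goals in this zone" as a 0% scoring rate
rate = raw.fillna(0)
print(rate)
```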