Pandas: how to calculate a groupby result based on the length of each group and a count of values in another column

Question

I want to calculate the scoring rate of each zone using groupby in pandas, but I'm not sure how to do it.

Suppose the df has two columns as:

Shot_type   Shot_zone
   Goal     Penalty_area
   Saved    Penalty_area
   Goal     Goal Box
   Saved    Goal Box

I want to group by Shot_zone and calculate the scoring rate as the number of rows where Shot_type is 'Goal' divided by the len() of each Shot_zone group. Here each Shot_zone has 1 goal and 1 saved shot, so the result should look like:

Penalty_area   50%
Goal Box       50%

Is there any understandable approach to do so using Pandas? Thank you very much!

Answer 1

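Using pd.crosstab, normalized over columns so that each Shot_zone column sums to 1 (the Goal row is then the scoring rate per zone; normalize='index' would instead give the share of each shot type that falls in each zone, which only coincides on this balanced sample):

pd.crosstab(df.Shot_type,df.Shot_zone,normalize='columns')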
Out[662]: 
Shot_zone  GoalBox  Penalty_area
Shot_type                       
Goal           0.5           0.5
Saved          0.5           0.5
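
To see which normalization gives a per-zone scoring rate, here is a minimal sketch on slightly unbalanced data. The extra 'Saved' shot in Penalty_area is a hypothetical row added for illustration and is not part of the question's data.

```python
import pandas as pd

# Question's sample plus one hypothetical extra 'Saved' shot in Penalty_area,
# so that the two normalizations give different numbers.
df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Saved", "Goal", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Penalty_area",
                  "Goal Box", "Goal Box"],
})

# Each Shot_zone column sums to 1: the 'Goal' row is goals / shots per zone.
by_zone = pd.crosstab(df["Shot_type"], df["Shot_zone"], normalize="columns")
scoring_rate = by_zone.loc["Goal"]

# Each Shot_type row sums to 1: the 'Goal' row is the share of all goals
# that came from each zone, which is a different quantity.
by_type = pd.crosstab(df["Shot_type"], df["Shot_zone"], normalize="index")
goal_share = by_type.loc["Goal"]

print(scoring_rate)
print(goal_share)
```

On this data the scoring rate for Penalty_area is 1/3 (one goal out of three shots), while its share of all goals is 1/2.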

Answer 2

One way is to binarize your Shot_type column, i.e. set it to True where it equals 'Goal', and then use GroupBy + mean. The mean of a boolean column is the fraction of True values, which here is the scoring rate:

res = df.assign(Shot_type=df['Shot_type']=='Goal')\
        .groupby('Shot_zone')['Shot_type'].mean()

print(res)

Shot_zone
GoalBox         0.5
Penalty_area    0.5
Name: Shot_type, dtype: float64
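
If the result should read as percentages, as in the question, the boolean-mean idea can be sketched end to end as follows. The variable names (is_goal, rate, percent) and the zero-decimal format are assumptions chosen for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Goal", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Goal Box", "Goal Box"],
})

# Boolean Series: True where the shot was a goal.
is_goal = df["Shot_type"].eq("Goal")

# Mean of booleans per zone = goals / number of shots in that zone.
rate = is_goal.groupby(df["Shot_zone"]).mean()

# Format as percentage strings for display only; keep `rate` for further math.
percent = rate.mul(100).map("{:.0f}%".format)
print(percent)
```

Keeping the numeric rate separate from the formatted strings makes it easy to sort or filter zones before display.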

Answer 3

You can also use groupby and apply, dividing the number of 'Goal' entries in each group by the group's size:

df.groupby('Shot_zone').Shot_type.apply(lambda s: '{}%'.format((s[s=='Goal']).size/(s.size) * 100))

Shot_zone
Goal_Box        50.0%
Penalty_area    50.0%

Answer 4

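You can do the same by dividing the per-zone goal count by the per-zone shot count (here data is the DataFrame). Note that a zone with no goals is missing from the numerator, so it comes out as NaN rather than 0:

(data[data['Shot_type']=='Goal'].groupby(['Shot_zone'])['Shot_zone'].count()
 / data.groupby(['Shot_zone'])['Shot_zone'].count())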
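
A possible variant of this count-over-count idea computes both counts in a single groupby, so zones without any goal get a rate of 0 instead of dropping out of the numerator. This is a sketch, assuming pandas 0.25 or later for named aggregation. The Outside_box row and the column names goals, shots and rate are hypothetical, added only to show the zero-goal case.

```python
import pandas as pd

# Question's sample plus one hypothetical zone that has no goals.
df = pd.DataFrame({
    "Shot_type": ["Goal", "Saved", "Goal", "Saved", "Saved"],
    "Shot_zone": ["Penalty_area", "Penalty_area", "Goal Box", "Goal Box",
                  "Outside_box"],
})

summary = (
    df.assign(is_goal=df["Shot_type"].eq("Goal"))
      .groupby("Shot_zone")
      # goals: number of True values; shots: number of rows in the group.
      .agg(goals=("is_goal", "sum"), shots=("is_goal", "count"))
)

# Every zone appears in both columns, so the division never yields NaN here.
summary["rate"] = summary["goals"] / summary["shots"]
print(summary)
```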

solution1
2 2018-08-01 15:29:08

solution2
1 2018-08-01 15:29:03

solution3
1 ACCEPTED 2018-08-01 15:32:36

solution4
1 2018-08-01 18:18:43