Pandas calculate common occurrence percentage of a group by

Question

I would think this is a simple task, but have had a difficult time getting it working. I have a pandas dataframe

IDX     Name    Fruit
 0      Mary    Grapes
 1      Tom     Apple
 2      Mary    Grapes
 3      Mary    Apple
 4      Tom     Apple
 5      Mary    Grapes
 6      Tom     Grapes
 7      Tom     Apple

My expected result

IDX     Name    Occurrence     Perc
 0      Mary    Grapes         75%
 1      Tom     Apple          75%

So my end result would be a data frame grouped by name, with the most common occurrence for fruit and the percentage of that occurrence

So to get the most common occurrence i use

df["Occurrence"] = (df.groupby("Name")["Fruit"].transform(lambda x: x.value_counts().index[0]))

How would i go about populating the perc column?

Answer 1

Let us do crosstab then sort_values with tail

out = pd.crosstab(df.Name,df.Fruit,normalize='index').stack().sort_values().groupby(level=0).tail(1)
Out[65]: 
Name  Fruit 
Mary  Grapes    0.75
Tom   Apple     0.75
dtype: float64

Answer 2

Use Series.value_counts with normalize=True in lambda function and return first row by DataFrame.head , last convert Multiindex Series to DataFrame :

df1 = (df.groupby("Name")["Fruit"]
         .apply(lambda x: x.value_counts(normalize=True).head(1))
         .mul(100)
         .rename_axis(['Name','Occurrence'])
         .reset_index(name='Perc'))
print (df1)
   Name Occurrence  Perc
0  Mary     Grapes  75.0
1   Tom      Apple  75.0

Answer 3

Here is another way.

df['pcts'] = df.groupby(['name','fruit'])['fruit'].transform('size').div(df['fruit'].groupby(df['name']).transform('size'))

df.loc[df.groupby('name')['pcts'].idxmax()]

Pandas calculate common occurrence percentage of a group by

Question

3 answers

solution1
1 2020-11-13 14:07:53

solution2
1 ACCPTED 2020-11-13 14:08:31

solution3
1 2020-11-13 14:12:24

Pandas calculate common occurrence percentage of a group by

Question

3 answers

solution1 1 2020-11-13 14:07:53

solution2 1 ACCPTED 2020-11-13 14:08:31

solution3 1 2020-11-13 14:12:24

solution1
1 2020-11-13 14:07:53

solution2
1 ACCPTED 2020-11-13 14:08:31

solution3
1 2020-11-13 14:12:24