I would like to divide the total sales quantity of each member by that member's number of unique sales opportunities, to get the average sales per opportunity.
Here is an example DataFrame with mixed column types:
import pandas as pd
import numpy as np

df = pd.DataFrame({'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
                   'Quantity': [2, 3, 4, 1],
                   'Member': ['AACC', 'AACC', 'AACC', 'DDEE']})
print(df)
Opportunity Quantity Member
0 AB122 2 AACC
1 AB122 3 AACC
2 AB123 4 AACC
3 AB124 1 DDEE
I was able to get the sum of the sales per member with this:
df.pivot_table(values='Quantity', index='Member', aggfunc='sum')
But if I do the same for Opportunity, I only get the opportunity names concatenated into one string per member, and the duplicate opportunities are still included:
df.pivot_table(values='Opportunity', index='Member', aggfunc='sum')
What I need instead is a count of the opportunities that only includes the unique ones (AACC should have only 2 opportunities). The result of the counting should be:
print (df2)
AACC 2
DDEE 1
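A minimal sketch of that count, using the same example data: selecting the Opportunity column after grouping and calling nunique counts each distinct opportunity ID once per member, so the duplicate AB122 row is not counted twice.

```python
import pandas as pd

df = pd.DataFrame({'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
                   'Quantity': [2, 3, 4, 1],
                   'Member': ['AACC', 'AACC', 'AACC', 'DDEE']})

# Count distinct opportunity IDs per member; the repeated AB122 counts once
df2 = df.groupby('Member')['Opportunity'].nunique()
print(df2)
```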
Then I could get the average sales per member by dividing the total sales quantity by the number of unique opportunities:
print (df3)
AACC 4.5
DDEE 1
Note on the calculation: AACC gets 4.5 because its total quantity of 9 (2 + 3 + 4) divided by its 2 unique opportunities is 4.5; DDEE gets 1 because 1 divided by 1 is 1.
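The calculation in that note can be sketched directly with two grouped Series, assuming the example DataFrame above. Dividing one Series by the other aligns them on the Member index, so each member's total is divided by its own opportunity count.

```python
import pandas as pd

df = pd.DataFrame({'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
                   'Quantity': [2, 3, 4, 1],
                   'Member': ['AACC', 'AACC', 'AACC', 'DDEE']})

# Total quantity per member: AACC -> 2 + 3 + 4 = 9, DDEE -> 1
total = df.groupby('Member')['Quantity'].sum()
# Distinct opportunities per member: AACC -> 2, DDEE -> 1
opportunities = df.groupby('Member')['Opportunity'].nunique()
# Division aligns on the Member index and yields a float Series
df3 = total / opportunities
print(df3)
```

This groups the data twice; the single-groupby answers that follow avoid that.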
df.groupby('Member').apply(lambda x: x.Quantity.sum())
This groups the DataFrame by the Member column and then sums Quantity within each group. For example, given:
Member Opportunity Quantity
0 AACC AB122 1
1 AACC AB122 3
2 DDDD AB123 4
3 AACC AB124 1
it produces:
Member
AACC    5
DDDD    4
dtype: int64
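For a plain per-group sum, the lambda is not strictly needed. A minimal sketch using the example data from this answer: selecting Quantity before aggregating gives the same totals with the built-in sum aggregation.

```python
import pandas as pd

example = pd.DataFrame({'Member': ['AACC', 'AACC', 'DDDD', 'AACC'],
                        'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
                        'Quantity': [1, 3, 4, 1]})

# Select the column first, then use the built-in sum instead of apply + lambda
totals = example.groupby('Member')['Quantity'].sum()
print(totals)
```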
You can use groupby.apply here to get your average sale, so you don't have to group twice:
df.groupby('Member').apply(lambda x: x['Quantity'].sum() / x['Opportunity'].nunique())
Member
AACC 4.5
DDEE 1.0
dtype: float64
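Another option that also groups only once, sketched here as an alternative rather than the approach above: named aggregation in groupby.agg (available since pandas 0.25) computes the sum and the distinct count in one call, and the average is then a column-wise division. The column names total and opportunities are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
                   'Quantity': [2, 3, 4, 1],
                   'Member': ['AACC', 'AACC', 'AACC', 'DDEE']})

# One groupby: sum the quantities and count distinct opportunities per member
stats = df.groupby('Member').agg(total=('Quantity', 'sum'),
                                 opportunities=('Opportunity', 'nunique'))
# Vectorized division across the two aggregated columns
stats['avg sale'] = stats['total'] / stats['opportunities']
print(stats)
```

This keeps the intermediate totals and counts visible alongside the average, which can help when checking the numbers.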
To get a named column, use reset_index:
df.groupby('Member').apply(lambda x: x['Quantity'].sum() / x['Opportunity'].nunique())\
.reset_index(name='avg sale')
Member avg sale
0 AACC 4.5
1 DDEE 1.0