Count number of unique occurrences in a dataframe

Question

I would like to divide the number of sales by the number of sales opportunities to get the average sales per opportunity.

Here is an example dataframe with mixed types:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
                   'Quantity': [2, 3, 4, 1],
                   'Member': ['AACC', 'AACC', 'AACC', 'DDEE']})

print(df)
  Opportunity  Quantity Member
0       AB122         2   AACC
1       AB122         3   AACC
2       AB123         4   AACC
3       AB124         1   DDEE

I was able to get the sum of the sales with this:

df.pivot_table('Quantity', 'Member', aggfunc=np.sum)

But if I do the same for the Opportunity, I only get the Opportunity names glued together. Also, the duplicate opportunities are still included.

df.pivot_table('Opportunity','Member', aggfunc=np.sum)

What I need instead is that the opportunities are counted, but only the unique ones (AACC should only have 2 opportunities). The outcome of the counting should be:

print (df2)
AACC 2
DDEE 1

So then I could get the average member sales by dividing the sales quantity by number of opportunities:

print (df3)
AACC 4.5 
DDEE 1

Note on the calculation: AACC gets 4.5 because 9 divided by 2 is 4.5, and DDEE gets 1 because 1 divided by 1 is 1.

Answer 1

df.groupby('Member').apply(lambda x: x.Quantity.sum())

This groups the dataframe by the Member column and then sums Quantity within each group. For example, this input:

  Member Opportunity  Quantity
0   AACC       AB122         1
1   AACC       AB122         3
2   DDDD       AB123         4
3   AACC       AB124         1

will produce:

Member
AACC    5
DDDD    4
dtype: int64
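
The same per-member total can likely be obtained without apply, by selecting the column before aggregating. The following is a minimal sketch that rebuilds the example frame from this answer; the variable name `totals` is only illustrative.

```python
import pandas as pd

# Example frame from this answer (note: it differs from the question's data).
df = pd.DataFrame({
    'Member': ['AACC', 'AACC', 'DDDD', 'AACC'],
    'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
    'Quantity': [1, 3, 4, 1],
})

# Select the column first, then sum per group; no Python-level lambda needed.
totals = df.groupby('Member')['Quantity'].sum()
print(totals)
```

This only covers the sum of Quantity; counting unique opportunities still needs a separate step.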

Answer 2

You can use groupby.apply here to get your average sale, so we don't have to do groupby twice:

df.groupby('Member').apply(lambda x: x['Quantity'].sum() / x['Opportunity'].nunique())

Member
AACC    4.5
DDEE    1.0
dtype: float64
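
For comparison, here is a sketch of the two-step version that the apply call replaces, using the question's data. It also shows the unique-opportunity count on its own, which is what the question asks for as df2. The names `n_opps`, `total_qty` and `avg_sale` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
    'Quantity': [2, 3, 4, 1],
    'Member': ['AACC', 'AACC', 'AACC', 'DDEE'],
})

# Number of distinct opportunities per member (duplicates counted once).
n_opps = df.groupby('Member')['Opportunity'].nunique()

# Total quantity per member.
total_qty = df.groupby('Member')['Quantity'].sum()

# Both Series are indexed by Member, so the division aligns on that index.
avg_sale = total_qty / n_opps
print(n_opps)
print(avg_sale)
```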

To give the result column a name, use reset_index:

df.groupby('Member').apply(lambda x: x['Quantity'].sum() / x['Opportunity'].nunique())\
    .reset_index(name='avg sale')

  Member  avg sale
0   AACC       4.5
1   DDEE       1.0
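
If apply with a lambda turns out to be slow on a large frame, one possible alternative is a single groupby with named aggregation followed by a plain column division. This is a sketch, assuming pandas 0.25 or later (where named aggregation is available); the intermediate column names `total_qty` and `n_opps` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Opportunity': ['AB122', 'AB122', 'AB123', 'AB124'],
    'Quantity': [2, 3, 4, 1],
    'Member': ['AACC', 'AACC', 'AACC', 'DDEE'],
})

# One groupby pass computing both aggregates with built-in reducers.
stats = df.groupby('Member').agg(
    total_qty=('Quantity', 'sum'),
    n_opps=('Opportunity', 'nunique'),
)

# Vectorised division of the two aggregate columns.
stats['avg sale'] = stats['total_qty'] / stats['n_opps']
result = stats.reset_index()
print(result)
```

This keeps the intermediate totals and counts available alongside the average, which may be handy for checking the numbers.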

2 answers

solution1
1 2019-07-01 12:54:47

solution2
0 ACCEPTED 2019-07-01 12:54:32
