Python Pandas Group by the same value and replace with the mean

Question

I have a dataframe in Python Pandas with only two columns. The first one has repeated values like the following:

   A        B
   apple    0.5
   apple    0.8
   apple    1.4
   orange   0.4
   orange   1.1
   melon    0.3
   melon    0.1
   melon    0.9
   melon    1.2

What I want is a new dataframe with the mean of B for each distinct value of A. For example:

   A        B
   apple    0.9
   orange   0.75
   melon    0.625

The file has about 2.5 million rows, so I cannot do it in Excel. Any ideas how this can be done in Pandas?

Answer 1

Let df be your dataframe. You can group by 'A' and take the mean of each group with:

g = df.groupby('A').mean()

This returns a new dataframe indexed by the distinct values of 'A', holding the mean of 'B' for each group.

EDIT: if you're not familiar with pandas and you've got an external file, you can import it with:

import pandas
df = pandas.read_csv(yourfile)  # yourfile: path to your CSV file

EDIT2:

g = df.groupby('A').mean()

also works with your edited dataframe of fruits, and returns:

            B
A            
apple   0.900
melon   0.625
orange  0.750
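
Note that the result above is sorted alphabetically by 'A' and uses 'A' as the index, which differs slightly from the layout asked for in the question. If you want 'A' to stay an ordinary column and the groups to keep their order of first appearance, a minimal sketch could look like the following. The dataframe is rebuilt by hand from the question's sample so the snippet runs on its own, and the names means and B_mean are only illustrative. The last line is an optional variant, assuming that "replace with the mean" is meant to keep every original row:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [0.5, 0.8, 1.4, 0.4, 1.1, 0.3, 0.1, 0.9, 1.2],
})

# sort=False keeps groups in order of first appearance;
# as_index=False keeps 'A' as a regular column instead of the index
means = df.groupby("A", sort=False, as_index=False)["B"].mean()
print(means)

# Optional variant: keep all rows and attach each row's group mean
# (B_mean is a hypothetical column name chosen for this example)
df["B_mean"] = df.groupby("A")["B"].transform("mean")
```

The first form gives one row per fruit, while transform returns a result of the same length as the original dataframe, so it can be assigned straight back as a column.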