
Python Pandas Group by the same value and replace with the mean

I have a dataframe in Python Pandas with only two columns. The first one has repeated values like the following:

   A    B
   apple   0.5
   apple   0.8
   apple   1.4
   orange   0.4
   orange   1.1
   melon   0.3
   melon   0.1
   melon   0.9
   melon   1.2

What I want to do is create a new dataframe containing the mean of B for each distinct value of A in the first dataframe. For example:

   A   B
   apple   0.9
   orange   0.75
   melon   0.625

The file has about 2.5 million rows, so I cannot do this in Excel. Any ideas how this can be done in Pandas?

Let df be your dataframe. You can group by column 'A' and take the mean of each group with:

g = df.groupby('A').mean()

With the data as originally posted, where column A held the labels 1, 2 and 3, this returns:

       B
A       
1  0.900
2  0.750
3  0.625

EDIT: if you're not familiar with pandas and you've got an external file, you can import it with:

import pandas
df = pandas.read_csv(yourfile)  # yourfile: path to your CSV file
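
As a small self-contained sketch of that import step: the snippet below assumes the file is comma-separated with a header row naming the columns A and B (the question does not say what the real file looks like). The in-memory `io.StringIO` buffer is only a stand-in for a real file path.

```python
import io
import pandas as pd

# Stand-in for a real CSV file on disk.
# Assumed layout (not stated in the question): header row, comma-separated.
csv_text = """A,B
apple,0.5
apple,0.8
apple,1.4
orange,0.4
orange,1.1
melon,0.3
melon,0.1
melon,0.9
melon,1.2
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
n_rows, n_cols = df.shape
```

With a real file, passing its path to `read_csv` in place of the buffer should behave the same way, provided the separator and header match these assumptions.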

EDIT2:

g = df.groupby('A').mean()

also works with your edited dataframe of fruits (groupby sorts the group keys by default, hence the alphabetical order):

            B
A            
apple   0.900
melon   0.625
orange  0.750
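
If the result should look exactly like the layout asked for in the question (A as an ordinary column, fruits in their order of first appearance), one possible variation is sketched below. It rebuilds the sample data from the question inline; `as_index=False` and `sort=False` are standard `groupby` parameters, but using them here is a suggestion rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["apple"] * 3 + ["orange"] * 2 + ["melon"] * 4,
    "B": [0.5, 0.8, 1.4, 0.4, 1.1, 0.3, 0.1, 0.9, 1.2],
})

# as_index=False keeps A as a regular column instead of the index;
# sort=False keeps groups in the order they first appear in df.
result = df.groupby("A", as_index=False, sort=False)["B"].mean()
```

Here `result` is a new dataframe with columns A and B and one row per fruit, which can be written out with `result.to_csv(...)` if a file is needed.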
