
How to sum only certain elements of a column depending on the value of another column in a Pandas DataFrame?

Suppose we have a Pandas DataFrame like the following:

import pandas as pd
df=pd.DataFrame({'name':['Ind','Chn','SG','US','SG','US','Ind','Chn','Fra','Fra'],'a':[5,6,3,4,7,12,66,78,65,100]})

I would like to sum the values of column 'a' for each distinct value of column 'name'.

I tried this code:

for i in df['name'].unique():
   df['tot']=df[(df.name==i)]['a'].sum()

In the resulting new column 'tot', every row contains the sum for the last distinct value of 'name' only (i.e. 'Fra'), rather than a separate sum for each of Ind, US, Fra, etc. I would like each row of the new column 'tot' to hold the sum for that row's 'name' value, and ultimately I want to sort the whole dataframe 'df' by those per-name sums.

I also tried using a dictionary:

dc={}
for i in df['name'].unique():
   dc[i]=dc.get(i,0)+(df[(df.name==i)]['a'].sum())

This gives the desired sums, but in a dictionary, and I don't know how to sort df from here based on the values of the dictionary 'dc':

{'Ind': 71, 'Chn': 84, 'SG': 10, 'US': 16, 'Fra': 165}

Could anybody please explain how to handle this scenario, in as many ways as possible? Which would be the most efficient way when dealing with huge data? Thanks!

Edit: My expected output is simply the dataframe df sorted by the values of the new column 'tot', or, for example, the rows associated with the maximum or minimum value in the column 'tot'.

You are looking for groupby:

df=pd.DataFrame({'name':['Ind','Chn','SG','US','SG','US','Ind','Chn','Fra','Fra'],'a':[5,6,3,4,7,12,66,78,65,100]})
df.groupby('name').a.sum()

Out[950]: 
name
Chn     84
Fra    165
Ind     71
SG      10
US      16
Name: a, dtype: int64
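
As a small sketch of a possible next step (not part of the original answer): the result above is a Series indexed by 'name'. If a regular two-column DataFrame is more convenient, reset_index can turn the index back into a column. The column name 'tot' is taken from the question; the variable name totals is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ind', 'Chn', 'SG', 'US', 'SG', 'US', 'Ind', 'Chn', 'Fra', 'Fra'],
    'a': [5, 6, 3, 4, 7, 12, 66, 78, 65, 100],
})

# groupby(...).sum() returns a Series indexed by 'name';
# reset_index(name='tot') converts it into a DataFrame with
# columns 'name' and 'tot' (one row per distinct name).
totals = df.groupby('name')['a'].sum().reset_index(name='tot')
print(totals)
```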

Edit: map the per-name totals back onto each row, then sort by name:

df.assign(total=df.name.map(df.groupby('name').a.sum())).sort_values(['name','total'])


Out[964]: 
     a name  total
1    6  Chn     84
7   78  Chn     84
8   65  Fra    165
9  100  Fra    165
0    5  Ind     71
6   66  Ind     71
2    3   SG     10
4    7   SG     10
3    4   US     16
5   12   US     16
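
The same map idea also seems to cover the dictionary from the question. A minimal sketch, assuming dc holds the per-name sums exactly as printed in the question: Series.map accepts a dict, so the dictionary can be turned into a column and the frame sorted by it.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ind', 'Chn', 'SG', 'US', 'SG', 'US', 'Ind', 'Chn', 'Fra', 'Fra'],
    'a': [5, 6, 3, 4, 7, 12, 66, 78, 65, 100],
})

# The dictionary built by the loop in the question.
dc = {'Ind': 71, 'Chn': 84, 'SG': 10, 'US': 16, 'Fra': 165}

# Look up each row's name in the dictionary to fill the new column.
df['tot'] = df['name'].map(dc)

# Sort the whole frame by the per-name total, largest first.
result = df.sort_values('tot', ascending=False)
print(result)
```

For large data, computing the sums with groupby rather than a Python loop is likely the faster route; the dictionary step is shown only because the question already had one.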

Edit 2: sort the totals in ascending or descending order:

df.groupby('name').a.sum().sort_values(ascending=True)
Out[1111]: 
name
SG      10
US      16
Ind     71
Chn     84
Fra    165
Name: a, dtype: int64

df.groupby('name').a.sum().sort_values(ascending=False)
Out[1112]: 
name
Fra    165
Chn     84
Ind     71
US      16
SG      10
Name: a, dtype: int64

(df.groupby('name').a.sum().sort_values(ascending=False)).index.values
Out[1119]: array(['Fra', 'Chn', 'Ind', 'US', 'SG'], dtype=object)
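
For the part of the question about finding the maximum or minimum, one possible sketch (an addition, not from the original answer) uses idxmax and idxmin on the grouped totals to get the name, then filters the original rows. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ind', 'Chn', 'SG', 'US', 'SG', 'US', 'Ind', 'Chn', 'Fra', 'Fra'],
    'a': [5, 6, 3, 4, 7, 12, 66, 78, 65, 100],
})

totals = df.groupby('name')['a'].sum()

# idxmax / idxmin return the index label (here, the name)
# of the largest / smallest total.
top_name = totals.idxmax()
bottom_name = totals.idxmin()

# Rows of the original frame belonging to those names.
top_rows = df[df['name'] == top_name]
bottom_rows = df[df['name'] == bottom_name]
print(top_name, bottom_name)
```

Note that idxmax and idxmin return a single label, so if several names tie for the extreme total only the first is reported.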

If I understand correctly, use groupby and transform:

In [3716]: df['total'] = df.groupby('name')['a'].transform('sum')

In [3717]: df
Out[3717]:
     a name  total
0    5  Ind     71
1    6  Chn     84
2    3   SG     10
3    4   US     16
4    7   SG     10
5   12   US     16
6   66  Ind     71
7   78  Chn     84
8   65  Fra    165
9  100  Fra    165

Then use sort_values:

In [3719]: df.sort_values(by='total', ascending=False)
Out[3719]:
     a name  total
8   65  Fra    165
9  100  Fra    165
1    6  Chn     84
7   78  Chn     84
0    5  Ind     71
6   66  Ind     71
3    4   US     16
5   12   US     16
2    3   SG     10
4    7   SG     10
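
Once the 'total' column exists, the rows with the maximum or minimum total (the other request in the question) can be picked out with a boolean mask. This is a sketch added for illustration, not part of the original answer; max_rows and min_rows are made-up names.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ind', 'Chn', 'SG', 'US', 'SG', 'US', 'Ind', 'Chn', 'Fra', 'Fra'],
    'a': [5, 6, 3, 4, 7, 12, 66, 78, 65, 100],
})

# transform('sum') broadcasts each group's sum back to every row of that group.
df['total'] = df.groupby('name')['a'].transform('sum')

# Keep all rows whose total equals the overall maximum / minimum.
max_rows = df[df['total'] == df['total'].max()]
min_rows = df[df['total'] == df['total'].min()]
print(max_rows)
print(min_rows)
```

Unlike idxmax on the grouped totals, this mask keeps every row that shares the extreme value, including ties between different names.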
