Suppose we have a Pandas DataFrame like the following:
df=pd.DataFrame({'name':['Ind','Chn','SG','US','SG','US','Ind','Chn','Fra','Fra'],'a':[5,6,3,4,7,12,66,78,65,100]})
I would like to sum the values of column 'a' for each distinct value of column 'name'.
I tried this code:
for i in df['name'].unique(): df['tot']=df[(df.name==i)]['a'].sum()
In the resulting new column 'tot', every row contains only the sum for the last distinct value of 'name' (i.e. only 'Fra') rather than separate values for each of [Ind, US, Fra, etc.]. I would like the new column 'tot' to hold the sum for each unique value of the 'name' column, and ultimately I want to sort the whole dataframe 'df' by the sum for each unique value.
I also tried using a dictionary:
dc={}
for i in df['name'].unique():
    dc[i]=dc.get(i,0)+(df[(df.name==i)]['a'].sum())
This gives the desired result, but as a dictionary, and I don't know how to sort df from here based on the values of the dictionary 'dc'.
{'Ind': 71, 'Chn': 84, 'SG': 10, 'US': 16, 'Fra': 165}
Could anybody please explain how to handle such a scenario in as many ways as possible? Which would be the most efficient way when dealing with huge data? Thanks!
Edit: My expected output is just the dataframe df sorted by the value of the new column 'tot', or, for example, the rows associated with the maximum or minimum values in the column 'tot'.
You are looking for groupby
df=pd.DataFrame({'name':['Ind','Chn','SG','US','SG','US','Ind','Chn','Fra','Fra'],'a':[5,6,3,4,7,12,66,78,65,100]})
df.groupby('name').a.sum()
Out[950]:
name
Chn 84
Fra 165
Ind 71
SG 10
US 16
Name: a, dtype: int64
Edit :
df.assign(total=df.name.map(df.groupby('name').a.sum())).sort_values(['name','total'])
Out[964]:
a name total
1 6 Chn 84
7 78 Chn 84
8 65 Fra 165
9 100 Fra 165
0 5 Ind 71
6 66 Ind 71
2 3 SG 10
4 7 SG 10
3 4 US 16
5 12 US 16
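Since the question already builds a dictionary of sums, here is a minimal sketch of how that dictionary could be mapped back onto the rows and used for sorting. The names `with_total` and `by_total` are only illustrative, and `kind='mergesort'` is an assumption made here so that rows sharing the same total keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ind', 'Chn', 'SG', 'US', 'SG', 'US', 'Ind', 'Chn', 'Fra', 'Fra'],
    'a': [5, 6, 3, 4, 7, 12, 66, 78, 65, 100],
})

# Per-name sums as a plain dict, equivalent to the question's `dc`.
dc = df.groupby('name')['a'].sum().to_dict()

# Series.map looks up each row's name in the dict, giving one total per row.
with_total = df.assign(total=df['name'].map(dc))

# Sort the rows by their per-name total, largest first.
by_total = with_total.sort_values('total', ascending=False, kind='mergesort')
print(by_total)
```

Because `map` accepts either a dict or a Series, the dictionary step is optional; it is shown only to connect with the loop in the question.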
EDIT 2 :
df.groupby('name').a.sum().sort_values(ascending=True)
Out[1111]:
name
SG 10
US 16
Ind 71
Chn 84
Fra 165
Name: a, dtype: int64
df.groupby('name').a.sum().sort_values(ascending=False)
Out[1112]:
name
Fra 165
Chn 84
Ind 71
US 16
SG 10
Name: a, dtype: int64
(df.groupby('name').a.sum().sort_values(ascending=False)).index.values
Out[1119]: array(['Fra', 'Chn', 'Ind', 'US', 'SG'], dtype=object)
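For the second part of the question's edit (finding the rows that belong to the largest or smallest total), a possible sketch is to take `idxmax` / `idxmin` of the grouped sums and filter on the resulting name. The variable names are illustrative; note that `idxmax` and `idxmin` return only the first label when several names tie.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ind', 'Chn', 'SG', 'US', 'SG', 'US', 'Ind', 'Chn', 'Fra', 'Fra'],
    'a': [5, 6, 3, 4, 7, 12, 66, 78, 65, 100],
})

# One sum per distinct name, indexed by name.
totals = df.groupby('name')['a'].sum()

# Labels of the largest and smallest sums (first label only on ties).
top_name = totals.idxmax()
bottom_name = totals.idxmin()

# Original rows that belong to those names.
top_rows = df[df['name'] == top_name]
bottom_rows = df[df['name'] == bottom_name]
print(top_rows)
print(bottom_rows)
```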
If I understand correctly, use groupby and transform:
In [3716]: df['total'] = df.groupby('name')['a'].transform('sum')
In [3717]: df
Out[3717]:
a name total
0 5 Ind 71
1 6 Chn 84
2 3 SG 10
3 4 US 16
4 7 SG 10
5 12 US 16
6 66 Ind 71
7 78 Chn 84
8 65 Fra 165
9 100 Fra 165
Then use sort_values:
In [3719]: df.sort_values(by='total', ascending=False)
Out[3719]:
a name total
8 65 Fra 165
9 100 Fra 165
1 6 Chn 84
7 78 Chn 84
0 5 Ind 71
6 66 Ind 71
3 4 US 16
5 12 US 16
2 3 SG 10
4 7 SG 10
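If the helper column is not wanted in the final frame, one possible variation (a sketch, not part of the original answer) is to keep the transformed totals in a separate Series and use its sorted index to reorder `df`. The same Series also gives a boolean mask for the rows with the maximum total, which keeps every tied group. The names `totals`, `order`, `sorted_df` and `max_rows` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ind', 'Chn', 'SG', 'US', 'SG', 'US', 'Ind', 'Chn', 'Fra', 'Fra'],
    'a': [5, 6, 3, 4, 7, 12, 66, 78, 65, 100],
})

# Per-row totals, aligned with df's index but not stored in df.
totals = df.groupby('name')['a'].transform('sum')

# Row labels ordered by descending total.
order = totals.sort_values(ascending=False, kind='mergesort').index

# Reorder the original rows; df itself gains no extra column.
sorted_df = df.loc[order]

# Rows whose group total equals the overall maximum (all tied groups are kept).
max_rows = df[totals == totals.max()]
print(sorted_df)
print(max_rows)
```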