I know this question has been asked various times on Stack Overflow, but I still don't find it trivial to accomplish. This answer, among others, comes close: Adding total row to pandas DataFrame groupby
Sample of my data (there are actually 25 columns for this but they are similar, only numerical):
owner player val1 val2 val3
A x 5.60 3.18 0.76
A y 12.08 15.95 -0.24
A z 0.03 0.05 -0.41
B x 0.02 0.01 2.06
B z 2.36 2.37 0.00
C x 0.16 0.15 0.05
C y 0.72 0.75 -0.04
D x 0.33 0.56 -0.41
My intended output is as follows, where for each owner the total is calculated and placed as the first row in the subgroup.
owner player val1 val2 val3
A total 17.71 19.18 0.11
A x 5.60 3.18 0.76
A y 12.08 15.95 -0.24
A z 0.03 0.05 -0.41
B total 2.38 2.38 2.06
B x 0.02 0.01 2.06
B z 2.36 2.37 0.00
C total 0.88 0.90 0.01
C x 0.16 0.15 0.05
C y 0.72 0.75 -0.04
D total 0.33 0.56 -0.41
D x 0.33 0.56 -0.41
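For reference, a minimal sketch to reproduce the sample data (the value column names val1, val2, val3 are assumed):

```python
import pandas as pd

# Rebuild the sample data from the question; the column names
# val1/val2/val3 are an assumption.
df = pd.DataFrame({
    'owner':  ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
    'player': ['x', 'y', 'z', 'x', 'z', 'x', 'y', 'x'],
    'val1':   [5.60, 12.08, 0.03, 0.02, 2.36, 0.16, 0.72, 0.33],
    'val2':   [3.18, 15.95, 0.05, 0.01, 2.37, 0.15, 0.75, 0.56],
    'val3':   [0.76, -0.24, -0.41, 2.06, 0.00, 0.05, -0.04, -0.41],
})
```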
I attempted an approach that I also found on Stack Overflow, which looked like what I was searching for, but I couldn't get it quite right.
def lambda_t(x):
    df = x.sort_values(['owner']).drop(['owner'], axis=1)
    df.loc['total'] = df.sum()
    return df

df.groupby(['owner']).apply(lambda_t)
While in theory this might have worked, the total row is not placed where I want it, and on top of that the player names get concatenated by the sum, so I end up with a packed string in that column. I also end up with a MultiIndex:
owner player val1 val2 val3
A 0 x 5.60 3.18 0.76
1 y 12.08 15.95 -0.24
2 z 0.03 0.05 -0.41
total xzy 17.71 19.18 0.11
.....
Apparently, dropping a level of the MultiIndex can help, but then the 'total' label disappears:
df.groupby(['owner']).apply(lambda_t).droplevel(level=1)
owner player val1 val2 val3
A x 5.60 3.18 0.76
A y 12.08 15.95 -0.24
A z 0.03 0.05 -0.41
A xzy 17.71 19.18 0.11
Any ideas if this is possible? From what I've seen, with groupby, assign and loc I couldn't get the rows ordered properly.
IIUC, you can use groupby.sum to compute the totals, assign 'total' as the player name, concat the two DataFrames in order, and sort_values by owner with a stable sorting method:
out = (pd.concat([df.groupby('owner', as_index=False).sum().assign(player='total'),
                  df])
         .sort_values(by='owner', kind='stable', ignore_index=True)
         [df.columns]  # restore the original column order
       )
Output:
owner player val1 val2 val3
0 A total 17.71 19.18 0.11
1 A x 5.60 3.18 0.76
2 A y 12.08 15.95 -0.24
3 A z 0.03 0.05 -0.41
4 B total 2.38 2.38 2.06
5 B x 0.02 0.01 2.06
6 B z 2.36 2.37 0.00
7 C total 0.88 0.90 0.01
8 C x 0.16 0.15 0.05
9 C y 0.72 0.75 -0.04
10 D total 0.33 0.56 -0.41
11 D x 0.33 0.56 -0.41
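A quick self-contained sanity check of this result (the sample data is rebuilt inline, with column names val1/val2/val3 assumed): every owner's total row should match the column sums of its player rows.

```python
import pandas as pd

df = pd.DataFrame({
    'owner':  ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
    'player': ['x', 'y', 'z', 'x', 'z', 'x', 'y', 'x'],
    'val1':   [5.60, 12.08, 0.03, 0.02, 2.36, 0.16, 0.72, 0.33],
    'val2':   [3.18, 15.95, 0.05, 0.01, 2.37, 0.15, 0.75, 0.56],
    'val3':   [0.76, -0.24, -0.41, 2.06, 0.00, 0.05, -0.04, -0.41],
})

# The concat + stable-sort solution from above
out = (pd.concat([df.groupby('owner', as_index=False).sum().assign(player='total'),
                  df])
         .sort_values(by='owner', kind='stable', ignore_index=True)
         [df.columns])

# Each owner's 'total' row must equal the sum of its player rows
cols = ['val1', 'val2', 'val3']
totals = out.loc[out['player'] == 'total'].set_index('owner')[cols]
sums = out.loc[out['player'] != 'total'].groupby('owner')[cols].sum()
assert totals.round(2).equals(sums.round(2))
```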
Another possible solution:
(df.groupby('owner')
   .apply(lambda x:
       pd.concat([pd.concat([pd.DataFrame({'owner': x.owner.unique(),
                                           'player': ['total']}),
                             pd.DataFrame(x.iloc[:, 2:].apply(sum, axis=0)).T],
                            axis=1),
                  x]))
   .reset_index(drop=True))
Output:
owner player val1 val2 val3
0 A total 17.71 19.18 0.11
1 A x 5.60 3.18 0.76
2 A y 12.08 15.95 -0.24
3 A z 0.03 0.05 -0.41
4 B total 2.38 2.38 2.06
5 B x 0.02 0.01 2.06
6 B z 2.36 2.37 0.00
7 C total 0.88 0.90 0.01
8 C x 0.16 0.15 0.05
9 C y 0.72 0.75 -0.04
10 D total 0.33 0.56 -0.41
11 D x 0.33 0.56 -0.41
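For completeness, the apply-based attempt from the question can also be repaired. A hedged sketch (the helper name with_total is mine, and the sample data is rebuilt inline): sum only the numeric columns, so the player strings are not concatenated, then prepend the total row to each group. Iterating over the groupby rather than calling .apply sidesteps the deprecation of applying functions to the grouping columns in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'owner':  ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
    'player': ['x', 'y', 'z', 'x', 'z', 'x', 'y', 'x'],
    'val1':   [5.60, 12.08, 0.03, 0.02, 2.36, 0.16, 0.72, 0.33],
    'val2':   [3.18, 15.95, 0.05, 0.01, 2.37, 0.15, 0.75, 0.56],
    'val3':   [0.76, -0.24, -0.41, 2.06, 0.00, 0.05, -0.04, -0.41],
})

def with_total(g):
    # Sum only the numeric columns, so 'player' strings are not concatenated
    total = g.sum(numeric_only=True).to_frame().T
    total.insert(0, 'owner', g['owner'].iloc[0])
    total.insert(1, 'player', 'total')
    return pd.concat([total, g])

out = pd.concat([with_total(g) for _, g in df.groupby('owner')],
                ignore_index=True)
```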