Pandas - Add a total row to each subgroup as the first row

Question

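I know this question has come up many times on StackOverflow, but I'm finding the task harder than it looks. See this and many other answers: Add a total row to pandas DataFrame groupby

A sample of my data (the real frame has 25 columns, but they are all similar and purely numeric):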

owner   player  val1    val2    val3
A       x       5.60    3.18    0.76
A       y       12.08   15.95   -0.24
A       z       0.03    0.05    -0.41
B       x       0.02    0.01    2.06
B       z       2.36    2.37    0.00
C       x       0.16    0.15    0.05
C       y       0.72    0.75    -0.04
D       x       0.33    0.56    -0.41
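
For anyone who wants to reproduce the problem, the sample above can be loaded into a DataFrame as follows. This is only a sketch: it assumes the three value columns are named val1, val2 and val3 (as in the output of Answer 2), since duplicate column names would make later column selection ambiguous. The variable name `raw` is just for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; the column names val1/val2/val3
# are an assumption (the question's header may differ).
raw = """owner player val1 val2 val3
A x 5.60 3.18 0.76
A y 12.08 15.95 -0.24
A z 0.03 0.05 -0.41
B x 0.02 0.01 2.06
B z 2.36 2.37 0.00
C x 0.16 0.15 0.05
C y 0.72 0.75 -0.04
D x 0.33 0.56 -0.41
"""

# Whitespace-separated text, so use a regex separator.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```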

My expected output is shown below: the total for each owner is computed and placed as the first row of its subgroup.

owner   player  val1    val2    val3
A      total    17.71   19.18   0.11
A      x        5.60    3.18    0.76
A      y        12.08   15.95   -0.24
A      z        0.03    0.05    -0.41
B      total    2.38    2.38    2.06
B      x        0.02    0.01    2.06
B      z        2.36    2.37    0.00
C      total    0.88    0.90    0.01
C      x        0.16    0.15    0.05
C      y        0.72    0.75    -0.04
D      total    0.33    0.56    -0.41
D      x        0.33    0.56    -0.41

I tried something I also found on StackOverflow. It looks like what I'm after, but I can't get it quite right.

def lambda_t(x):
    df = x.sort_values(['owner']).drop(['owner'],axis=1)
    df.loc['total'] = df.sum()
    return df

df.groupby(['owner']).apply(lambda_t)

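This looks promising in principle, but the total does not end up where I want it and, more importantly, the player names get concatenated, so I end up with a very cramped column. I also end up with a MultiIndex.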

owner       player  val1    val2    val3
A   0        x      5.60    3.18    0.76
    1        y      12.08   15.95   -0.24
    2        z      0.03    0.05    -0.41
    total    xyz    17.71   19.18   0.11
.....

Obviously, dropping a level of the MultiIndex helps, but then I lose the 'total' label; it disappears.

df.groupby(['owner']).apply(lambda_t).droplevel(level=1)

owner       player  val1    val2    val3
A            x      5.60    3.18    0.76
A            y      12.08   15.95   -0.24
A            z      0.03    0.05    -0.41
A            xyz    17.71   19.18   0.11

Any ideas, if this is possible at all? From what I've seen, with groupby, assign and loc you can't get the rows into the right order.

Answer 1

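IIUC, you can use groupby.sum to compute the totals, assign to set player to 'total', concat to join the totals and the original frame in that order, and sort_values with a stable sort: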

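import pandas as pd

out = (
    pd.concat([
        # one total row per owner, labelled 'total' in the player column
        df.groupby('owner', as_index=False).sum().assign(player='total'),
        df,
    ])
    # totals come first in the concat, so a stable sort keeps them first per owner
    .sort_values(by='owner', kind='stable', ignore_index=True)
    [df.columns]  # restore the original column order
)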

Output:

   owner player   val1    val2  val3
0      A  total  17.71   19.18  0.11
1      A      x   5.60    3.18  0.76
2      A      y  12.08   15.95 -0.24
3      A      z   0.03    0.05 -0.41
4      B  total   2.38    2.38  2.06
5      B      x   0.02    0.01  2.06
6      B      z   2.36    2.37  0.00
7      C  total   0.88    0.90  0.01
8      C      x   0.16    0.15  0.05
9      C      y   0.72    0.75 -0.04
10     D  total   0.33    0.56 -0.41
11     D      x   0.33    0.56 -0.41
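
The approach above can be wrapped in a small reusable function. The following is a sketch rather than the answer author's code: the function name `add_group_totals` and its parameters are hypothetical, and it passes `numeric_only=True` so that only numeric columns are summed. The key point is unchanged: the totals are concatenated before the detail rows, so a stable sort on the group key leaves each total first within its group.

```python
import pandas as pd


def add_group_totals(df, group_col="owner", label_col="player", label="total"):
    """Return a copy of df with a per-group total row placed first in each group.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    totals = (
        df.groupby(group_col, as_index=False)
        .sum(numeric_only=True)          # sum only the numeric columns
        .assign(**{label_col: label})    # mark the row as the total
    )
    return (
        pd.concat([totals, df])          # totals first, detail rows second
        # a stable sort preserves that relative order within equal keys
        .sort_values(by=group_col, kind="stable", ignore_index=True)
        [df.columns]                     # original column order
    )


# Small subset of the question's data (val2 omitted for brevity).
df = pd.DataFrame({
    "owner": ["A", "A", "A", "B", "B"],
    "player": ["x", "y", "z", "x", "z"],
    "val1": [5.60, 12.08, 0.03, 0.02, 2.36],
    "val3": [0.76, -0.24, -0.41, 2.06, 0.00],
})
out = add_group_totals(df)
print(out.round(2))
```

Note that sort_values also orders the groups themselves by key, so if the owners are not already sorted in the input, the groups in the result will be.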

Answer 2

Another possible solution:

(df.groupby('owner')
   .apply(lambda x: pd.concat([
       # one-row frame: owner, player='total', followed by the column sums
       pd.concat([pd.DataFrame({'owner': x.owner.unique(), 'player': ['total']}),
                  pd.DataFrame(x.iloc[:, 2:].apply(sum, axis=0)).T], axis=1),
       x]))
   .reset_index(drop=True))

Output:

   owner player   val1   val2  val3
0      A  total  17.71  19.18  0.11
1      A      x   5.60   3.18  0.76
2      A      y  12.08  15.95 -0.24
3      A      z   0.03   0.05 -0.41
4      B  total   2.38   2.38  2.06
5      B      x   0.02   0.01  2.06
6      B      z   2.36   2.37  0.00
7      C  total   0.88   0.90  0.01
8      C      x   0.16   0.15  0.05
9      C      y   0.72   0.75 -0.04
10     D  total   0.33   0.56 -0.41
11     D      x   0.33   0.56 -0.41
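
The same per-group idea can also be written as a plain loop over the groups instead of groupby.apply. This is a sketch, not part of the original answer; it may be worth considering because pandas 2.2 and later warn when apply operates on the grouping column, which the lambda above relies on through x.owner. The names `pieces` and `total` are just for illustration.

```python
import pandas as pd

# Subset of the question's data; column names are assumed to be val1/val2/val3.
df = pd.DataFrame({
    "owner": ["A", "A", "A", "B", "B", "D"],
    "player": ["x", "y", "z", "x", "z", "x"],
    "val1": [5.60, 12.08, 0.03, 0.02, 2.36, 0.33],
    "val2": [3.18, 15.95, 0.05, 0.01, 2.37, 0.56],
    "val3": [0.76, -0.24, -0.41, 2.06, 0.00, -0.41],
})

pieces = []
# sort=False keeps the owners in order of first appearance.
for owner, group in df.groupby("owner", sort=False):
    # Column sums as a one-row frame.
    total = group.sum(numeric_only=True).to_frame().T
    # Put the label columns back in front, matching the original layout.
    total.insert(0, "player", "total")
    total.insert(0, "owner", owner)
    pieces.append(total)   # total row first ...
    pieces.append(group)   # ... then the group's own rows

out = pd.concat(pieces, ignore_index=True)
print(out.round(2))
```

Because each total is appended immediately before its group, no sorting step is needed and the original group order is preserved.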


Answer 1
5, accepted, 2022-09-19 09:00:44

Answer 2
0, 2022-09-19 10:38:53
