
How to append a "Total" row to pandas dataframe with MultiIndex

Suppose you have a simple pandas DataFrame with a MultiIndex:

import pandas as pd

df = pd.DataFrame(1, index=pd.MultiIndex.from_tuples([('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]),
                  columns=['col1', 'col2'])

Printed as a table:

           col1  col2
one elem1     1     1
    elem2     1     1
two elem1     1     1
    elem2     1     1

Question: How do you add a "Total" row to that DataFrame?

Expected output:

             col1  col2
one   elem1   1.0   1.0
      elem2   1.0   1.0
two   elem1   1.0   1.0
      elem2   1.0   1.0
Total         4.0   4.0

First attempt: naive implementation

If I just ignore the MultiIndex and follow the standard way:

df.loc['Total'] = df.sum()

Output:

              col1  col2
(one, elem1)     1     1
(one, elem2)     1     1
(two, elem1)     1     1
(two, elem2)     1     1
Total            4     4

This looks correct, but the MultiIndex is converted to a flat Index([('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2'), 'Total'], dtype='object').


Second attempt: be explicit

df.loc['Total', :] = df.sum()

or (being frustrated and changing the axis just out of spite)

df.loc['Total', :] = df.sum(axis=1)

Output (the same for both calls):

             col1  col2
one   elem1   1.0   1.0
      elem2   1.0   1.0
two   elem1   1.0   1.0
      elem2   1.0   1.0
Total         NaN   NaN

The MultiIndex is preserved, but the Total row is wrong (NaN != 4).

The solution

You have to strip the index from df.sum() and use only its values:

df.loc['Total', :] = df.sum().values

Output:

             col1  col2
one   elem1   1.0   1.0
      elem2   1.0   1.0
two   elem1   1.0   1.0
      elem2   1.0   1.0
Total         4.0   4.0
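For reference, here is the fix as one self-contained script. It is a sketch assuming a reasonably recent pandas release: it rebuilds the example frame, appends the row using .values, and checks that the index is still a MultiIndex. pandas pads the missing second level of the new label with an empty string, so the new row is labelled ('Total', '').

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_tuples(
        [('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]
    ),
    columns=['col1', 'col2'],
)

# .values drops the column labels of df.sum(), so pandas assigns the
# totals by position instead of trying to align them by label.
df.loc['Total', :] = df.sum().values

print(df)
print(type(df.index).__name__)  # MultiIndex: the index was not flattened
print(df.index[-1])             # the new label, padded to two levels
```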

Why was the second attempt wrong?

The second attempt was almost correct. But df.sum() is indexed by Index(['col1', 'col2'], dtype='object'), i.e. by the column labels. Consequently, pandas isn't able to match that index, and the new index entry ('Total', '') is appended but without values.
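A quick way to see the mismatch, sketched on the same example frame: the index of df.sum() holds column labels, while the rows of df are labelled with tuples, so the two label sets have nothing in common. Taking .values leaves a plain NumPy array that carries no labels at all.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_tuples(
        [('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]
    ),
    columns=['col1', 'col2'],
)

col_totals = df.sum()  # default axis=0: one total per column

print(col_totals.index.tolist())  # ['col1', 'col2']: column labels
print(df.index.tolist())          # [('one', 'elem1'), ...]: row labels

# The column labels and the row labels share no entries.
print(set(col_totals.index).isdisjoint(df.index))  # True

# .values returns an unlabelled NumPy array, so there is nothing to align.
print(isinstance(col_totals.values, np.ndarray))   # True
```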

But why did df.loc['Total', :] = df.sum(axis=1) also fail? It has the correct MultiIndex. Pandas does exactly what you told it, i.e. it sums across the columns of each row. So df.sum(axis=1) gives you the following Series:

one  elem1    2
     elem2    2
two  elem1    2
     elem2    2

This Series can't be matched with the original df in any meaningful sense.
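The shape of df.sum(axis=1) can be checked the same way (again a sketch on the example frame): it is a Series indexed by the existing row MultiIndex, with one value per row, so it describes a column of row totals rather than the single row of column totals being added.

```python
import pandas as pd

df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_tuples(
        [('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]
    ),
    columns=['col1', 'col2'],
)

row_totals = df.sum(axis=1)  # axis=1: sum across the columns of each row

print(type(row_totals).__name__)           # Series
print(row_totals.index.equals(df.index))   # True: labelled by the existing rows
print(row_totals.tolist())                 # [2, 2, 2, 2]: four row totals, not two column totals
```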

Building on @above_c_level's accepted answer, here is a function that:

  • Handles a MultiIndex
  • Also preserves dtypes
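import pandas as pd

def with_totals(df: pd.DataFrame) -> pd.DataFrame:
    '''Return a new df with row & column totals labelled ∑.
       * Preserves dtypes.
       * Handles a MultiIndex.
    '''
    df = df.copy()                    # work on a copy so the caller's frame is not modified
    df['∑'] = df.sum(axis=1)          # row totals in a new column
    dtypes = df.dtypes                # remember dtypes; appending the total row may upcast them
    df.loc['∑', :] = df.sum().values  # column totals in a new row; .values avoids label alignment
    return df.astype(dtypes)          # restore the remembered dtypes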

Usage:

with_totals(df)

Output:

            col1 col2   ∑
one elem1      1    1   2
    elem2      1    1   2
two elem1      1    1   2
    elem2      1    1   2
∑              4    4   8

Discussion

The key move remains the combined use of .loc and .values:

df.loc['Total', :] = df.sum().values

But dtypes can get lost. In my case, my ints were converted to objects, which rendered as floats and made the output hard to read. The best fix I found was to remember the dtypes and re-apply them.
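A minimal sketch of the "remember and re-apply" step on the example frame. It records df.dtypes before the row is appended and passes them back to DataFrame.astype afterwards. The intermediate dtype produced by the enlargement may differ between pandas versions, so the script only prints it rather than relying on it.

```python
import pandas as pd

df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_tuples(
        [('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]
    ),
    columns=['col1', 'col2'],
)

dtypes = df.dtypes                    # remember the original dtypes (int64 here)
df.loc['Total', :] = df.sum().values  # appending a row can upcast the columns
print(df.dtypes)                      # inspect what the enlargement did
df = df.astype(dtypes)                # re-apply the remembered dtypes
print(df.dtypes)
print(df)
```

DataFrame.astype accepts a Series mapping column names to dtypes, which is exactly what df.dtypes is, so no manual dict is needed.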


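Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.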

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM