How to append a "Total" row to a pandas DataFrame with MultiIndex
Suppose you have a simple pandas dataframe with a MultiIndex:
df = pd.DataFrame(1, index=pd.MultiIndex.from_tuples([('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]),
columns=['col1', 'col2'])
Printed as a table:
           col1  col2
one elem1     1     1
    elem2     1     1
two elem1     1     1
    elem2     1     1
Question: How do you add a "Total" row to that DataFrame?
Expected output:
             col1  col2
one   elem1   1.0   1.0
      elem2   1.0   1.0
two   elem1   1.0   1.0
      elem2   1.0   1.0
Total         4.0   4.0
If I just ignore the MultiIndex and follow the standard way:
df.loc['Total'] = df.sum()
Output:
              col1  col2
(one, elem1)     1     1
(one, elem2)     1     1
(two, elem1)     1     1
(two, elem2)     1     1
Total            4     4
It seems to be correct, but the MultiIndex is transformed to a flat Index of tuples:
Index([('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2'), 'Total'], dtype='object')
A second attempt, addressing the row and all columns explicitly:
df.loc['Total', :] = df.sum()
or (being frustrated and changing the axis just out of spite)
df.loc['Total', :] = df.sum(axis=1)
Output (the same for both calls):
             col1  col2
one   elem1   1.0   1.0
      elem2   1.0   1.0
two   elem1   1.0   1.0
      elem2   1.0   1.0
Total         NaN   NaN
The MultiIndex is not transformed, but the Total is wrong (NaN != 4).
You have to remove the index of df.sum() and just use the values:
df.loc['Total', :] = df.sum().values
Output:
             col1  col2
one   elem1   1.0   1.0
      elem2   1.0   1.0
two   elem1   1.0   1.0
      elem2   1.0   1.0
Total         4.0   4.0
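As a minimal sketch of the same idea, the row key can also be spelled out as a full tuple, one label per index level, so the empty second-level label is explicit rather than filled in by pandas. This variant is an assumption on my side and not part of the original answer; `to_numpy()` is used here as the equivalent of `.values`.

```python
import pandas as pd

df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_tuples(
        [('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]
    ),
    columns=['col1', 'col2'],
)

# A plain NumPy array carries no labels, so pandas assigns it by position
totals = df.sum().to_numpy()

# Full tuple key: 'Total' on the first level, '' on the second
df.loc[('Total', ''), :] = totals

# Read the new row back with the same key
total_row = df.loc[('Total', ''), :]
```

Exact dtypes after the assignment may depend on the pandas version, so the sketch makes no claim about them.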
The second attempt was almost correct. But df.sum() has the index Index(['col1', 'col2'], dtype='object'). Consequently, pandas isn't able to match the index. The new index entry ('Total', '') is appended, but without values.
But why did df.loc['Total', :] = df.sum(axis=1) also fail? It has the correct MultiIndex. Pandas does exactly what you told it, i.e. sum across the columns. So df.sum(axis=1) gives you the following Series, with one value per existing row:
one  elem1    2
     elem2    2
two  elem1    2
     elem2    2
This Series holds row totals, so it can't be matched with the original df in any meaningful sense.
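To make the label mismatch visible, here is a small sketch that inspects what each sum carries. It assumes the df from the question; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_tuples(
        [('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]
    ),
    columns=['col1', 'col2'],
)

col_totals = df.sum()        # one value per column, labelled by df.columns
row_totals = df.sum(axis=1)  # one value per row, labelled by df.index

# Labels attached to each result
col_labels = list(col_totals.index)                    # column names
rows_share_index = row_totals.index.equals(df.index)   # same MultiIndex as df

# .values drops the labels and leaves a bare NumPy array
bare = col_totals.values
```

Neither result carries a 'Total' label, which is why only the unlabelled array can be written into the new row by position.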
Building on @above_c_level's accepted answer, here is a function:
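def with_totals(df: pd.DataFrame) -> pd.DataFrame:
    '''Return new df with row & col totals named ∑.
    * Preserves dtypes.
    * Handles multi-index.
    '''
    df = df.copy()  # Work on a copy so the caller's frame is left untouched
    df['∑'] = df.sum(axis=1)  # Row totals in new column
    D = df.dtypes  # Remember dtypes before the new row changes them
    df.loc['∑', :] = df.sum().values  # Column totals in new row
    return df.astype(D)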
Usage:
with_totals(df)
Output:
           col1  col2  ∑
one elem1     1     1  2
    elem2     1     1  2
two elem1     1     1  2
    elem2     1     1  2
∑             4     4  8
The key move remains the twin use of .loc and .values:
df.loc['Total', :] = df.sum().values
But dtypes can get lost. In my case my ints were converted to objects, which rendered as floats, making it hard to read. The best answer I found was to remember and re-apply the dtypes.
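A different route, not taken from either answer above, is to build the total as its own one-row frame and join it with pd.concat. The sketch below assumes all summed columns are integers, in which case nothing gets upcast. The helper name append_total_row and its label parameter are hypothetical.

```python
import pandas as pd

def append_total_row(df, label=('Total', '')):
    """Return a new frame with a column-sum row appended (hypothetical helper)."""
    total = df.sum().to_frame().T                      # one-row frame, same columns as df
    total.index = pd.MultiIndex.from_tuples([label])   # one label per index level
    return pd.concat([df, total])                      # df itself is not modified

df = pd.DataFrame(
    1,
    index=pd.MultiIndex.from_tuples(
        [('one', 'elem1'), ('one', 'elem2'), ('two', 'elem1'), ('two', 'elem2')]
    ),
    columns=['col1', 'col2'],
)

result = append_total_row(df)
```

With mixed int and float columns the sums come back as floats, so the dtype-restoring trick above would still be needed in that case.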