Pandas adding sum of columns in pivot table (multiindexed)
I have df and df_pivot, created with the code below:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019,
                            2019, 2019, 2020, 2020,
                            2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                          columns=['Year', 'Month'], aggfunc=np.sum, fill_value=0)
df_pivot looks like this:
Year    2019              2020
Month   01 02 03 04 05 06 01 02 03
A   B
bar one  0  0  0  0  0  6  8  0  0
    two  0  0  0  0  0  0  0  9  9
foo one  2  4  5  0  0  0  0  0  0
    two  0  0  0  5  6  0  0  0  0
What I am trying to do now is add three columns to this df: 2019FY, 2019YTD and 2020YTD.
The 2019FY column should be the sum of all values under "2019".
The 2019YTD column should be the sum of the values under "2019" up to a given period, i.e. if the period is defined as 04, 2019YTD should sum the 2019 columns for months 01/02/03/04.
The 2020YTD column should be the sum of all values under "2020".
The output table should look like this:
Year    2019              2019FY 2019YTD 2020     2020YTD
Month   01 02 03 04 05 06                 01 02 03
A   B
bar one  0  0  0  0  0  6      6       0  8  0  0       8
    two  0  0  0  0  0  0      0       0  0  9  9      18
foo one  2  4  5  0  0  0     11      11  0  0  0       0
    two  0  0  0  5  6  0     11       5  0  0  0       0
Essentially, I would like to know how to sum the columns for a given set of months, since from there I can build 2019FY/2019YTD/2020YTD on my own. It is also important that the new columns land in a specific slot in the pivot table (at the end of the 2019 data and at the end of the 2020 data).
Is it feasible?
I have looked everywhere but could not find an example of how to do it.
Appreciate the help.
Thanks, Pawel
For each year you can create the new columns in a custom function passed to GroupBy.apply, so the output also contains a 2020FY column:
def f(x):
    # get all months of this year's group and convert them to integers
    c = x.columns.get_level_values(1).astype(int)
    # sum all values (full year)
    s1 = x.sum(axis=1)
    # sum months 1, 2, 3, 4 (year to date)
    s2 = x.loc[:, c <= 4].sum(axis=1)
    # x.name is the group key, i.e. the year
    x[(f'{x.name}FY', '')] = s1
    x[(f'{x.name}YTD', '')] = s2
    return x

df = df_pivot.groupby(level=0, axis=1, group_keys=False).apply(f)
print (df)
Year    2019              2019FY 2019YTD 2020     2020FY 2020YTD
Month   01 02 03 04 05 06                 01 02 03
A   B
bar one  0  0  0  0  0  6      6       0  8  0  0      8       8
    two  0  0  0  0  0  0      0       0  0  9  9     18      18
foo one  2  4  5  0  0  0     11      11  0  0  0      0       0
    two  0  0  0  5  6  0     11       5  0  0  0      0       0
If you need to remove a column, use a tuple, because the columns are a MultiIndex:
df = df.drop([('2020FY','')], axis=1)
print (df)
Year    2019              2019FY 2019YTD 2020     2020YTD
Month   01 02 03 04 05 06                 01 02 03
A   B
bar one  0  0  0  0  0  6      6       0  8  0  0       8
    two  0  0  0  0  0  0      0       0  0  9  9      18
foo one  2  4  5  0  0  0     11      11  0  0  0       0
    two  0  0  0  5  6  0     11       5  0  0  0       0
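As far as I know, grouping with axis=1 emits a deprecation warning from pandas 2.1 onwards, so the groupby call above may not work on newer versions. Below is a minimal sketch of the same idea without groupby: loop over the years, build the totals for each year, and concatenate the pieces in the desired order. The helper name add_year_totals and its period parameter are hypothetical and do not come from the original answer.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)


def add_year_totals(pivot, period=4):
    """Hypothetical helper: put <year>FY and <year>YTD right after each year's months."""
    pieces = []
    for year in pivot.columns.get_level_values(0).unique():
        months_df = pivot[year]                   # this year's month columns only
        months = months_df.columns.astype(int)    # "01" -> 1, "02" -> 2, ...
        # totals for this year, keyed by tuples so the columns become a MultiIndex
        totals = pd.DataFrame({
            (f"{year}FY", ""): months_df.sum(axis=1),
            (f"{year}YTD", ""): months_df.loc[:, months <= period].sum(axis=1),
        })
        # restore the (year, month) two-level labels on the month columns
        block = months_df.copy()
        block.columns = pd.MultiIndex.from_product([[year], months_df.columns])
        pieces.extend([block, totals])
    out = pd.concat(pieces, axis=1)
    out.columns.names = pivot.columns.names
    return out


result = add_year_totals(df_pivot, period=4)
print(result)
```

Because the pieces are concatenated year by year, the totals already sit at the end of each year's block and no reordering step is needed. An unwanted column such as 2020FY can still be dropped with the tuple form shown above.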
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
"bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"Year": [2019, 2019, 2019, 2019,
2019, 2019, 2020, 2020,
2020],
"Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
"Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
print(df)
df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
columns=['Year', 'Month'], aggfunc=np.sum, fill_value=0)
print(df_pivot)
# create the same pivot, but just using the year total
df_year_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
columns=['Year'], aggfunc=np.sum, fill_value=0)
print(df_year_pivot)
# since the dataframe you wish to add will have 2 index levels
# you need to add another level when you join the resulting data
# and since your new level will be a YTD, I just appended it to the year
multi_index_tuples = [(x, f'{x}YTD') for x in df_year_pivot.columns]
# now, you are going to add the new index level to the df with the level names the same as your first pivot
df_year_pivot.columns = pd.MultiIndex.from_tuples(multi_index_tuples, names=['Year', 'Month'])
# happily join on the same index
total_df = pd.merge(df_pivot, df_year_pivot, how='left', left_index=True, right_index=True)
print(total_df)
# sort the column index
total_df = total_df.sort_index(axis=1, level=[0,1])
print(total_df)
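Note that the column labelled <year>YTD in the code above holds the full-year total, since the second pivot aggregates every month. If a true year-to-date figure is wanted, one possible variation of the same join approach is to filter the long-format rows before pivoting. The sketch below assumes a cutoff variable named period, which is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
period = 4  # assumed cutoff month (hypothetical parameter)

df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)

# keep only the rows up to the cutoff month, then total them per year
ytd_rows = df[df["Month"].astype(int) <= period]
df_ytd = pd.pivot_table(ytd_rows, values="Values", index=["A", "B"],
                        columns=["Year"], aggfunc="sum", fill_value=0)

# make sure every row and every year of the main pivot is present
df_ytd = df_ytd.reindex(index=df_pivot.index,
                        columns=df_pivot.columns.get_level_values(0).unique(),
                        fill_value=0)

# add the second column level so both frames have two-level columns
df_ytd.columns = pd.MultiIndex.from_tuples(
    [(year, f"{year}YTD") for year in df_ytd.columns], names=["Year", "Month"])

total_df = pd.merge(df_pivot, df_ytd, how="left", left_index=True, right_index=True)
# "2019YTD" sorts after "06", so each YTD column ends up last within its year
total_df = total_df.sort_index(axis=1, level=[0, 1])
print(total_df)
```

The reindex step matters because a row or a year with no data inside the cutoff window would otherwise be missing from the filtered pivot.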
You can make use of the:
df.columns.get_level_values()
df.index.get_level_values()
syntax to slice multi-indexed rows and columns. I'd suggest changing your df's Month column from strings such as "01" to integer values, which makes it easier to slice using the < and > operators.
If, however, you need to stick with string-valued month column names, then:
month_num = 4
# compute the YTD column first: once a total column with an empty second-level
# label exists, .astype(int) on the month level would fail
df_pivot["2019YTD"] = df_pivot.loc[:, (df_pivot.columns.get_level_values(0) == 2019) &
                                      (df_pivot.columns.get_level_values(1).astype(int) <= month_num)].sum(axis=1)
df_pivot["2019FY"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2019].sum(axis=1)
df_pivot["2020YTD"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2020].sum(axis=1)
You'd end up with something like:
Year    2019              2020     2019YTD 2019FY 2020YTD
Month   01 02 03 04 05 06 01 02 03
A   B
bar one  0  0  0  0  0  6  8  0  0       0      6       8
    two  0  0  0  0  0  0  0  9  9       0      0      18
foo one  2  4  5  0  0  0  0  0  0      11     11       0
    two  0  0  0  5  6  0  0  0  0       5     11       0
Once that's done, you can adjust the column positions with something like:
df_pivot = df_pivot.loc[:, [2019, "2019FY", "2019YTD", 2020, "2020YTD"]]
to get something like:
Year    2019              2019FY 2019YTD 2020     2020YTD
Month   01 02 03 04 05 06                 01 02 03
A   B
bar one  0  0  0  0  0  6      6       0  8  0  0       8
    two  0  0  0  0  0  0      0       0  0  9  9      18
foo one  2  4  5  0  0  0     11      11  0  0  0       0
    two  0  0  0  5  6  0     11       5  0  0  0       0
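The suggestion above to store months as integers could look roughly like the sketch below. It converts the Month column before pivoting, so the column level can be compared directly without .astype(int). The variable names years, months, ytd_2019 and fy_2019 are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

# convert "01" -> 1 etc. before pivoting, so the Month level holds integers
df["Month"] = df["Month"].astype(int)
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)

month_num = 4
# take the level values once, before any total columns are added
years = df_pivot.columns.get_level_values("Year")
months = df_pivot.columns.get_level_values("Month")

# boolean masks over the columns select the slices to sum
ytd_2019 = df_pivot.loc[:, (years == 2019) & (months <= month_num)].sum(axis=1)
fy_2019 = df_pivot.loc[:, years == 2019].sum(axis=1)
print(ytd_2019)
print(fy_2019)
```

Computing the masks up front keeps them valid even if total columns are appended to df_pivot afterwards.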