Pandas adding sum of columns in pivot table (multiindexed)

I have df and df_pivot created with the code below:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                  "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                  "Year": [2019, 2019, 2019, 2019,
                         2019, 2019, 2020, 2020,
                          2020],
                  "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                  "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})


df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                    columns=['Year','Month'], aggfunc=np.sum, fill_value=0)

df_pivot looks like this:

Year    2019                2020      
Month     01 02 03 04 05 06   01 02 03
A   B                                 
bar one    0  0  0  0  0  6    8  0  0
    two    0  0  0  0  0  0    0  9  9
foo one    2  4  5  0  0  0    0  0  0
    two    0  0  0  5  6  0    0  0  0

What I am trying to do is add three columns to this df: 2019FY, 2019YTD and 2020YTD.

The 2019FY column should be the sum of all values under "2019".

The 2019YTD column should be the sum of the values under "2019" up to a defined period, i.e. if the period is defined as 04, 2019YTD should sum the columns under 2019 for 01/02/03/04.

The 2020YTD column should be the sum of all values under "2020".

The output table should look like this:

Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0

Essentially, I would like to know how to sum the columns for a given set of "Month" values, since from there I can create 2019FY/2019YTD/2020YTD on my own. It is also important to add them at a specific position in the pivot table (at the end of the 2019 data and at the end of the 2020 data).

Is it feasible?

I have looked everywhere but could not find an example of how to do it.

I appreciate the help.

Thanks, Pawel

You can create the new columns for each year in a custom function applied with GroupBy.apply. Note that the output then also contains a 2020FY column:

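def f(x):
    # get the month labels of this year's columns and convert them to integers
    c = x.columns.get_level_values(1).astype(int)
    # sum all months of the year (FY)
    s1 = x.sum(axis=1)
    # sum months 1 to 4 only (YTD for period 04)
    s2 = x.loc[:, c <= 4].sum(axis=1)
    # x.name is the group key, i.e. the year
    x[(f'{x.name}FY', '')] = s1
    x[(f'{x.name}YTD', '')] = s2

    return x

# note: groupby(..., axis=1) is deprecated in pandas 2.1 and later
df = df_pivot.groupby(level=0, axis=1, group_keys=False).apply(f)
print(df)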
Year    2019                2019FY 2019YTD 2020       2020FY 2020YTD
Month     01 02 03 04 05 06                  01 02 03               
A   B                                                               
bar one    0  0  0  0  0  6      6       0    8  0  0      8       8
    two    0  0  0  0  0  0      0       0    0  9  9     18      18
foo one    2  4  5  0  0  0     11      11    0  0  0      0       0
    two    0  0  0  5  6  0     11       5    0  0  0      0       0

If you need to remove columns, refer to them by tuples, because the columns are a MultiIndex:

df = df.drop([('2020FY','')], axis=1)
print (df)
Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03        
A   B                                                        
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0
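Since groupby(..., axis=1) is deprecated in recent pandas releases, the same idea can probably be written as a plain loop over the years followed by pd.concat. This is only a sketch under assumptions: add_totals and its period parameter are hypothetical names, not part of the answer above, and the same period is applied to every year.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019] * 6 + [2020] * 3,
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)


def add_totals(pivot, period):
    """Hypothetical helper: append <year>FY and <year>YTD after each year's months."""
    years = pivot.columns.get_level_values(0)
    parts = []
    for year in years.unique():
        # all month columns of this year, both column levels kept
        block = pivot.loc[:, years == year].copy()
        months = block.columns.get_level_values(1).astype(int)
        fy = block.sum(axis=1)                             # full-year total
        ytd = block.loc[:, months <= period].sum(axis=1)   # months 1..period
        block[(f"{year}FY", "")] = fy
        block[(f"{year}YTD", "")] = ytd
        parts.append(block)
    # concatenating the per-year blocks keeps each total next to its year
    return pd.concat(parts, axis=1)


result = add_totals(df_pivot, period=4)
print(result)
```

As in the answer above, this also produces a 2020FY column, which can be dropped with its tuple key ('2020FY', '') if it is not wanted.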
    
    
import pandas as pd
import numpy as np

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019,
                            2019, 2019, 2020, 2020,
                            2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

print(df)

df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                          columns=['Year', 'Month'], aggfunc=np.sum, fill_value=0)
print(df_pivot)

# create the same pivot, but just using the year total
df_year_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                               columns=['Year'], aggfunc=np.sum, fill_value=0)
print(df_year_pivot)
# since the dataframe you wish to add will have 2 index levels
# you need to add another level when you join the resulting data
# and since your new level will be a YTD, I just appended it to the year
multi_index_tuples = [(x, f'{x}YTD') for x in df_year_pivot.columns]

# now, you are going to add the new index level to the df with the level names the same as your first pivot
df_year_pivot.columns = pd.MultiIndex.from_tuples(multi_index_tuples, names=['Year', 'Month'])

# happily join on the same index
total_df = pd.merge(df_pivot, df_year_pivot, how='left', left_index=True, right_index=True)
print(total_df)

# sort the column index
total_df = total_df.sort_index(axis=1, level=[0,1])
print(total_df)
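Note that the code above labels the full-year total as YTD. If a real year-to-date figure is needed alongside the full-year one, a variation of the same approach might filter the rows by month before building the second pivot. This is a sketch, not the answerer's code: the period variable and the FY/YTD labels are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019] * 6 + [2020] * 3,
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
period = 4  # assumed cut-off month for the YTD columns

df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)

# full-year totals, one column per year
fy = pd.pivot_table(df, values="Values", index=["A", "B"],
                    columns="Year", aggfunc="sum", fill_value=0)
fy.columns = pd.MultiIndex.from_tuples([(y, f"{y}FY") for y in fy.columns],
                                       names=["Year", "Month"])

# year-to-date totals: keep only rows up to the cut-off month before pivoting
ytd_rows = df[df["Month"].astype(int) <= period]
ytd = pd.pivot_table(ytd_rows, values="Values", index=["A", "B"],
                     columns="Year", aggfunc="sum", fill_value=0)
# filtering may drop whole row groups, so align with the main pivot
ytd = ytd.reindex(index=df_pivot.index, fill_value=0)
ytd.columns = pd.MultiIndex.from_tuples([(y, f"{y}YTD") for y in ytd.columns],
                                        names=["Year", "Month"])

# the labels "2019FY"/"2019YTD" sort after "01".."06", so sorting places them last in each year
total = pd.concat([df_pivot, fy, ytd], axis=1).sort_index(axis=1)
print(total)
```

Here the totals sit under their year in the first column level, with the label in the Month level, which is what lets sort_index place them at the end of each year's block.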

You can make use of the:

df.columns.get_level_values()
df.index.get_level_values()

syntax to slice multi-indexed rows and columns. I'd suggest changing your df's Month column from strings such as "01" to integer values, which makes it easier to slice using the < and > operators. If, however, you need to stick with string-valued month column names, then:

month_num = 4
# YTD: columns of 2019 whose month number is <= month_num
df_pivot["2019YTD"] = df_pivot.loc[:, (df_pivot.columns.get_level_values(0) == 2019) &
                                   (df_pivot.columns.get_level_values(1).astype(int) <= month_num)].sum(axis=1)
# FY / full totals: all month columns of the given year
df_pivot["2019FY"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2019].sum(axis=1)
df_pivot["2020YTD"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2020].sum(axis=1)

You'd end up with something like:

Year    2019                2020      2019YTD 2019FY 2020YTD
Month     01 02 03 04 05 06   01 02 03
A   B
bar one    0  0  0  0  0  6    8  0  0       0      6       8
    two    0  0  0  0  0  0    0  9  9       0      0      18
foo one    2  4  5  0  0  0    0  0  0      11     11       0
    two    0  0  0  5  6  0    0  0  0       5     11       0

Once that's done, you can adjust the column positions with something like:

df_pivot = df_pivot.loc[:, [2019, "2019FY", "2019YTD", 2020, "2020YTD"]]

To get something like:

Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0
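For the integer-month suggestion made earlier in this answer, a minimal sketch could look like the following. It assumes the Month column is converted before pivoting, computes the sums from boolean masks on the column levels, and collects them in a separate frame; the names totals and month_num are illustrative, and applying the cut-off to 2020 as well is an assumption (with month_num = 4 it gives the same result as summing all of 2020).

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019] * 6 + [2020] * 3,
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
df["Month"] = df["Month"].astype(int)  # "01" -> 1, so months compare numerically

pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                       columns=["Year", "Month"], aggfunc="sum", fill_value=0)

month_num = 4
years = pivot.columns.get_level_values("Year")
months = pivot.columns.get_level_values("Month")

# boolean masks over the columns; no astype(int) needed on the month level now
fy_2019 = pivot.loc[:, years == 2019].sum(axis=1)
ytd_2019 = pivot.loc[:, (years == 2019) & (months <= month_num)].sum(axis=1)
ytd_2020 = pivot.loc[:, (years == 2020) & (months <= month_num)].sum(axis=1)

totals = pd.DataFrame({"2019FY": fy_2019, "2019YTD": ytd_2019, "2020YTD": ytd_2020})
print(totals)
```

Computing all sums before attaching any new column keeps the year level purely numeric while the masks are evaluated.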
