Is there a function to add the difference across multiple columns in a pandas pivot table?

Question

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "foo"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8],
                   })

with the following output:

print(df)

    A   B   C       D
0   foo one small   1
1   foo one large   2
2   foo one large   2
3   foo two small   3
4   foo two small   3
5   bar one large   4
6   bar one small   5
7   bar two small   6
8   bar two large   7
9   foo two large   8

Then I create a pivot table as follows:

table = pd.pivot_table(df, values='D', index=['A'],
                    columns=['B','C'])

With the following output:

print(table)

B   one             two
C   large   small   large   small
A               
bar   4      5       7        6
foo   2      1       8        3
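`pivot_table` combines rows that share the same (A, B, C) key using `aggfunc`, which defaults to `"mean"`. That is why the two `foo one large` rows (D = 2 and 2) and the two `foo two small` rows (D = 3 and 3) each appear as a single cell. Because `aggfunc` only receives the D values of one cell, it never sees the `small` cell while computing the `large` one, so a "large - small" difference has to be computed after pivoting. A minimal check, rebuilding the question's DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8],
})

# default aggfunc="mean": the duplicate foo/one/large rows (2, 2) average to 2
mean_table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"])

# an explicit aggfunc only changes how each cell's own D values are combined
sum_table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"],
                           aggfunc="sum")
count_table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"],
                             aggfunc="count")

print(mean_table.loc["foo", ("one", "large")])   # mean of 2 and 2
print(sum_table.loc["foo", ("one", "large")])    # sum of 2 and 2
print(count_table.loc["foo", ("one", "large")])  # number of source rows
```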

How could I add the difference between large and small (large - small) for both one and two (the diff columns in the table below)? The expected output would be:

B     one              two
C   large small diff large small diff
A
bar     4     5   -1     7     6    1
foo     2     1    1     8     3    5

I saw some previous answers, but they only handled one column. Also, ideally this would be done using aggfunc.

Additionally, how could I transform the table back into the initial format? The expected output would be:

  A   B   C     D 
0  foo one small 1 
1  foo one large 2 
2  foo one large 2 
3  foo two small 3 
4  foo two small 3 
5  bar one large 4 
6  bar one small 5 
7  bar two small 6 
8  bar two large 7 
9  foo two large 8 
10 bar one diff -1 
11 bar two diff 1 
12 foo one diff 1 
13 foo two diff 5
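Since `pivot_table` averaged away the duplicate rows (for example the two identical `foo one large 2` rows), the original rows cannot be rebuilt from `table` alone. A minimal sketch, assuming the original `df` is still available: compute large - small from the pivot table with `DataFrame.xs`, melt it back to long form with `C` set to `"diff"`, and append those rows to `df`. The names `diff`, `diff_long` and `long_df` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8],
})
table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"])

# large - small for every "B" group; xs selects one "C" label across all "B" values
diff = (table.xs("large", axis=1, level="C")
        - table.xs("small", axis=1, level="C"))

# wide -> long: one row per (A, B), with C set to the literal "diff"
diff_long = (diff.reset_index()
                 .melt(id_vars="A", var_name="B", value_name="D")
                 .sort_values(["A", "B"]))
diff_long["C"] = "diff"

# pivot_table averaged duplicate rows, so start from the original df, not from table
long_df = pd.concat([df, diff_long[["A", "B", "C", "D"]]], ignore_index=True)
print(long_df)
```

If an (A, B) pair is missing its `large` or `small` cell, its difference comes out as NaN.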

Thanks in advance for your help!

Answer 1

# note: the axis keyword of DataFrame.groupby is deprecated since pandas 2.1
diffs = (table.groupby(level="B", axis="columns")
              .diff(-1).dropna(axis="columns")
              .rename(columns={"large": "diff"}, level="C"))

new = table.join(diffs).loc[:, table.columns.get_level_values("B").unique()]

- groupby the level "B" of the columns ("one", "two", ...)
- take the difference of each column with the one to its right (diff(-1)), i.e. compute "large - small"
- since there is nothing to the right of "small", its column is all NaN, so drop it
- rename the "large" columns, which now actually hold the differences, to "diff"
- join with the pivoted table and restore the original order of "one", "two"
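`diff(-1)` subtracts the *next* value from the current one (x[i] - x[i+1]). Within each B group, ordered `large`, `small`, the `large` position therefore ends up holding large - small and the `small` position becomes NaN. A one-group illustration using the `bar`/`one` cells:

```python
import pandas as pd

# bar/one cells in pivot column order: large = 4, small = 5
s = pd.Series([4, 5], index=["large", "small"])

# periods=-1 compares each element with the one after it: s - s.shift(-1)
d = s.diff(-1)
print(d)  # large: -1.0 (4 - 5); small: NaN (nothing to its right)
```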

to get

>>> new

B     one              two
C   large small diff large small diff
A
bar     4     5   -1     7     6    1
foo     2     1    1     8     3    5
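On pandas 2.1 and later, `groupby(..., axis="columns")` emits a deprecation warning. A sketch of the same result that avoids it, assuming the `large`/`small` labels from the question: loop over the level-`B` labels, add a `diff` column to each sub-table, and stitch the pieces back together with `pd.concat`, which rebuilds the (B, C) column MultiIndex in the original `one`, `two` order. The names `pieces` and `block` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8],
})
table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"])

pieces = {}
for b in table.columns.get_level_values("B").unique():
    block = table[b].copy()  # sub-table for one "B" label; columns are the "C" labels
    block["diff"] = block["large"] - block["small"]
    pieces[b] = block

# dict keys become the outer "B" level; the inner level keeps the name "C"
new = pd.concat(pieces, axis=1, names=["B"])
print(new)
```

Unlike `diff(-1)`, this does not depend on the position of the columns within each group, because it names `large` and `small` explicitly.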

Answer 1 (score 2, accepted, 2023-01-09 17:50:56)