
Is there a function in Pandas pivot table to add the difference of multiple columns?

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "foo"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8],
                   })

with the following output:

print(df)

    A   B   C       D
0   foo one small   1
1   foo one large   2
2   foo one large   2
3   foo two small   3
4   foo two small   3
5   bar one large   4
6   bar one small   5
7   bar two small   6
8   bar two large   7
9   foo two large   8

Then I build a pivot table as follows:

table = pd.pivot_table(df, values='D', index=['A'],
                    columns=['B','C'])

With the following output:

print(table)

B   one             two
C   large   small   large   small
A               
bar   4      5       7        6
foo   2      1       8        3

How could I add the difference between large and small (large - small) for both one and two (the diff columns in the table below)? The expected output would be:

B   one                  two
C   large   small  diff  large   small  diff
A
bar   4       5     -1     7       6      1
foo   2       1      1     8       3      5

I saw some previous answers, but they only handled one column. Ideally, this would also be done using aggfunc.

Additionally, how could I transform the table back into the initial (long) format? The expected output would be:

  A   B   C     D 
0  foo one small 1 
1  foo one large 2 
2  foo one large 2 
3  foo two small 3 
4  foo two small 3 
5  bar one large 4 
6  bar one small 5 
7  bar two small 6 
8  bar two large 7 
9  foo two large 8 
10 bar one diff -1 
11 bar two diff 1 
12 foo one diff 1 
13 foo two diff 5

Thanks in advance for the help!

You can compute the differences within each "B" group of columns and join them back:

# Within each "B" group, compute each column minus the column to its right.
# Note: axis="columns" in groupby is deprecated as of pandas 2.1.
diffs = (table.groupby(level="B", axis="columns")
              .diff(-1).dropna(axis="columns")
              .rename(columns={"large": "diff"}, level="C"))

# Append the diff columns, then regroup the columns under "one", "two".
new = table.join(diffs).loc[:, table.columns.get_level_values("B").unique()]

  • group the columns by level "B" ("one", "two", ...)
  • take the difference from left to right (diff(-1))
    • i.e., compute the "large - small" values
  • since there is nothing to the right of "small", its difference is all NaN, so drop it
  • rename the "large" columns, which now actually hold the differences, to "diff"
  • join with the pivoted table and restore the original "one", "two" order

to get

>>> new

B     one              two
C   large small diff large small diff
A
bar     4     5   -1     7     6    1
foo     2     1    1     8     3    5
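
The second part of the question (going back to the long format with the diff rows appended) can be sketched along the same lines. The following is a minimal sketch, not a definitive implementation: it avoids groupby over columns by selecting the "large" and "small" sub-tables with xs, and the names diff, diff_wide, diff_long and long_df are illustrative only. It assumes the "C" level contains exactly "large" and "small". Because pivot_table aggregates with the mean by default, the resulting values are floats.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7, 8],
})

table = pd.pivot_table(df, values="D", index=["A"], columns=["B", "C"])

# xs drops the "C" level, so both frames have the columns "one", "two"
# and the subtraction aligns label by label.
diff = (table.xs("large", level="C", axis=1)
        - table.xs("small", level="C", axis=1))

# Wide format: label the differences as ("one", "diff"), ("two", "diff"),
# append them to the pivot table, and put the columns in the desired order.
diff_wide = diff.copy()
diff_wide.columns = pd.MultiIndex.from_product(
    [diff.columns, ["diff"]], names=["B", "C"])
order = pd.MultiIndex.from_product(
    [table.columns.get_level_values("B").unique(), ["large", "small", "diff"]],
    names=["B", "C"])
new = pd.concat([table, diff_wide], axis=1).reindex(columns=order)

# Long format: one row per (A, B) pair, tagged with C == "diff",
# appended below the original rows.
diff_long = (diff.stack()
                 .rename("D")
                 .reset_index()
                 .assign(C="diff")[["A", "B", "C", "D"]])
long_df = pd.concat([df, diff_long], ignore_index=True)

print(new)
print(long_df)
```

As for aggfunc: it aggregates the values that fall into a single cell of the pivot table, so it does not see values from neighboring columns. That is why the difference is computed after pivoting here.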
