Is there complete overlap between `pd.pivot_table` and `pd.DataFrame.groupby` + `pd.DataFrame.unstack`?

Question

(Please note that there's a question "Pandas: group by and Pivot table difference", but this question is different.)

Suppose you start with a DataFrame:

df = pd.DataFrame({'a': ['x'] * 2 + ['y'] * 2, 'b': [0, 1, 0, 1], 'val': range(4)})
>>> df
Out[18]: 
   a  b  val
0  x  0    0
1  x  1    1
2  y  0    2
3  y  1    3

Now suppose you want to make a the index, b the columns, and val the cell values, and to specify what to do if two or more values land in the same resulting cell.

You can do this either through

df.val.groupby([df.a, df.b]).sum().unstack()

or through

pd.pivot_table(df, index='a', columns='b', values='val', aggfunc='sum')
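To make the claimed correspondence concrete, here is a minimal sketch that builds both results from the DataFrame above and compares them. The variable names via_groupby and via_pivot are mine, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': ['x'] * 2 + ['y'] * 2, 'b': [0, 1, 0, 1], 'val': range(4)})

# Route 1: group on both keys, aggregate, then move 'b' into the columns.
via_groupby = df.val.groupby([df.a, df.b]).sum().unstack()

# Route 2: let pivot_table do the grouping and reshaping in one call.
via_pivot = pd.pivot_table(df, index='a', columns='b', values='val', aggfunc='sum')

print(via_groupby)
print(via_pivot)
# Compare cell by cell; both frames have index 'a' and columns 'b'.
print((via_groupby.to_numpy() == via_pivot.to_numpy()).all())
```

For this small frame every (a, b) pair occurs exactly once, so no cell is missing and the aggregation function does not change any value.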

So it seems to me that there's a simple correspondence between the two (given one, you could almost write a script to transform it into the other). I also thought of more complex cases with hierarchical indices / columns, but I still see no difference.

Is there something I've missed?

Are there operations that can be performed using one and not the other?
Are there, perhaps, operations that are easier to perform with one than with the other?
If not, why not deprecate pivot_table? groupby seems much more general.

Answer 1

If I understood the source code for pivot_table(index, columns, values, aggfunc) correctly, it's a tuned-up equivalent of the following (with index and columns given as lists of column names):

df.groupby(index + columns).agg(aggfunc).unstack(columns)
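The one-liner above is schematic, so here is a runnable sketch of the same idea. The helper name pivot_via_groupby is hypothetical, and the sketch assumes index and columns are lists of column names and values is a single column name; it is not how pandas itself is implemented.

```python
import pandas as pd

def pivot_via_groupby(df, index, columns, values, aggfunc):
    """Hypothetical helper: emulate the core of pivot_table with groupby + unstack.

    Assumes `index` and `columns` are lists of column names and `values`
    is a single column name.
    """
    keys = list(index) + list(columns)
    # Group on all keys, aggregate the value column, then move the
    # `columns` keys from the row index into the column axis.
    return df.groupby(keys)[values].agg(aggfunc).unstack(columns)

df = pd.DataFrame({'a': ['x', 'x', 'y', 'y'], 'b': [0, 1, 0, 1], 'val': [0, 1, 2, 3]})

result = pivot_via_groupby(df, index=['a'], columns=['b'], values='val', aggfunc='sum')
expected = df.pivot_table(index=['a'], columns=['b'], values='val', aggfunc='sum')
print(result)
print(expected)
```

Selecting the value column before aggregating is a choice made for this sketch: it keeps the value name out of the resulting column labels.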

plus:

margins (subtotals and grand totals, as @ayhan has already said)
pivot_table() also removes the extra level from the columns axis (see the example below)
a convenient dropna parameter: do not include columns whose entries are all NaN
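As an illustration of the margins point, the sketch below compares margins=True with totals added by hand on the groupby route. The data is the docstring DataFrame used in the demo; the names with_margins and by_hand are mine.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo'] * 5 + ['bar'] * 4,
    'B': ['one', 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
    'C': ['small', 'large', 'large', 'small', 'small', 'large', 'small', 'small', 'large'],
    'D': [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# pivot_table: one extra argument adds an 'All' row and an 'All' column.
with_margins = df.pivot_table(index='A', columns='C', values='D',
                              aggfunc='sum', margins=True)

# groupby + unstack: the same cells, but the totals are appended manually.
by_hand = df.groupby(['A', 'C'])['D'].sum().unstack('C')
by_hand['All'] = by_hand.sum(axis=1)       # row totals
by_hand.loc['All'] = by_hand.sum(axis=0)   # column totals, including the grand total

print(with_margins)
print(by_hand)
```

Summing the cells works here only because the aggregation is a sum. For something like a mean, the totals would have to be recomputed from the raw rows, which is part of what margins=True saves you.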

Demo (I took this DataFrame from the docstring in the source code for pivot_table()):

In [40]: df
Out[40]:
     A    B      C  D
0  foo  one  small  1
1  foo  one  large  2
2  foo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7

In [41]: df.pivot_table(index=['A','B'], columns='C', values='D', aggfunc=[np.sum,np.mean])
Out[41]:
          sum        mean
C       large small large small
A   B
bar one   4.0   5.0   4.0   5.0
    two   7.0   6.0   7.0   6.0
foo one   4.0   1.0   2.0   1.0
    two   NaN   6.0   NaN   3.0

Note the extra top-level column, D, in the groupby version:

In [42]: df.groupby(['A','B','C']).agg([np.sum, np.mean]).unstack('C')
Out[42]:
            D
          sum        mean
C       large small large small
A   B
bar one   4.0   5.0   4.0   5.0
    two   7.0   6.0   7.0   6.0
foo one   4.0   1.0   2.0   1.0
    two   NaN   6.0   NaN   3.0
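The extra D level can be avoided on the groupby route too. The following is a sketch under two assumptions of mine: it uses the string names 'sum' and 'mean' in place of np.sum and np.mean, and it rebuilds the demo DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo'] * 5 + ['bar'] * 4,
    'B': ['one', 'one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
    'C': ['small', 'large', 'large', 'small', 'small', 'large', 'small', 'small', 'large'],
    'D': [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

pivoted = df.pivot_table(index=['A', 'B'], columns='C', values='D',
                         aggfunc=['sum', 'mean'])

# Option 1: select the single column 'D' before aggregating,
# so the value name never becomes a column level.
grouped = df.groupby(['A', 'B', 'C'])['D'].agg(['sum', 'mean']).unstack('C')

# Option 2: aggregate the whole frame as in the demo, then drop the top level.
grouped_all = df.groupby(['A', 'B', 'C']).agg(['sum', 'mean']).unstack('C')
dropped = grouped_all.droplevel(0, axis=1)

print(pivoted.columns.nlevels, grouped.columns.nlevels,
      grouped_all.columns.nlevels, dropped.columns.nlevels)
```

Either option gives the same two-level column layout as pivot_table, so the difference shown in the demo is a convenience rather than something groupby cannot do.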

why not deprecate pivot_table? groupby seems much more general.

IMO, because it's very easy to use and very convenient! ;)

Answer 1: score 5, accepted, 2016-09-25 06:02:49
