Difference between groupby and pivot_table

Question

I just started learning Pandas and was wondering if there is any difference between the groupby and pivot_table functions. Can anyone help me understand the difference between them?

Answer 1

Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

Using pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum), a table is created where a is on the row axis, b is on the column axis, and the values are the sum of c.

Example:

import numpy as np
import pandas as pd
df = pd.DataFrame({"a": [1,2,3,1,2,3], "b":[1,1,1,2,2,2], "c":np.random.rand(6)})
pd.pivot_table(df, index=["a"], columns=["b"], values=["c"], aggfunc=np.sum)

b         1         2
a                    
1  0.528470  0.484766
2  0.187277  0.144326
3  0.866832  0.650100

Using groupby, the dimensions given are placed into columns (as index levels of the result), and one row is created for each combination of those dimensions.

In this example, we create a series of the sums of c, grouped by all unique combinations of a and b.

df.groupby(['a','b'])['c'].sum()

a  b
1  1    0.528470
   2    0.484766
2  1    0.187277
   2    0.144326
3  1    0.866832
   2    0.650100
Name: c, dtype: float64

Similarly, if we omit the ['c'], groupby creates a dataframe (not a series) containing the sums of all remaining columns, grouped by unique values of a and b.

print(df.groupby(["a","b"]).sum())
            c
a b          
1 1  0.528470
  2  0.484766
2 1  0.187277
  2  0.144326
3 1  0.866832
  2  0.650100
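
To make the shape difference easier to check, here is a minimal sketch that repeats the comparison with fixed values for c. The fixed numbers and the names wide and long are assumptions for illustration; the answer above uses random values.

```python
import pandas as pd

# Fixed values for c (illustrative only; the answer above uses np.random.rand)
df = pd.DataFrame({
    "a": [1, 2, 3, 1, 2, 3],
    "b": [1, 1, 1, 2, 2, 2],
    "c": [10, 20, 30, 40, 50, 60],
})

# pivot_table: a on the rows, b on the columns
wide = pd.pivot_table(df, index="a", columns="b", values="c", aggfunc="sum")

# groupby: one row per (a, b) combination
long = df.groupby(["a", "b"])["c"].sum()

print(wide.shape)                        # (3, 2)
print(long.shape)                        # (6,)
print(wide.loc[1, 2], long.loc[(1, 2)])  # 40 40
```

The same six sums appear in both results; only their layout differs.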

Answer 2

It's more appropriate to use .pivot_table() instead of .groupby() when you need to show aggregates with both row and column labels.

.pivot_table() makes it easy to create row and column labels at the same time and is preferable, even though you can get similar results using .groupby() with a few extra steps.
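
As a rough illustration of those "extra steps", the sketch below builds the same labelled table both ways. The sales frame and its column names (region, product, sales) are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data, made up for this illustration
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "product": ["apple", "pear", "apple", "pear"],
    "sales": [5, 7, 3, 9],
})

# One call: regions become row labels, products become column labels
one_call = sales.pivot_table(
    index="region", columns="product", values="sales", aggfunc="sum"
)

# groupby needs an additional unstack() to move "product" into the columns
extra_steps = (
    sales.groupby(["region", "product"])["sales"].sum().unstack("product")
)

print(one_call)
print(extra_steps)
```

Both printed tables should show regions down the side and products across the top.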

Answer 3

pivot_table = groupby + unstack and groupby = pivot_table + stack both hold true.

In particular, if the columns parameter of pivot_table() is not used, then groupby() and pivot_table() both produce the same result (if the same aggregator function is used).

# sample
import pandas as pd
df = pd.DataFrame({"a": [1,1,1,2,2,2], "b": [1,1,2,2,3,3], "c": [0,0.5,1,1,2,2]})

# example
gb = df.groupby(['a','b'])[['c']].sum()
pt = df.pivot_table(index=['a','b'], values=['c'], aggfunc='sum')

# equality test
gb.equals(pt)  # True

In general, if we check the source code, pivot_table() internally calls __internal_pivot_table(). This function creates a single flat list out of index and columns and calls groupby() with this list as the grouper. Then, after aggregation, it calls unstack() on the list of columns.

If columns are never passed, there is nothing to unstack on, so groupby and pivot_table trivially produce the same output.

A demonstration of this behavior is:

gb = (
    df
    .groupby(['a','b'])[['c']].sum()
    .unstack(['b'])
)
pt = df.pivot_table(index=['a'], columns=['b'], values=['c'], aggfunc='sum')

gb.equals(pt) # True
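
The group-then-unstack steps described above can also be wrapped in a small helper. This is only a sketch of the idea: pivot_via_groupby is a hypothetical name, not a pandas function, and it ignores the many options the real pivot_table() handles (fill_value, margins, dropna and so on).

```python
import pandas as pd

def pivot_via_groupby(df, index, columns, values, aggfunc):
    """Hypothetical helper mimicking the described steps; not pandas' own code."""
    keys = list(index) + list(columns)            # one flat list of groupers
    agged = df.groupby(keys)[list(values)].agg(aggfunc)
    # With no columns there is nothing to unstack, so return the groupby result
    return agged.unstack(list(columns)) if columns else agged

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2],
                   "b": [1, 1, 2, 2, 3, 3],
                   "c": [0, 0.5, 1, 1, 2, 2]})

out = pivot_via_groupby(df, ["a"], ["b"], ["c"], "sum")
flat = pivot_via_groupby(df, ["a", "b"], [], ["c"], "sum")

print(out.shape)   # (2, 3)
print(flat.shape)  # (4, 1)
```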

As stack() is the inverse operation of unstack(), the following holds true as well:

(
    df
    .pivot_table(index=['a'], columns=['b'], values=['c'], aggfunc='sum')
    .stack(['b'])
    .equals(
        df.groupby(['a','b'])[['c']].sum()
    )
) # True
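
One caveat worth checking on your own installation: in the sample data not every (a, b) pair occurs, so the pivoted table contains NaN cells. Whether stack() drops those cells appears to depend on the pandas version (the assumption here is that the reimplemented stack() introduced in pandas 2.1 keeps them). The sketch below adds an explicit dropna() so the round trip does not rely on either behavior.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2],
                   "b": [1, 1, 2, 2, 3, 3],
                   "c": [0, 0.5, 1, 1, 2, 2]})

pt = df.pivot_table(index="a", columns="b", values="c", aggfunc="sum")

# (a=1, b=3) and (a=2, b=1) never occur in df, so those two cells are NaN
n_missing = int(pt.isna().sum().sum())
print(n_missing)  # 2

# Explicit dropna() removes the missing cells whatever stack() does with them
round_trip = pt.stack().dropna()
gb = df.groupby(["a", "b"])["c"].sum()

print(round_trip.tolist() == gb.tolist())
print(list(round_trip.index) == list(gb.index))
```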

In conclusion, depending on the use case, one is more convenient than the other, but either can be used in place of the other, and after correctly applying stack() / unstack(), both produce the same output.

Answer 4

Difference between pivot_table and groupby

Answer 1: 120 votes, accepted, 2016-01-10 06:45:18
Answer 2: 12 votes, 2019-06-19 22:24:32
Answer 3: 2 votes, 2022-07-11 02:02:37
Answer 4: 1 vote, 2022-11-30 08:58:30
