How to aggregate unique counts with pandas pivot_table

Question

This code:

df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'], 
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'], 
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)
g = df2.groupby('X')
pd.pivot_table(g, values='X', rows='Y', cols='Z', margins=False, aggfunc='count')

returns the following error:

Traceback (most recent call last): ... 
AttributeError: 'Index' object has no attribute 'index'

How do I get a pivot table with counts of unique values of one DataFrame column for two other columns?
Is there an aggfunc for counting unique values? Should I be using np.bincount()?

NB: I am aware of pandas.Series.value_counts(), however I need a pivot table.

EDIT: The output should be:

Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1

Answer 1

Do you mean something like this?

>>> df2.pivot_table(values='X', rows='Y', cols='Z', aggfunc=lambda x: len(x.unique()))

Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1

Note that using len assumes you don't have NAs in your DataFrame, because x.unique() keeps NaN as one of the distinct values. Otherwise, you can use x.value_counts().count() or len(x.dropna().unique()).
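
A minimal sketch of the NA caveat, assuming a hypothetical variant of the question's df2 (named df_nan here) in which one X value is missing. It uses the current index/columns keywords rather than rows/cols, and compares len(x.unique()) with the two NaN-dropping alternatives.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's df2 with one missing X value
df_nan = pd.DataFrame({
    'X': ['X1', np.nan, 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# len(x.unique()) keeps NaN, so the (Y1, Z1) cell sees {'X1', NaN}
with_len = df_nan.pivot_table(values='X', index='Y', columns='Z',
                              aggfunc=lambda x: len(x.unique()))

# Dropping NaN before taking unique values counts only real values
without_nan = df_nan.pivot_table(values='X', index='Y', columns='Z',
                                 aggfunc=lambda x: len(x.dropna().unique()))

# value_counts() drops NaN by default, so count() of it is also NaN-free
via_value_counts = df_nan.pivot_table(values='X', index='Y', columns='Z',
                                      aggfunc=lambda x: x.value_counts().count())

print(with_len)
print(without_nan)
```

The (Y1, Z1) cell is where the variants disagree: the len(x.unique()) version counts the missing value as an extra distinct entry.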

Answer 2

This is a good way of counting entries within .pivot_table:

>>> df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')

        X1  X2
Y   Z       
Y1  Z1   1   1
    Z2   1  NaN
Y2  Z3   1  NaN
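
On the question's df2 (which only contains X1), the same layout of entry counts per (Y, Z) pair with one column per X value can be sketched with groupby(...).size() and unstack. This is a swapped-in technique, not the pivot_table call above, and like aggfunc='count' it counts rows rather than unique values.

```python
import pandas as pd

# The question's example frame
df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Count rows per (Y, Z, X) combination, then move X into the columns
entry_counts = df2.groupby(['Y', 'Z', 'X']).size().unstack('X')
print(entry_counts)
```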

Answer 3

Since at least version 0.16 of pandas, pivot_table no longer takes the parameter "rows"; use index instead (and columns instead of cols).

As of 0.23, the solution would be:

df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

which returns:

Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

Answer 4

aggfunc=pd.Series.nunique provides a distinct count. The full code is as follows:

df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

Credit to @hume for this solution (see the comment under the accepted answer). I'm adding it as an answer here for better discoverability.

Answer 5

The aggfunc parameter in pandas.DataFrame.pivot_table will take 'nunique' as a string, or inside a list.
- pandas.Series.nunique or pandas.core.groupby.DataFrameGroupBy.nunique
Tested in pandas 1.3.1

out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique', 'count', lambda x: len(x.unique()), len])

[out]:
             nunique           count           <lambda>            len          
Z       Z1   Z2   Z3    Z1   Z2   Z3       Z1   Z2   Z3   Z1   Z2   Z3
Y                                                                     
Y1     1.0  1.0  NaN   2.0  1.0  NaN      1.0  1.0  NaN  2.0  1.0  NaN
Y2     NaN  NaN  1.0   NaN  NaN  1.0      NaN  NaN  1.0  NaN  NaN  1.0


out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')

[out]:
Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique'])

[out]:
             nunique          
Z       Z1   Z2   Z3
Y                   
Y1     1.0  1.0  NaN
Y2     NaN  NaN  1.0
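
A short sketch, assuming the question's df2, showing that the string alias and pd.Series.nunique give the same table, and that with the list form you can select one top-level label to get a plain Y by Z table back. The variable names are illustrative only.

```python
import pandas as pd

# The question's example frame
df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# String alias vs. the Series method: same distinct counts
by_name = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')
by_func = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

# A list of aggfuncs adds a top column level named after each function
multi = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique', 'count'])

# Selecting one top-level label drops back to a single column level
unique_counts = multi['nunique']
row_counts = multi['count']
print(unique_counts)
```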

Answer 6

You can construct a pivot table for each distinct value of X. In this case,

for xval, xgroup in g:
    ptable = pd.pivot_table(xgroup, rows='Y', cols='Z', 
        margins=False, aggfunc=numpy.size)

will construct a pivot table for each value of X. You may want to index ptable using xval. With this code, I get (for X1):

     X        
Z   Z1  Z2  Z3
Y             
Y1   2   1 NaN
Y2 NaN NaN   1
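
A minimal sketch of the same per-X loop with the current index/columns keywords, storing each table in a dict keyed by xval (the dict name tables is hypothetical). It passes values='X' so each result has a single column level, and uses aggfunc='count', which matches numpy.size here only because X has no missing values. Like the answer, it counts rows, not unique values.

```python
import pandas as pd

# The question's example frame
df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# One pivot table per distinct value of X, keyed by that value
tables = {}
for xval, xgroup in df2.groupby('X'):
    # 'count' counts non-missing X values in each (Y, Z) cell
    tables[xval] = xgroup.pivot_table(values='X', index='Y', columns='Z', aggfunc='count')

print(tables['X1'])
```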

Answer 7

Since none of the answers are up to date with the latest version of pandas, I am writing another solution for this problem:

import pandas as pd

# Set example
df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'], 
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'], 
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)

# Pivot
pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'], aggfunc=pd.Series.nunique)

which returns:

Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0
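
A quick sketch, assuming the question's df2, comparing this crosstab call with the equivalent pivot_table call from the earlier answers. NaN cells are filled with 0 only for the comparison, so that == can be applied cell by cell.

```python
import pandas as pd

# The question's example frame
df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

ct = pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'],
                 aggfunc=pd.Series.nunique)
pt = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

# Compare cell by cell (NaN replaced by 0 so that == works)
same_cells = ct.fillna(0).to_numpy().tolist() == pt.fillna(0).to_numpy().tolist()
print(same_cells)
```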

Answer 8

For best performance, I recommend using DataFrame.drop_duplicates followed by aggfunc='count'.

Others are correct that aggfunc=pd.Series.nunique will work. However, this can be slow if the number of index groups is large (>1000).

So instead of (to quote @Javier)

df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)

I suggest

df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')

This works because it guarantees that every subgroup (each combination of ('Y', 'Z')) will have unique (non-duplicate) values of 'X'.
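
A sketch, assuming the question's df2, checking that both approaches give the same table; the positional arguments from the answer are spelled out as keywords. This only checks that the results agree on this example; it does not measure speed.

```python
import pandas as pd

# The question's example frame
df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Direct distinct count per (Y, Z) cell
via_nunique = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

# Drop repeated (X, Y, Z) rows first, so a plain count becomes a distinct count
via_dedup = (
    df2.drop_duplicates(subset=['X', 'Y', 'Z'])
       .pivot_table(values='X', index='Y', columns='Z', aggfunc='count')
)
print(via_dedup)
```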

Answer 9

aggfunc=pd.Series.nunique only counts the unique values of a Series, in this case the unique values of a column. But this doesn't quite work as an alternative to aggfunc='count'.

For simple counting, it is better to use aggfunc=pd.Series.count
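
A small sketch, assuming the question's df2, of where the two aggfuncs differ: the (Y1, Z1) cell holds X1 twice, so count reports 2 while nunique reports 1.

```python
import pandas as pd

# The question's example frame
df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Number of non-missing X values per (Y, Z) cell
row_counts = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.count)

# Number of distinct X values per (Y, Z) cell
distinct_counts = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

print(row_counts.loc['Y1', 'Z1'], distinct_counts.loc['Y1', 'Z1'])
```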


9 answers

Answer 1
Score 104, accepted, 2012-10-12 15:19:00

Answer 2
Score 46, 2013-10-28 08:48:01

Answer 3
Score 33, 2018-07-16 17:45:37

Answer 4
Score 7, 2018-07-06 03:06:31

Answer 5
Score 4, 2021-08-19 20:36:38

Answer 6
Score 1, 2012-10-12 15:21:39

Answer 7
Score 0, 2019-08-08 18:33:56

Answer 8
Score 0, 2019-12-26 21:49:50

Answer 9
Score 0, 2020-12-02 10:35:55
