
Number of unique values per column by group

Consider the following dataframe:

      A      B  E
0   bar    one  1
1   bar  three  1
2  flux    six  1
3  flux  three  2
4   foo   five  2
5   foo    one  1
6   foo    two  1
7   foo    two  2
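For anyone who wants to reproduce this, the frame above could be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the table, and the `pd` alias and the variable name `df` are assumed to match the snippets used later in the thread.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the table above.
df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})
print(df)
```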

I would like to find, for each value of A, the number of unique values in the other columns.

  1. I thought the following would do it:

     df.groupby('A').apply(lambda x: x.nunique())

    but I get an error:

     AttributeError: 'DataFrame' object has no attribute 'nunique'
  2. I also tried:

     df.groupby('A').nunique()

    but that also raised an error:

     AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'
  3. Finally I tried:

     df.groupby('A').apply(lambda x: x.apply(lambda y: y.nunique()))

    which returns:

           A  B  E
     A
     bar   1  2  1
     flux  1  2  2
     foo   1  3  2

    and seems to be correct. Strangely though, it also returns the column A in the result. Why?

In the pandas version you are running, the DataFrame object doesn't have nunique; only Series does. You have to pick out which column you want to apply nunique() to. You can do this with simple attribute access:

df.groupby('A').apply(lambda x: x.B.nunique())

will print:

A
bar     2
flux    2
foo     3

And doing:

df.groupby('A').apply(lambda x: x.E.nunique())

will print:

A
bar     1
flux    2
foo     2

Alternatively, you can do this with one function call using:

df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})

which will print:

      B  E
A
bar   2  1
flux  2  2
foo   3  2
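As a possible shorthand for the call above, the aggregation can also be named by string instead of wrapping it in a lambda per column. This is a sketch that assumes the example frame from the question; the variable name `counts` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Passing the method name as a string dispatches to Series.nunique
# for each listed column within every group.
counts = df.groupby("A").agg({"B": "nunique", "E": "nunique"})
print(counts)
```

Only the columns listed in the dict appear in the result, so A shows up as the index and not as a column.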

To answer your question about why your nested lambda also returns the A column: when you do a groupby / apply operation, you are iterating through three DataFrame objects, one per group. Each DataFrame object is a sub-DataFrame of the original. Applying an operation to it applies the operation to each Series. There are three Series (A, B and E) in each DataFrame that you're applying nunique() to.

The first Series being evaluated on each DataFrame is the A Series, and since you've done a groupby on A, you know that in each DataFrame there is only one unique value in the A Series. This explains why you're ultimately given an A result column with all 1's.
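The explanation above can be checked with a short sketch. It assumes the example frame from the question; selecting `["B", "E"]` on the groupby before calling apply is one way (not necessarily the only one) to keep A out of the per-group frames, and the name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Iterating the groupby yields one sub-DataFrame per value of A;
# inside each one, column A holds a single distinct value.
for name, group in df.groupby("A"):
    print(name, group["A"].nunique())

# Restricting the groupby to B and E means the nested lambda
# never sees the A Series, so no all-ones A column is produced.
result = df.groupby("A")[["B", "E"]].apply(
    lambda g: g.apply(lambda s: s.nunique())
)
print(result)
```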

I encountered the same problem. Upgrading pandas to the latest version solved it for me.

df.groupby('A').nunique()

The above code did not work for me in pandas version 0.19.2. I upgraded to pandas version 0.21.1 and it worked.

You can check the version using the following code:

import pandas as pd
print('Pandas version ' + pd.__version__)
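If upgrading is not an option, something along these lines might work on both old and new versions. It is only a sketch: `nunique_per_group` is a hypothetical helper name, not a pandas function, and the fallback branch is the per-column lambda approach from the earlier answer.

```python
import pandas as pd


def nunique_per_group(frame, key):
    """Hypothetical helper: distinct-value counts of every non-key column per group."""
    cols = [c for c in frame.columns if c != key]
    grouped = frame.groupby(key)[cols]
    if hasattr(grouped, "nunique"):
        # Newer pandas: the groupby object has nunique() itself.
        return grouped.nunique()
    # Older pandas: fall back to Series.nunique applied column by column.
    return grouped.agg(lambda s: s.nunique())


df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})
print(nunique_per_group(df, "A"))
```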
