Counting unique values using .groupby in a pandas DataFrame

I have a dataframe, and when I run my code it returns all NaNs instead of the counted values. I'm sure it's something simple, but I can't figure it out. I'm trying to get the number of unique species in each location. I'd like the new column to contain the species counts [2,2,1,1,2,2,1,1].

import pandas as pd

df = pd.DataFrame({
         'ID': [1, 2, 3, 4, 5, 6, 7, 8],
         'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
         'Species': ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
         'Count': [2,2,2,2,4,4,4,4]
})

def abundance(data):
    data["Abundance"] = data[data.Species.notnull()].groupby('location')['Species'].unique()

abundance(df)
print(df)

Output:

   ID location Species  Count Abundance
0   1        A     Cat      2       NaN
1   2        A     Cat      2       NaN
2   3        C     Dog      2       NaN
3   4        C     Cat      2       NaN
4   5        E     Cat      4       NaN
5   6        E     Cat      4       NaN
6   7        E     Dog      4       NaN
7   8        E    Bird      4       NaN

I believe you want the count of each (location, Species) pair. Also, to assign groupby output back to the original dataframe, we usually use transform:

df['Abundance'] = df.groupby(['location','Species']).Species.transform('size')

Output:

   ID location Species  Count  Abundance
0   1        A     Cat      2          2
1   2        A     Cat      2          2
2   3        C     Dog      2          1
3   4        C     Cat      2          1
4   5        E     Cat      4          2
5   6        E     Cat      4          2
6   7        E     Dog      4          1
7   8        E    Bird      4          1
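
As a rough illustration of why the original attempt produced NaN, here is a minimal sketch that assumes the same df as in the question. An aggregation such as .unique() returns one entry per group, indexed by location, so assigning it to a column aligns it against row labels 0..7 and nothing matches. transform instead returns one value per original row. The variable names (per_location, pair_size, distinct_per_location) are made up for this example, and the nunique line shows a different reading of the question (distinct species per location), not the output that was asked for.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
    'Species': ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Count': [2, 2, 2, 2, 4, 4, 4, 4],
})

# Aggregation: one entry per group, indexed by location rather than by row.
# Assigning this to df['Abundance'] aligns on index labels, hence the NaNs.
per_location = df.groupby('location')['Species'].unique()
print(list(per_location.index))        # ['A', 'C', 'E']

# transform: one value per original row, carrying df's own index.
pair_size = df.groupby(['location', 'Species'])['Species'].transform('size')
print(pair_size.tolist())              # [2, 2, 1, 1, 2, 2, 1, 1]

# Alternative reading (assumption): distinct species per location.
distinct_per_location = df.groupby('location')['Species'].transform('nunique')
print(distinct_per_location.tolist())  # [1, 1, 2, 2, 3, 3, 3, 3]
```

Because pair_size shares df's index, df['Abundance'] = pair_size fills every row.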

df.groupby(['location','Species']).Species.value_counts().to_frame('Abundance')

                            Abundance
location Species Species           
A        Cat     Cat              2
C        Cat     Cat              1
         Dog     Dog              1
E        Bird    Bird             1
         Cat     Cat              2
         Dog     Dog              1

I believe you should try grouping the data frame by the columns you want to have in the output:

>>> df[df.Species.notnull()].groupby(['location','Species']).count()
                  ID  Count
location Species           
A        Cat       2      2
C        Cat       1      1
         Dog       1      1
E        Bird      1      1
         Cat       2      2
         Dog       1      1
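
The grouped table above has one row per (location, Species) pair rather than one per original row. If the counts need to go back onto the original rows, one possible approach is to merge them in. This is a sketch assuming the question's df, not part of the answer itself; the names counts and merged are made up for the example, and size() is swapped in for count() so that only a single count column is produced.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
    'Species': ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Count': [2, 2, 2, 2, 4, 4, 4, 4],
})

# One row per (location, Species) pair, with the number of rows in that pair.
counts = (
    df[df['Species'].notnull()]
    .groupby(['location', 'Species'])
    .size()
    .reset_index(name='Abundance')
)

# A left merge keeps every original row, in its original order.
merged = df.merge(counts, on=['location', 'Species'], how='left')
print(merged['Abundance'].tolist())    # [2, 2, 1, 1, 2, 2, 1, 1]
```

With this data the result matches the transform('size') version. The merge route is mainly useful when the aggregated table is needed on its own as well.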
