
Counting unique values using .groupby in a pandas dataframe

I have a dataframe, and when I run my code it returns all NaNs instead of the counted values. I'm sure it's something simple, but I can't figure it out. I'm trying to get the number of unique species in each location. I'd like the new column to contain the species counts [2,2,1,1,2,2,1,1].

import pandas as pd

df = pd.DataFrame({
         'ID': [1, 2, 3, 4, 5, 6, 7, 8],
         'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
         'Species': ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
         'Count': [2,2,2,2,4,4,4,4]
})

def abundance(data):
    data["Abundance"] = data[data.Species.notnull()].groupby('location')['Species'].unique()

abundance(df)
print(df)

Output:

   ID location Species  Count Abundance
0   1        A     Cat      2       NaN
1   2        A     Cat      2       NaN
2   3        C     Dog      2       NaN
3   4        C     Cat      2       NaN
4   5        E     Cat      4       NaN
5   6        E     Cat      4       NaN
6   7        E     Dog      4       NaN
7   8        E    Bird      4       NaN

I believe you want the count of each (location, Species) pair. Also, to assign groupby output back to the original dataframe, we usually use transform:

df['Abundance'] = df.groupby(['location','Species']).Species.transform('size')

Output:

   ID location Species  Count  Abundance
0   1        A     Cat      2          2
1   2        A     Cat      2          2
2   3        C     Dog      2          1
3   4        C     Cat      2          1
4   5        E     Cat      4          2
5   6        E     Cat      4          2
6   7        E     Dog      4          1
7   8        E    Bird      4          1
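
A minimal sketch of why transform is needed here, using the dataframe from the question. An aggregation such as unique() returns one row per group, indexed by the group key, whereas transform keeps the original row index. The variable names per_location and pair_counts are only illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
    'Species': ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Count': [2, 2, 2, 2, 4, 4, 4, 4],
})

# An aggregation returns one row per group, indexed by the group key.
per_location = df.groupby('location')['Species'].unique()
print(per_location.index.tolist())   # ['A', 'C', 'E']

# Column assignment aligns on index labels. 'A', 'C', 'E' never match 0..7,
# so aligning to df's index leaves every row NaN.
print(per_location.reindex(df.index).isna().all())   # True

# transform returns a result with the same index as df,
# so it can be assigned back as a column directly.
pair_counts = df.groupby(['location', 'Species'])['Species'].transform('size')
print(pair_counts.tolist())   # [2, 2, 1, 1, 2, 2, 1, 1]
```

Note that unique() also returns the arrays of distinct values rather than a count, so even with matching labels it would not give the numbers asked for.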
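
Alternatively, value_counts gives a summary with one row per (location, Species) pair instead of a new column:

df.groupby(['location','Species']).Species.value_counts().to_frame('Abundance')

Output:
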
                            Abundance
location Species Species           
A        Cat     Cat              2
C        Cat     Cat              1
         Dog     Dog              1
E        Bird    Bird             1
         Cat     Cat              2
         Dog     Dog              1
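
Because Species is both a grouping key and the column being counted, the index above carries the Species level twice. A possible variant, offered as a sketch rather than as the answerer's method, groups on location only and flattens the result into ordinary columns. The name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
    'Species': ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Count': [2, 2, 2, 2, 4, 4, 4, 4],
})

summary = (
    df.groupby('location')['Species']
      .value_counts()          # counts of each species within each location
      .rename('Abundance')     # name the counts before flattening the index
      .reset_index()           # turn location and Species into regular columns
)
print(summary)
```

The result has the columns location, Species and Abundance, with one row per pair. The order of rows with equal counts inside a location is not something this sketch relies on.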

I believe you should try grouping the dataframe by the columns you want to have in the output:

>>> df[df.Species.notnull()].groupby(['location','Species']).count()
                  ID  Count
location Species           
A        Cat       2      2
C        Cat       1      1
         Dog       1      1
E        Bird      1      1
         Cat       2      2
         Dog       1      1
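
The question's wording ("number of unique species in each location") can also be read as a distinct count per location, which differs from the per-pair counts in the expected output [2,2,1,1,2,2,1,1]. Assuming that other reading, one way to get it is nunique combined with transform. The name species_per_location is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'location': ['A', 'A', 'C', 'C', 'E', 'E', 'E', 'E'],
    'Species': ['Cat', 'Cat', 'Dog', 'Cat', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Count': [2, 2, 2, 2, 4, 4, 4, 4],
})

# nunique counts distinct non-null values in each group; transform broadcasts
# the per-location result back onto every row of that location.
species_per_location = df.groupby('location')['Species'].transform('nunique')
print(species_per_location.tolist())   # [1, 1, 2, 2, 3, 3, 3, 3]
```

Since nunique skips missing values by default, the notnull() filter is not needed for this variant.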
