How do I count only unique values with groupby using pandas/python?

Question

How can I count only the unique (distinct) values of "Unique_Id" within each group of this pandas DataFrame? Everything I have tried either counts the unique values of "Community" instead or throws an error.

df.groupby("Community")["Unique_Id"].count().sort_values(ascending = False)

This is the output I get:

Comunidad_Autónoma
Cataluña                534415
Comunidad Valenciana    475411
Madrid                  415047
Islas Canarias          171939
País Vasco              168297
Navarra                  57045
La Rioja                 26057
Name: Unique_Id, dtype: int64

Answer 1

One option is to call pandas.DataFrame.drop_duplicates before groupby, so that each (Community, Unique_Id) pair is counted only once. In the example below, Madrid has a duplicate Id (2 appears twice):

import pandas as pd

# Sample data: Madrid contains Unique_Id 2 twice
df = pd.DataFrame(dict(
    Community = 'Cataluña,Madrid,Cataluña,Madrid,Cataluña,Madrid'.split(','),
    Unique_Id = [1, 2, 3, 4, 5, 2],
))

# Drop repeated (Community, Unique_Id) pairs, then count the remaining Ids per community
df1 = df.drop_duplicates(
        ['Community','Unique_Id']
    ).groupby(
        'Community'
    )['Unique_Id'].count().sort_values(ascending = False)

print(df1)
# Sum of the per-community counts
print(f'\nTotal Unique_Ids Across All Communities: {df1.sum()}')

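As an alternative to dropping duplicates first, the groupby result also offers nunique, which counts distinct values per group directly. The sketch below is not part of the original answer; it reuses the same sample data, and the variable name unique_counts is chosen here for illustration only. Note that nunique ignores NaN values by default.

```python
import pandas as pd

# Same sample data as above: Madrid contains Unique_Id 2 twice
df = pd.DataFrame({
    "Community": ["Cataluña", "Madrid", "Cataluña", "Madrid", "Cataluña", "Madrid"],
    "Unique_Id": [1, 2, 3, 4, 5, 2],
})

# Count distinct Unique_Id values within each community
unique_counts = (
    df.groupby("Community")["Unique_Id"]
      .nunique()
      .sort_values(ascending=False)
)

print(unique_counts)
```

For this data both approaches should agree: Cataluña has three distinct Ids and Madrid has two, because the repeated Id 2 is counted once.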
