
How to display top count for each column in Python

I have created a pandas DataFrame and would like to display the most popular dog breed per zip code. With the code below I can only display the highest count per zip code, not the breed that count belongs to.

My code:

import pandas as pd

df = pd.DataFrame({'zip_code':[12345,66666,12345,22222,22222,12345,66666,22222,44444],
                   'primary_breed': ['labrador','pug','poodle','labrador','labrador','pug','whippet','poodle','labrador'],
                   'animals_name':['lucy','charley','scout','hank','sweetie','lucy','daddy','lucy','charley'],
                   'species':['dog','dog','dog','dog','dog','dog','dog','dog','dog']})

# assign correct data types
df['species'] = df['species'].astype('category')
df['animals_name'] = df['animals_name'].astype('string')
df['primary_breed'] = df['primary_breed'].astype('category')
df['zip_code'] = df['zip_code'].astype('string')

dogs = df.species == 'dog'

# total number per breed per zip
df_total_per_breed_zip = df[dogs].groupby('zip_code')['primary_breed'].value_counts() 
print('\n\ntotal number per breed: \n', df_total_per_breed_zip)

# most popular breed per zip
df_mostpop_breed_zip = df_total_per_breed_zip.max(level='zip_code')
print('\n\nmost popular breed per zip: \n', df_mostpop_breed_zip)

So what I am getting is:

total number per breed: 
 zip_code  primary_breed
12345     labrador         1
          poodle           1
          pug              1
22222     labrador         2
          poodle           1
44444     labrador         1
66666     pug              1
          whippet          1
Name: primary_breed, dtype: int64

most popular breed per zip: 
 zip_code
12345    1
22222    2
44444    1
66666    1
Name: primary_breed, dtype: int64

But what I would like to get is:

total number per breed: 
 zip_code  primary_breed
12345     labrador         1
          poodle           1
          pug              1
22222     labrador         2
          poodle           1
44444     labrador         1
66666     pug              1
          whippet          1
Name: primary_breed, dtype: int64

most popular breed per zip: 
 zip_code
12345    labrador
22222    labrador
44444    labrador
66666    pug
Name: primary_breed, dtype: object

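Use mode to get the most common value in each group. mode() returns every tied value in sorted order, so [0] picks the first of them: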

(df.loc[df['species']=='dog']
   .groupby('zip_code')['primary_breed']
   .agg(lambda x: x.mode()[0])
)

Output:

zip_code
12345    labrador
22222    labrador
44444    labrador
66666         pug
Name: primary_breed, dtype: object
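
The attempt in the question returns numbers because max() operates on the values of the counted Series, while the breed names live in its index. If you also want the count next to the winning breed, one possible approach is to keep the counts as an ordinary column, sort, and keep the first row per zip code. This is only a sketch: it assumes plain string columns rather than the categorical dtypes used in the question, and the column name n is made up for illustration.

```python
import pandas as pd

# Same sample data as in the question, with zip codes already stored as strings.
df = pd.DataFrame({
    'zip_code': ['12345', '66666', '12345', '22222', '22222',
                 '12345', '66666', '22222', '44444'],
    'primary_breed': ['labrador', 'pug', 'poodle', 'labrador', 'labrador',
                      'pug', 'whippet', 'poodle', 'labrador'],
    'species': ['dog'] * 9,
})

dogs = df[df['species'] == 'dog']

# Count each (zip_code, primary_breed) pair; 'n' is a hypothetical column name.
counts = (dogs.groupby(['zip_code', 'primary_breed'])
              .size()
              .reset_index(name='n'))

# Put the highest count first within each zip code (ties broken alphabetically
# by breed), then keep only the first row per zip code.
top = (counts.sort_values(['zip_code', 'n', 'primary_breed'],
                          ascending=[True, False, True])
             .drop_duplicates('zip_code')
             .set_index('zip_code'))

print(top)
```

For this sample data the tie-breaking matches the mode()[0] answer above: alphabetical order decides between breeds with equal counts.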
