
Selecting the top values in a column based on how many times they repeat

Suppose I have a column in a pandas DataFrame with 1000 string values. How can I select the top 10 of them, based on how many times each one repeats?

data['country_state'] = data['place'].str.rsplit(',').str[-1]  # last comma-separated part of 'place'

country_state has 1000 values. I need to select the top 10 country_state values out of those 1000, based on how many times the same string repeats.

I think value_counts ( https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html ) and nlargest ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nlargest.html ) should work here:

data['country_state'].value_counts().nlargest(10)
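As a minimal sketch of how that one-liner behaves, here it is on a small made-up DataFrame. The sample place names, the added .str.strip() call, and n=2 instead of 10 are assumptions for illustration only, not part of the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's 'place' column.
data = pd.DataFrame({
    "place": [
        "Austin, Texas", "Dallas, Texas", "Houston, Texas",
        "Miami, Florida", "Tampa, Florida",
        "Reno, Nevada",
    ]
})

# Same extraction as in the question; strip() is added here (an assumption)
# to drop the space that follows the comma.
data["country_state"] = data["place"].str.rsplit(",").str[-1].str.strip()

# Count each distinct string, then keep the n most frequent.
# n=2 keeps the example small; use 10 for the real data.
top = data["country_state"].value_counts().nlargest(2)

top_names = top.index.tolist()   # the most frequent strings
top_counts = top.tolist()        # how many times each one occurs
```

The resulting Series has the strings as its index and the counts as its values, so top.index gives the selected names directly.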

Hi, you can solve this with a couple of pandas functions. First, value_counts counts how many times each value occurs and returns the counts sorted in descending order. Then you can slice off the first 10 entries and read the values from their index. Here is an example:

import numpy as np
import pandas as pd

# Create the DataFrame; numbers are used for simplicity, but strings work the same way
n = np.random.randint(0, 50, 1000)
df_n = pd.DataFrame(n, columns=['num'])

# Count the occurrences of each value, sorted by frequency (descending)
nreps = df_n['num'].value_counts()

# Take the top ten: the index holds the values, .values holds their counts
top10_values = nreps.iloc[:10].index
top10_counts = nreps.iloc[:10].values
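If the goal is to keep only the rows that contain one of those top ten values, one possible follow-up is to filter with isin. This is a sketch under assumptions: the seeded generator and the df_top10 name are introduced here for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible; the seed is arbitrary.
rng = np.random.default_rng(0)
df_n = pd.DataFrame({"num": rng.integers(0, 50, 1000)})

# Same steps as above: count occurrences, then take the ten most frequent values.
nreps = df_n["num"].value_counts()
top10_values = nreps.iloc[:10].index

# Keep only the rows whose value is among the ten most frequent.
df_top10 = df_n[df_n["num"].isin(top10_values)]
```

The same filter should work on a string column such as country_state, since isin compares values regardless of dtype.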


 