Selecting the top 10 values in a column based on how many times they repeat

Question

I have a column in a pandas DataFrame with 1000 string values. How can I select the top 10 values, based on how many times each one repeats?

data['country_state'] = data['place'].str.rsplit(',').str[-1] #column

country_state has 1000 values, and I need to select the 10 most frequent country_state strings out of those 1000, ranked by how many times the same string repeats.

Answer 1

I think value_counts ( https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html ) and nlargest ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nlargest.html ) should work here:

data['country_state'].value_counts().nlargest(10)
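As a rough illustration of what that one-liner returns, here is a minimal sketch on a small made-up frame. The sample places are hypothetical, and the extra .str.strip() is my own addition to drop the space left after the comma; it is not part of the question's code.

```python
import pandas as pd

# Hypothetical sample data; the real `data` frame comes from the question.
data = pd.DataFrame({
    "place": [
        "Austin, Texas", "Dallas, Texas", "Houston, Texas",
        "Miami, Florida", "Tampa, Florida",
        "Paris, France",
    ]
})

# Take the text after the last comma and strip surrounding whitespace.
data["country_state"] = data["place"].str.rsplit(",", n=1).str[-1].str.strip()

# Count each distinct value, then keep the 2 most frequent (use 10 on real data).
top = data["country_state"].value_counts().nlargest(2)
print(top)
```

The result is a Series whose index holds the strings and whose values hold the counts, so top.index gives the selected country_state values.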

Answer 2

Hi, you can solve this with a few pandas functions. First, value_counts counts how many times each value occurs and sorts the result by count in descending order. Then you can slice off the first 10 entries and read their index. Here is an example:

import numpy as np
import pandas as pd

# create the DataFrame; numbers are used for simplicity, but it works the same for strings
n = np.random.randint(0, 50, 1000)
df_n = pd.DataFrame(n, columns=['num'])

# count occurrences of each value, most frequent first
nreps = df_n['num'].value_counts()

# get the top ten values (the index) and their counts
top10_values = nreps.iloc[:10].index
top10_counts = nreps.iloc[:10].values
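If the goal is to keep only the rows whose value is among the ten most frequent, one option is to pass that index to isin. This is a sketch under assumptions: the seeded generator and the name df_top are my own choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable; the seed is an arbitrary choice.
rng = np.random.default_rng(0)
df_n = pd.DataFrame({"num": rng.integers(0, 50, 1000)})

# value_counts() sorts by count, descending, so the first 10 entries are the top 10.
nreps = df_n["num"].value_counts()
top10_values = nreps.index[:10]

# Keep only the rows whose value is one of the ten most frequent.
df_top = df_n[df_n["num"].isin(top10_values)]
print(len(df_top), "of", len(df_n), "rows kept")
```

The same filter works unchanged on a string column such as country_state.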
