Pandas: Replace multiple column values by unique value

I have a pandas DataFrame with many "object" columns, each of which contains many distinct values (modalities). I want to keep only the 10 most frequent modalities in each column and replace all the others with 'Oth'.

For example, if I have a column 'obj_col1' which contains 4 different values:

obj_col1
'A'
'A'
'B'
'C'
'B'
'D'

and I want to keep the 2 most frequent, here 'A' and 'B', and replace the rest with 'Oth':

obj_col2
'A'
'A'
'B'
'Oth'
'B'
'Oth'
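To reproduce this example in code, here is a minimal sketch (the DataFrame df below is an assumption that simply mirrors the example column) showing how value_counts() ranks the modalities; 'A' and 'B' each occur twice, while 'C' and 'D' occur once, so they are the 2 most frequent.

```python
import pandas as pd

# Hypothetical DataFrame reproducing the example column 'obj_col1'
df = pd.DataFrame({'obj_col1': ['A', 'A', 'B', 'C', 'B', 'D']})

# value_counts() returns the counts sorted by descending frequency
counts = df['obj_col1'].value_counts()
print(counts)

# The 2 most frequent modalities (their relative order may vary because A and B tie)
top2 = sorted(counts.index[:2])
print(top2)  # ['A', 'B']
```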

My code for a single object column (categorical variable) is:

# modalities of 'categ_var', sorted by descending frequency
list_freq_modal = df['categ_var'].value_counts().index.tolist()
# replace all modalities except the 10 most frequent with 'Oth'
df['categ_var'].replace(list_freq_modal[10:],'Oth', inplace=True)

But I get an error: 'NoneType' object has no attribute 'any'

Do you have any idea how to implement this in a more efficient way?
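For comparison, the replace-based idea from the question can be sketched without calling replace(..., inplace=True) on the column selection df['categ_var'], by assigning the result back to the column instead. Under pandas' Copy-on-Write mode, an in-place call on a selected column acts on an intermediate object and may not update df at all. This is only a sketch: the sample data and the variable n are illustrative assumptions, and it is not guaranteed to explain or avoid the exact 'NoneType' error on every pandas version.

```python
import pandas as pd

# Hypothetical data; in the question df already exists
df = pd.DataFrame({'categ_var': ['A', 'A', 'B', 'C', 'B', 'D']})
n = 2  # number of modalities to keep (10 in the question)

# Modalities sorted by descending frequency
list_freq_modal = df['categ_var'].value_counts().index.tolist()

# Replace every modality outside the top n with 'Oth' and assign the result back,
# rather than modifying the column selection in place
df['categ_var'] = df['categ_var'].replace(list_freq_modal[n:], 'Oth')
print(df['categ_var'].tolist())  # ['A', 'A', 'B', 'Oth', 'B', 'Oth']
```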

Instead of replace, we can take value_counts().head(2) and use where: mapping the column through these top counts gives NaN for every value outside the top 2, so notnull() yields the mask, i.e.

x = df['obj_col1'].value_counts().head(2)
#B    2
#A    2
#Name: obj_col1, dtype: int64

df['obj_col1'].where(df['obj_col1'].map(x).notnull(),'Oth')

Output:

0      A
1      A
2      B
3    Oth
4      B
5    Oth
Name: obj_col1, dtype: object

df['obj_col1'].map(x).notnull() # This gives the mask:
0     True
1     True
2     True
3    False
4     True
5    False
Name: obj_col1, dtype: bool
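Since the question asks for this on every object column with the 10 most frequent modalities, the same map/where idea can be wrapped in a helper and applied column by column. The function name keep_top_n, its parameters, and the sample DataFrame are illustrative assumptions, not part of pandas; the default n=10 mirrors the question. Text columns are picked with pd.api.types.is_string_dtype so the sketch also works where strings are stored in a dedicated string dtype rather than object. Note that s.isin(top.index) would produce the same mask as s.map(top).notnull() here.

```python
import pandas as pd

def keep_top_n(s: pd.Series, n: int = 10, other: str = 'Oth') -> pd.Series:
    """Hypothetical helper: keep the n most frequent values of s, replace the rest with `other`."""
    top = s.value_counts().head(n)   # counts of the n most frequent values
    mask = s.map(top).notnull()      # True where the value is among the top n
    # Missing values are not counted by value_counts() (dropna=True by default),
    # so they are replaced with `other` as well
    return s.where(mask, other)

# Hypothetical DataFrame with two text columns and one numeric column
df = pd.DataFrame({
    'obj_col1': ['A', 'A', 'B', 'C', 'B', 'D'],
    'obj_col2': ['x', 'y', 'x', 'x', 'z', 'y'],
    'num_col': [1, 2, 3, 4, 5, 6],
})

# Select the text columns; numeric columns are left untouched
obj_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# DataFrame.apply passes each column as a Series and forwards n=2 to keep_top_n
df[obj_cols] = df[obj_cols].apply(keep_top_n, n=2)
print(df)
```

With n=2, 'obj_col2' keeps 'x' (3 occurrences) and 'y' (2 occurrences) and turns the single 'z' into 'Oth'.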
