
Change values in Pandas cells based on value_counts() condition

How can I change the values in a specific column of a pandas DataFrame based on a condition? This is my DataFrame:

import pandas as pd

df = pd.DataFrame({'data':['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple', 
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

     data
0    lemon
1    apple
2    lemon
3    apple
4    apple
5    lemon
6     pear
7    apple
8     pear
9    lemon
10    pear
11  orange
12  banana
13  banana
14    pear

Counting each element with df['data'].value_counts():

lemon     4
apple     4
pear      4
banana    2
orange    1
Name: data, dtype: int64

How can I change a value to 'other' if its value_counts() result is less than 4? Expected result:

     data
0    lemon
1    apple
2    lemon
3    apple
4    apple
5    lemon
6     pear
7    apple
8     pear
9    lemon
10    pear
11  other
12  other
13  other
14    pear

Use Series.mask. Get the count for each row by passing the result of Series.value_counts to Series.map, then test whether that count is less than 4 with Series.lt:

df['data'] = df['data'].mask(df['data'].map(df['data'].value_counts()).lt(4), 'other')
# alternative: get the per-row group size with groupby + transform('size')
df['data'] = df['data'].mask(df.groupby('data')['data'].transform('size').lt(4), 'other')
print(df)
     data
0   lemon
1   apple
2   lemon
3   apple
4   apple
5   lemon
6    pear
7   apple
8    pear
9   lemon
10   pear
11  other
12  other
13  other
14   pear
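
To make the one-liner above easier to follow, here is a sketch that splits it into separate steps on the question's data. The intermediate names (counts, per_row_counts, is_rare, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

# Step 1: count how often each value occurs (index = value, data = count).
counts = df['data'].value_counts()

# Step 2: map each row's value to its count, giving a Series aligned with df.
per_row_counts = df['data'].map(counts)

# Step 3: boolean Series, True where the row's value occurs fewer than 4 times.
is_rare = per_row_counts.lt(4)

# Step 4: replace the flagged rows with 'other'; all other rows stay unchanged.
result = df['data'].mask(is_rare, 'other')
print(result.tolist())
```

Only rows 11, 12 and 13 (orange and banana) are flagged in step 3, so those are the only values that mask replaces.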

We can apply a function like this:

df['data'] = df['data'].apply(lambda x : 'other' if len(df[df.data==x])<4 else x)
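
Note that the apply version filters the whole DataFrame once per row, which can get slow on large data. A possible variation, sketched below, computes the counts once and then replaces the rare values in a single pass. The helper name lump_rare and its parameters min_count and label are hypothetical, chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})


def lump_rare(series, min_count, label='other'):
    """Hypothetical helper: replace values seen fewer than min_count times with label."""
    counts = series.value_counts()                    # computed once for the whole Series
    rare_values = counts[counts < min_count].index    # the values to be replaced
    # Series.where keeps entries where the condition is True and
    # substitutes label where it is False.
    return series.where(~series.isin(rare_values), label)


df['data'] = lump_rare(df['data'], min_count=4)
print(df)
```

The threshold and the replacement label are parameters here, so the same function can be reused on other columns.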

