How can I change the values in a specific column of a pandas DataFrame based on a condition? This is my DataFrame:
import pandas as pd
df = pd.DataFrame({'data':['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})
data
0 lemon
1 apple
2 lemon
3 apple
4 apple
5 lemon
6 pear
7 apple
8 pear
9 lemon
10 pear
11 orange
12 banana
13 banana
14 pear
Counting each element with df['data'].value_counts():
lemon 4
apple 4
pear 4
banana 2
orange 1
Name: data, dtype: int64
How can I change a value to 'other' if its value_counts() result is less than 4? Expected result:
data
0 lemon
1 apple
2 lemon
3 apple
4 apple
5 lemon
6 pear
7 apple
8 pear
9 lemon
10 pear
11 other
12 other
13 other
14 pear
Use Series.mask, with the per-row counts obtained by passing Series.value_counts to Series.map, and test whether each count is less than 4:
df['data'] = df['data'].mask(df['data'].map(df['data'].value_counts()).lt(4), 'other')
# Alternative: get the per-row group size with groupby + transform('size')
df['data'] = df['data'].mask(df.groupby('data')['data'].transform('size').lt(4), 'other')
print(df)
data
0 lemon
1 apple
2 lemon
3 apple
4 apple
5 lemon
6 pear
7 apple
8 pear
9 lemon
10 pear
11 other
12 other
13 other
14 pear
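To make the one-liner above easier to follow, here is a minimal sketch that splits it into its individual steps. The intermediate names (counts, per_row, is_rare, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

# Step 1: count how often each distinct value occurs (index = value, data = count).
counts = df['data'].value_counts()

# Step 2: look up the count for every row, giving a Series aligned with df.
per_row = df['data'].map(counts)

# Step 3: boolean mask that is True where the value occurs fewer than 4 times.
is_rare = per_row.lt(4)

# Step 4: replace the values where the mask is True with 'other'.
result = df['data'].mask(is_rare, 'other')
print(result.tolist())
```

Keeping the intermediate Series around can help when debugging, since each one can be printed and compared against the original column.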
We can also apply a function like this:
df['data'] = df['data'].apply(lambda x : 'other' if len(df[df.data==x])<4 else x)
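Note that the lambda above filters the whole DataFrame once for every row, which may become slow on larger data. As a sketch of a different technique (not the one used in the line above), the counts can be computed once and combined with Series.isin and Series.where. The names counts, frequent and result are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'data': ['lemon', 'apple', 'lemon', 'apple', 'apple', 'lemon', 'pear', 'apple',
                            'pear', 'lemon', 'pear', 'orange', 'banana', 'banana', 'pear']})

# Count each distinct value once, instead of once per row.
counts = df['data'].value_counts()

# Values that occur at least 4 times and should be kept unchanged.
frequent = counts[counts >= 4].index

# where() keeps entries where the condition is True and replaces the rest.
result = df['data'].where(df['data'].isin(frequent), 'other')
print(result.tolist())
```

Series.where is the complement of Series.mask: it keeps values where the condition holds, so the condition here selects the frequent values rather than the rare ones.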