I have a data frame df_train with a column sub_division.
The values in the column look like this:
ABC_commercial,
ABC_Private,
Test ROM DIV,
ROM DIV,
TEST SEC R&OM
I am trying to: 1. convert anything that starts with ABC to a number (for example, 1); 2. convert anything that contains ROM or R&OM to a number (for example, 2).
Thanks in advance.
Expected result:
1,
1,
2,
2,
2
Use numpy.select with Series.str.startswith and Series.str.contains:
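import numpy as np

m1 = df['col'].str.startswith('ABC')
m2 = df['col'].str.contains('ROM|R&OM')
# The first matching condition wins; rows matching neither get the default.
# Note: recent NumPy versions may raise a TypeError when integer choices are
# mixed with a string default; use the numeric default below in that case.
df['new'] = np.select([m1, m2], [1, 2], default='no match')
# If all values should be numbers:
# df['new'] = np.select([m1, m2], [1, 2], default=0)
print(df)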
               col  new
0  ABC_commercial,    1
1     ABC_Private,    1
2    Test ROM DIV,    2
3         ROM DIV,    2
4    TEST SEC R&OM    2
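One detail worth spelling out is what happens when a value matches both conditions. Here is a minimal self-contained sketch using the values from the question plus two made-up rows ("ABC ROM" and "OTHER" are hypothetical, as is the column name code). It assumes a numeric default of 0 is acceptable.

```python
import numpy as np
import pandas as pd

# Values from the question, plus two hypothetical rows for illustration.
df = pd.DataFrame({
    "sub_division": [
        "ABC_commercial",
        "ABC_Private",
        "Test ROM DIV",
        "ROM DIV",
        "TEST SEC R&OM",
        "ABC ROM",  # hypothetical: matches both conditions
        "OTHER",    # hypothetical: matches neither condition
    ]
})

starts_abc = df["sub_division"].str.startswith("ABC")
has_rom = df["sub_division"].str.contains("ROM|R&OM")

# np.select takes the choice of the first condition that is True,
# so "ABC ROM" is coded as 1 rather than 2.
df["code"] = np.select([starts_abc, has_rom], [1, 2], default=0)
print(df)
```

If the ROM rule should take priority instead, swap the order of the conditions and choices.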
You can do something like the following. Note that you will get NaN where there is no match; add an else branch to the converter function to return a default value instead.
def converter(v):
    if v.startswith('ABC'):
        return 1
    elif any(i in v for i in ['ROM', 'R&OM']):
        return 2

df['sub_division'] = df['sub_division'].apply(converter)
print(df.head(10))
Output:
   sub_division
0             1
1             1
2             2
3             2
4             2
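For the default-value case mentioned above, a sketch along these lines might work. The default parameter, the code column, and the "OTHER" and None rows are assumptions added for illustration; they are not part of the original question.

```python
import pandas as pd

def converter(value, default=0):
    """Map a sub_division label to a numeric code; `default` is a hypothetical fallback."""
    # Missing or non-string values cannot be searched, so return the fallback.
    if not isinstance(value, str):
        return default
    if value.startswith("ABC"):
        return 1
    if "ROM" in value or "R&OM" in value:
        return 2
    return default

df = pd.DataFrame(
    {"sub_division": ["ABC_commercial", "ROM DIV", "TEST SEC R&OM", "OTHER", None]}
)
# Write to a new column so the original labels stay available.
df["code"] = df["sub_division"].apply(converter)
print(df)
```

Because every branch returns a number, the resulting column contains no missing values.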
You can use:
df.loc[df['col'].str.startswith('ABC'), 'col'] = 1
df.loc[df['col'].str.contains(r'ROM|R&OM', na=False), 'col'] = 2
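Overwriting col in place leaves strings and integers mixed in one column for any row that matches neither rule. A possible variation, assuming a separate numeric column is acceptable (the name code and the default 0 are hypothetical), is:

```python
import pandas as pd

df = pd.DataFrame(
    {"col": ["ABC_commercial", "Test ROM DIV", "TEST SEC R&OM", "OTHER"]}
)

# Start from a numeric default, then overwrite the rows that match.
df["code"] = 0
df.loc[df["col"].str.contains(r"ROM|R&OM", na=False), "code"] = 2
# Assigned last so that the ABC rule takes priority when both rules match.
df.loc[df["col"].str.startswith("ABC", na=False), "code"] = 1
print(df)
```

Passing na=False to both string methods keeps the masks strictly boolean even if the column contains missing values.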