
Check substring in dataframe column by referencing string position

I have a dataframe with a column called 'id', where each id is 9 characters long, and I'm trying to add a column 'Rating' where I rate each row as A, AA, or AAA based on whether the 6th, 7th and 8th characters are 'A00', 'AA0' or '000'. I've got the following code so far:

id = df['id']
conditions = [(id.str.get(5) == 'A00'), (id.str.get(5) == 'AA0'), (id.str.get(5) == '000')]
values = ['A', 'AA', 'AAA']
df['Rating'] = np.select(conditions, values)
df['Rating'] = df['Rating'].astype('category')

But I know the conditions line is wrong, because column.str.get(n) returns only the single character at zero-based position n (that is, the (n+1)th character), and I need a substring of length 3 rather than a single character. Does anyone know which command I can use?

Thanks in advance!

Slice out the three characters and use replace() with a value mapping:

df.id.str[-4:-1].replace({'A00': 'A', 'AA0': 'AA', '000': 'AAA'}) 
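As a rough sketch of how the slice could feed the 'Rating' column from the question: the example below assumes every id is exactly 9 characters long, so that str[5:8] and str[-4:-1] select the same three characters. The sample ids and the names code, rating_map and rating_select are made up for illustration. It uses map() instead of replace(), so any code not listed in the mapping becomes NaN rather than being passed through unchanged, and it also shows the np.select approach from the question with the conditions comparing the slice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; each id is 9 characters long.
df = pd.DataFrame({'id': ['12345A003', '12345AA03', '123450003']})

# Characters 6-8 of each id (zero-based positions 5, 6 and 7).
code = df['id'].str[5:8]

# Option 1: look each code up in a dict; unlisted codes become NaN.
rating_map = {'A00': 'A', 'AA0': 'AA', '000': 'AAA'}
df['Rating'] = code.map(rating_map).astype('category')

# Option 2: the np.select approach from the question, comparing the slice.
# A string default keeps the result dtype consistent with the string choices.
conditions = [code == 'A00', code == 'AA0', code == '000']
values = ['A', 'AA', 'AAA']
rating_select = np.select(conditions, values, default='')
```

Either option should give the same ratings on this sample; which one reads better probably depends on how many codes need mapping.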

Just extract it using a regular expression.

df.id.str.extract(r'.{5}([A]+)').fillna('AAA') 

Example

import pandas as pd

df = pd.DataFrame({'id': ['12345A003', '12345AA03', '123450003']})
df.id.str.extract(r'.{5}([A]+)').fillna('AAA')

Output

     0
0    A
1   AA
2  AAA
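A slightly stricter variant of the same idea, offered as a sketch rather than a drop-in replacement: anchoring the pattern with ^ pins the five skipped characters to the start of the id, and expand=False returns a Series that can be assigned straight to a column. The sample ids and the name rating are assumptions for illustration. Note that, like the original pattern, this labels every id without an 'A' in the 6th position as 'AAA', not only those containing '000'.

```python
import pandas as pd

# Hypothetical sample data; each id is 9 characters long.
df = pd.DataFrame({'id': ['12345A003', '12345AA03', '123450003']})

# ^ anchors the match at the start of the id, so (A+) can only capture
# a run of 'A' characters beginning at the 6th character.
# expand=False returns a Series rather than a one-column DataFrame.
rating = df['id'].str.extract(r'^.{5}(A+)', expand=False).fillna('AAA')
df['Rating'] = rating.astype('category')
```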

