Check substring in dataframe column by referencing string position

Question

I have a dataframe with a column called 'id', where each id is 9 characters long, and I'm trying to add a column 'Rating' where I rate each row as A, AA, or AAA based on whether the 6th, 7th and 8th characters are 'A00', 'AA0' or '000'. I've got the following code so far:

id = df['id']
conditions = [(id.str.get(5) == 'A00'), (id.str.get(5) == 'AA0'), (id.str.get(5) == '000')]
values = ['A', 'AA', 'AAA']
df['Rating'] = np.select(conditions, values)
df['Rating'] = df['Rating'].astype('category')

But I know the conditions line is wrong because column.str.get(n) returns only a single character (the one at zero-based position n), and I need the substring of length 3 rather than just a single character. Does anyone know which command I can use?

Thanks in advance!

Answer 1

Slice out the three characters and use replace() with a mapping of values:

df.id.str[-4:-1].replace({'A00': 'A', 'AA0': 'AA', '000': 'AAA'})
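For reference, here is a minimal sketch of the same idea applied to the setup from the question. The sample ids are made up for illustration. It slices with str[5:8], which for 9-character ids picks the same characters as str[-4:-1], and it swaps in map() for replace() so that any unexpected substring becomes NaN instead of being passed through unchanged. That swap is my assumption about what is wanted, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data: 9-character ids, rating encoded in characters 6-8
df = pd.DataFrame({'id': ['12345A003', '12345AA03', '123450003']})

# Zero-based slice 5:8 picks the 6th, 7th and 8th characters
code = df['id'].str[5:8]

# map() looks each substring up in the dict; substrings not in the dict become NaN
ratings = {'A00': 'A', 'AA0': 'AA', '000': 'AAA'}
df['Rating'] = code.map(ratings).astype('category')

print(df)
```

If ids with other substrings should keep their raw value rather than become NaN, replace() as in the answer above is the better fit.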

Answer 2

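Just extract the run of A's using a regular expression. Ids with no A after the first five characters (the '000' case) come back as NaN, which fillna() turns into 'AAA':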

df.id.str.extract(r'.{5}([A]+)').fillna('AAA')

Example

df = pd.DataFrame({'id': ['12345A003', '12345AA03', '123450003']}) 
df.id.str.extract(r'.{5}([A]+)').fillna('AAA')

Output

     0
0    A
1   AA
2  AAA
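Note that str.extract() returns a DataFrame by default, which is why the output has a column named 0. A small sketch of how this could be written straight into the 'Rating' column, assuming the same made-up ids as the example: passing expand=False makes a single-group pattern return a Series.

```python
import pandas as pd

# Same illustrative ids as the example above
df = pd.DataFrame({'id': ['12345A003', '12345AA03', '123450003']})

# expand=False returns a Series for a pattern with one capture group
extracted = df['id'].str.extract(r'.{5}([A]+)', expand=False)

# Rows with no match are NaN; treat them as the '000' case
df['Rating'] = extracted.fillna('AAA').astype('category')

print(df)
```

This relies on every id without an A in that position being a '000' id. If other values can occur there, they would also be labelled 'AAA'.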
