How do I return a specific substring within a Pandas dataframe?

Question

I have a column of text in which I need to find a substring and return the whole word that contains it, but I can't figure out how to get the entire word.

Each cell in the column has text with a code at the bottom labelled "ATT03", "ATT04", etc., and I want to take that ATT code and put it into a new column.

So for example my column looks like this:

blahblahblah text [ATT03]: blahblahblah

blahblahblah text [ATT04]: blahblahbblahblah

blah text [ATT08]: blahblahblah

df_att = df2.loc[:, 'Report Text'].str.split("ATT", n=1).str[-1]

I used this to create a new column, but it only splits the data into something like "ATT08: blahblahblahblah", and I really only want the ATT code between the "[]". I don't need the rest of the text.

Is there a regular expression or other code that would return just the ATT03, without the rest of the string around it?

Thank you so much. I've been struggling through this for hours and am frustrated.

Answer 1

You can use the following regular expression:

df_att = df2.loc[:, 'Report Text'].str.extract(r"\[(ATT[^\]]*)")

It extracts the text inside the brackets, starting at "ATT" and stopping just before the closing "]".
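As a minimal sketch of how that pattern behaves, the snippet below builds a small frame from the sample rows in the question. The data, the extra row without a code, and the new column name "ATT" are assumptions made for illustration. Passing expand=False makes str.extract return a Series rather than a one-column DataFrame, which is convenient when assigning a single new column.

```python
import pandas as pd

# Hypothetical sample data modelled on the rows shown in the question,
# plus one row without a code to show what a non-match looks like.
df2 = pd.DataFrame({
    "Report Text": [
        "blahblahblah text [ATT03]: blahblahblah",
        "blahblahblah text [ATT04]: blahblahbblahblah",
        "blah text [ATT08]: blahblahblah",
        "no code in this row",
    ]
})

# Capture "ATT" plus everything up to (not including) the closing bracket.
# expand=False returns a Series instead of a one-column DataFrame.
df2["ATT"] = df2["Report Text"].str.extract(r"\[(ATT[^\]]*)\]", expand=False)

print(df2["ATT"].tolist())
```

Rows that contain no bracketed ATT code come back as NaN, so they can be filtered out afterwards with dropna() if needed.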
