
How do I return a specific substring within a Pandas dataframe

I have a column of text in which I need to find a substring and return the whole word that contains it, but I can't figure out how to get the entire word.

Each entry in the column has text with a code near the end, labelled "ATT03", "ATT04", etc., and I want to take that ATT label and put it in a new column.

So for example my column looks like this:

blahblahblah text [ATT03]: blahblahblah

blahblahblah text [ATT03]: blahblahblah

blahblahblah text [ATT04]: blahblahbblahblah

blah text [ATT08]: blahblahblah

df_att=(df2.loc[:,'Report Text'].str.split("ATT",1)).str[-1]

I used this to create a new column, but it only splits the data into something like "ATT08: blahblahblahblah", and I really only want the ATT label between the "[]". I don't need all the extraneous data.

Is there a regular expression or other code that would return just the ATT03, without the rest of the string around it?

Thank you so much. I've been struggling through this for hours and am frustrated.

You can use the following regular expression:

df_att = df2.loc[:, 'Report Text'].str.extract(r"\[(ATT[^\]]*)")

It will extract the text between the brackets that you are looking for.
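For anyone who wants to try this end to end, here is a minimal sketch of the same idea. The sample rows are made up to mirror the question, and the new column name `ATT` is just a placeholder. The pattern `\[(ATT\d+)\]` is a slightly stricter variant that assumes the label is always "ATT" followed by digits; if your labels can contain other characters, the `[^\]]*` form above is the safer choice. Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame, which is convenient for assigning to a new column.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown in the question.
df2 = pd.DataFrame({
    "Report Text": [
        "blahblahblah text [ATT03]: blahblahblah",
        "blahblahblah text [ATT04]: blahblahbblahblah",
        "blah text [ATT08]: blahblahblah",
        "no label in this row",
    ]
})

# \[        a literal opening bracket
# (ATT\d+)  capture group: "ATT" followed by one or more digits (assumed label format)
# \]        a literal closing bracket
# expand=False returns a Series instead of a one-column DataFrame.
df2["ATT"] = df2["Report Text"].str.extract(r"\[(ATT\d+)\]", expand=False)

print(df2[["Report Text", "ATT"]])
```

Rows where the pattern does not match get NaN in the new column, so you can spot or filter them with `df2["ATT"].isna()`.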
