I have a column in a df contains the following values:
>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence':['his is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm', 'I have researched the product KEY_abc_def, and KEY_blt_chm as requested', 'He got the idea from your message KEY_mno_pqr']})
>>> df
Sentence
0 This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm
1 I have researched the product KEY_abc_def, and KEY_blt_chm as requested
2 He got the idea from your message KEY_mno_pqr
I would like to use regex to extract the KEY into a new column without the actual "KEY_". For those sentences have more than 1 KEY, they should be joined with a comma. The output should be as below:
>>> df
Sentence KEY
0 This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm abc_def, mno_pqr, blt_chm
1 I have researched the product KEY_abc_def, and KEY_blt_chm as requested abc_def, blt_chm
2 He got the idea from your message KEY_mno_pqr mno_pqr
I tried with this code but it is not working. Any suggestions would greatly be appreciated.
The code that I currently have only worked with the first KEY, and ignored the rest. I'm new with regex so any suggestions would be highly appreciated.
df['KEY']= df.sentence.str.extract("KEY_(\w+)", expand=True)
Use
df['KEY']= df.sentence.str.findall("KEY_(\w+)").str.join(",")
The Series.str.findall
finds all occurrences of the captured substring and str.join(",")
joins the results into a comma-separated string value.
Pandas test:
>>> df['KEY']= df['Sentence'].str.findall("KEY_(\w+)").str.join(",")
>>> df
Sentence KEY
0 his is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm abc_def,mno_pqr,blt_chm
1 I have researched the product KEY_abc_def, and KEY_blt_chm as requested abc_def,blt_chm
2 He got the idea from your message KEY_mno_pqr mno_pqr
(Note in case you did not know that: I used pd.set_option('display.max_colwidth', None)
to display all the data in the columns, see How to display full (non-truncated) dataframe information in html when converting from pandas dataframe to html? ).
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.