Extract string using Regex and Python

I have a column in a df that contains the following values:

>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence':['This is the results of my experiments KEY_abc_def', 'I have researched the product KEY_abc_def as requested', 'He got the idea from your message KEY_mno_pqr']})
>>> df
                                                 Sentence
0       This is the results of my experiments KEY_abc_def
1  I have researched the product KEY_abc_def as requested
2           He got the idea from your message KEY_mno_pqr

I would like to use regex to extract (or duplicate) the KEY into a new column, without the "KEY_" prefix. The output should be as below:

>>> df
                                                Sentence   KEY
0      This is the results of my experiments KEY_abc_def   abc_def
1  I have researched the product KEY_abc_def as requested  abc_def
2           He got the idea from your message KEY_mno_pqr  mno_pqr

I tried this code, but it is not working. Any suggestions would be greatly appreciated.

 df['KEY']= df.Sentence.str.extract("KEY_", expand=True)

If you only expect word characters (letters, digits and underscores), use

df['KEY']= df['Sentence'].str.extract(r"KEY_(\w+)", expand=False)

If KEY_ must be at the beginning of a word, add a \b word boundary in front of it: r"\bKEY_(\w+)".
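
To see what the word boundary changes, here is a small sketch using the standard re module rather than pandas. The sample strings and the first_group helper are made up for illustration; the second string has KEY_ embedded inside a longer word.

```python
import re

# Hypothetical inputs: the second string contains KEY_ only as part of MONKEY_.
texts = ['the product KEY_abc_def as requested', 'see MONKEY_xyz for details']

def first_group(pattern, text):
    """Return the text of capturing group 1 for the first match, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Without \b, KEY_ is also found inside MONKEY_.
loose = [first_group(r"KEY_(\w+)", t) for t in texts]

# With \b, KEY_ must start at a word boundary, so MONKEY_xyz gives no match.
strict = [first_group(r"\bKEY_(\w+)", t) for t in texts]

print(loose)   # ['abc_def', 'xyz']
print(strict)  # ['abc_def', None]
```

Note that the underscore counts as a word character, so \b would also reject something like MY_KEY_abc.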

Series.str.extract returns only the text captured by a capturing group in the pattern, so the result contains just the part matched by \w+; \bKEY_ is matched but discarded from the result.
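
Putting it together, the following sketch rebuilds the DataFrame from the question and applies the pattern. It also tries the original attempt, which has no capturing group; in the pandas versions I am aware of this raises a ValueError, but treat the exact behaviour as an assumption for your version.

```python
import pandas as pd

df = pd.DataFrame({'Sentence': [
    'This is the results of my experiments KEY_abc_def',
    'I have researched the product KEY_abc_def as requested',
    'He got the idea from your message KEY_mno_pqr',
]})

# The original attempt: a pattern without a capturing group is rejected.
try:
    df['Sentence'].str.extract("KEY_", expand=True)
    error = None
except ValueError as exc:
    error = str(exc)

# With one capturing group, only the text matched by (\w+) is returned;
# expand=False yields a Series that can be assigned directly to a column.
df['KEY'] = df['Sentence'].str.extract(r"\bKEY_(\w+)", expand=False)

print(error)
print(df['KEY'].tolist())
```

Rows with no KEY_ token would get NaN in the new column rather than raising an error.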

