Extracting a string with regex and Python

Question

I have a column in a df that contains the following values:

>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence':['This is the results of my experiments KEY_abc_def', 'I have researched the product KEY_abc_def as requested', 'He got the idea from your message KEY_mno_pqr']})
>>> df
                                                 Sentence
0       This is the results of my experiments KEY_abc_def
1  I have researched the product KEY_abc_def as requested
2           He got the idea from your message KEY_mno_pqr

I would like to use regex to extract (or duplicate) the KEY into a new column, without the actual "KEY_" prefix. The output should be as below:

>>> df
                                                 Sentence      KEY
0       This is the results of my experiments KEY_abc_def  abc_def
1  I have researched the product KEY_abc_def as requested  abc_def
2           He got the idea from your message KEY_mno_pqr  mno_pqr

I tried this code, but it is not working. Any suggestions would be greatly appreciated.

 df['KEY']= df.Sentence.str.extract("KEY_", expand=True)

Answer 1

If you only expect word characters (letters, digits, and underscores), use:

df['KEY']= df['Sentence'].str.extract(r"KEY_(\w+)", expand=False)
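As a quick check, the line above can be run against the DataFrame from the question. This is only a minimal sketch; the data is copied from the question and nothing else is assumed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sentence": [
        "This is the results of my experiments KEY_abc_def",
        "I have researched the product KEY_abc_def as requested",
        "He got the idea from your message KEY_mno_pqr",
    ]
})

# Capture the run of word characters that follows "KEY_".
# expand=False returns a Series, which is assigned as the new column.
df["KEY"] = df["Sentence"].str.extract(r"KEY_(\w+)", expand=False)

print(df["KEY"].tolist())  # ['abc_def', 'abc_def', 'mno_pqr']
```

Because \w also matches the underscore, the capture runs through abc_def as a whole and stops at the next space or at the end of the string.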

If KEY_ must be at the beginning of a word, add a \b word boundary in front of it: r"\bKEY_(\w+)".
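To see what the word boundary changes, here is a small sketch using Python's re module, which pandas string methods rely on for regex matching. The string "MONKEY_xyz" is a made-up example, not taken from the question.

```python
import re

# Hypothetical input where "KEY_" appears in the middle of a longer word.
text_inside_word = "see MONKEY_xyz"
text_own_word = "see KEY_abc_def"

# Without \b, "KEY_" is found even inside "MONKEY_".
loose = re.search(r"KEY_(\w+)", text_inside_word)

# With \b, there is no boundary between "N" and "K", so nothing matches.
strict = re.search(r"\bKEY_(\w+)", text_inside_word)

# When KEY_ starts a word, both patterns match the same text.
strict_ok = re.search(r"\bKEY_(\w+)", text_own_word)

print(loose.group(1))      # xyz
print(strict)              # None
print(strict_ok.group(1))  # abc_def
```

Note that the underscore counts as a word character, so \b would also reject something like "X_KEY_abc".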

Since Series.str.extract only returns the captured text when the pattern contains a capturing group, the regex returns only the part matched by \w+; \bKEY_ is matched but discarded from the result.
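This is also why the attempt in the question fails: a pattern with no capturing group gives extract nothing to return. A rough sketch of the behaviour follows, on a made-up two-row Series; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample: one row with a key, one without.
s = pd.Series(["experiments KEY_abc_def", "no key here"])

# A pattern without a capturing group is rejected by str.extract.
error_message = None
try:
    s.str.extract("KEY_", expand=True)
except ValueError as exc:
    error_message = str(exc)

# With one group, expand=True gives a one-column DataFrame,
# while expand=False gives a Series that can be assigned directly.
as_frame = s.str.extract(r"KEY_(\w+)", expand=True)
as_series = s.str.extract(r"KEY_(\w+)", expand=False)

print(error_message)
print(type(as_frame).__name__, type(as_series).__name__)
print(as_series.tolist())
```

Rows where the pattern does not match come back as NaN rather than raising an error.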
