Extracting a string with regular expressions and Python

Question

I have a column in a df that contains the following values:

>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence':['This is the results of my experiments KEY_abc_def', 'I have researched the product KEY_abc_def as requested', 'He got the idea from your message KEY_mno_pqr']})
>>> df
                                                 Sentence
0       This is the results of my experiments KEY_abc_def
1  I have researched the product KEY_abc_def as requested
2           He got the idea from your message KEY_mno_pqr

I want to use a regular expression to extract (or copy) the key into a new column, without the literal "KEY_" prefix. The output should look like this:

>>> df
                                                Sentence   KEY
0      This is the results of my experiments KEY_abc_def   abc_def
1  I have researched the product KEY_abc_def as requested  abc_def
2           He got the idea from your message KEY_mno_pqr  mno_pqr

I tried this code, but it doesn't work. Any suggestions would be appreciated.

 df['KEY']= df.Sentence.str.extract("KEY_", expand=True)

Answer 1

If you only expect word characters, i.e. letters, digits and underscores, use:

df['KEY']= df['Sentence'].str.extract(r"KEY_(\w+)", expand=False)
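A minimal end-to-end sketch of the line above, reusing the sample sentences from the question. With exactly one capturing group and expand=False, str.extract returns a Series, which can be assigned straight to the new column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'Sentence': [
    'This is the results of my experiments KEY_abc_def',
    'I have researched the product KEY_abc_def as requested',
    'He got the idea from your message KEY_mno_pqr',
]})

# One capturing group + expand=False -> a Series holding only the group's text.
# \w+ stops at the first non-word character (here a space or end of string).
df['KEY'] = df['Sentence'].str.extract(r"KEY_(\w+)", expand=False)

print(df['KEY'].tolist())  # ['abc_def', 'abc_def', 'mno_pqr']
```

Rows that contain no KEY_ at all would get NaN in the new column rather than raising an error.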

If KEY_ must appear at the start of a word, add a \b word boundary in front of it: r"\bKEY_(\w+)".
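To illustrate what the word boundary changes, here is a small sketch on hypothetical input (the strings "see MYKEY_xyz" and "see KEY_abc_def" are made up for this example, not taken from the question). Without \b, "KEY_" is also found inside a longer word such as MYKEY_; with \b, that row has no match.

```python
import pandas as pd

# Hypothetical sample: the first row has "KEY_" embedded inside a longer word.
s = pd.Series(["see MYKEY_xyz", "see KEY_abc_def"])

plain = s.str.extract(r"KEY_(\w+)", expand=False)
bounded = s.str.extract(r"\bKEY_(\w+)", expand=False)

print(plain.tolist())    # ['xyz', 'abc_def']
# With \b, the first row has no match and becomes NaN.
print(bounded.tolist())
```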

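Series.str.extract returns only the text matched by the capturing groups in the pattern, so the regex returns just the part matched by \w+, while \bKEY_ is matched but left out of the result.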
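A quick check of the capturing-group behaviour, which also suggests why the attempt in the question fails: as far as I know, pandas raises a ValueError when the pattern passed to str.extract contains no capturing group at all, as is the case for "KEY_". The single-row Series below is just one of the question's sentences.

```python
import pandas as pd

s = pd.Series(['He got the idea from your message KEY_mno_pqr'])

# With a capturing group, only the group's text comes back; "KEY_" is dropped.
captured = s.str.extract(r"\bKEY_(\w+)", expand=False).tolist()
print(captured)  # ['mno_pqr']

# With no capturing group (the pattern from the question), extract raises.
try:
    s.str.extract("KEY_", expand=True)
    error = None
except ValueError as exc:
    error = exc

print(error)
```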


Solution 1
Score 1, accepted, 2020-11-25 17:40:53
