Python regex to match any number of elements between two substrings?

Question

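I am trying to write a regular expression that finds all characters between a start token ('MS' or 'PhD') and an end token ('.' or '!'). What makes this tricky is that it is common for both start tokens to appear in my text data. I am only interested in the characters bounded by the last start token and the first end token (and in all such occurrences).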

start = 'MS|PhD'
end = '.|!'

input1 = "Candidate with MS or PhD in Statistics, Computer Science, or similar field."
output1 = "in Statistics, Computer Science, or similar field"

input2 = "Applicant with MS in Biology or Chemistry desired."
output2 = "in Biology or Chemistry desired"

Here is my best attempt, which currently returns an empty list:

import re

#          start  any char    end
pattern = r'^(MS|PhD) .* (\.|!)$'
re.findall(pattern, "candidate with MS in Chemistry.")

>>>
[]

Can anyone point me in the right direction?

Answer 1
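
As a starting point, it may help to see why the attempt in the question returns an empty list. The sketch below is only a diagnosis under my reading of that pattern; the variable names (attempt, r1, r2, r3) are made up for illustration.

```python
import re

attempt = r'^(MS|PhD) .* (\.|!)$'
text = "candidate with MS in Chemistry."

# ^ anchors the match to the start of the string, but the text starts with "candidate".
r1 = re.findall(attempt, text)
print(r1)  # []

# Even without the anchors, the pattern demands a space right before the final '.' or '!'.
r2 = re.findall(r'(MS|PhD) .* (\.|!)', text)
print(r2)  # []

# Dropping that space gives a match, but findall returns the two capture
# groups as a tuple rather than the text between the tokens.
r3 = re.findall(r'(MS|PhD) .*(\.|!)', text)
print(r3)  # [('MS', '.')]
```

So three things seem to need changing: the anchors, the mandatory space before the end token, and which part of the pattern is captured.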

You can use a capturing group, and match MS or PhD and the . or , outside of the group.

\b(?:MS|PhD)\s*((?:(?!\b(?:MS|PhD)\b).)*)[.,]

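\b(?:MS|PhD)\s* A word boundary, then MS or PhD followed by 0+ whitespace characters, matched outside the group so they are not captured
( Capture group 1, which contains the desired value
- (?: Non-capturing group
  - (?!\b(?:MS|PhD)\b). Match any character except a newline, provided MS or PhD does not start at that position
- )* Close the non-capturing group and repeat it 0+ times
)[.,] Close group 1 and match . or ,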
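
The breakdown above can also be written into the pattern itself. This is a minimal sketch that assumes the same regex, compiled with re.VERBOSE so that each piece carries its own comment; the name pattern is arbitrary.

```python
import re

# Same pattern as above; in verbose mode, whitespace and # comments
# inside the pattern are ignored by the regex engine.
pattern = re.compile(r"""
    \b(?:MS|PhD)             # start token at a word boundary
    \s*                      # optional whitespace, kept outside the capture
    (                        # group 1: the text we want
      (?:
        (?!\b(?:MS|PhD)\b)   # stop if another start token begins here
        .                    # otherwise consume one character (not a newline)
      )*
    )
    [.,]                     # closing punctuation, outside the capture
""", re.VERBOSE)

print(pattern.findall("candidate with MS in Chemistry."))
# ['in Chemistry']
```

Because the group cannot run across a second MS or PhD, a match that starts at an earlier start token fails, and the scan moves on to the last one.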


import re

regex = r"\b(?:MS|PhD)\s*((?:(?!\b(?:MS|PhD)\b).)*)[.,]"
s = ("Candidate with MS or PhD in Statistics, Computer Science, or similar field.\n"
    "Applicant with MS in Biology or Chemistry desired.")

matches = re.findall(regex, s)
print(matches)

Output

['in Statistics, Computer Science, or similar field', 'in Biology or Chemistry desired']
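
Note that the pattern above ends on [.,] and its inner loop is greedy, so the capture runs to the last period or comma on the line. If the end tokens should be '.' or '!' as in the question, and the capture should stop at the first one, a variant along these lines might work. It is a sketch, not a tested drop-in: START, first_end, greedy and extract are names invented here.

```python
import re

# Hypothetical variant: end tokens are '.' or '!', and the capture may not
# contain either of them, so it stops at the first end token.
START = r"\b(?:MS|PhD)\b"
first_end = re.compile(START + r"\s*((?:(?!" + START + r")[^.!\n])*)[.!]")

# The pattern from the answer, kept for comparison.
greedy = re.compile(r"\b(?:MS|PhD)\s*((?:(?!\b(?:MS|PhD)\b).)*)[.,]")

def extract(text):
    """Return the text between the last start token and the first end token."""
    return first_end.findall(text)

sample = "PhD in Math! Apply now."
print(extract(sample))         # ['in Math']
print(greedy.findall(sample))  # ['in Math! Apply now']
```

The trade-off is that this variant also cuts short at any period inside the text, for example in an abbreviation, so which version fits depends on the data.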

Solution 1
2 Accepted 2020-12-21 18:08:10
