Regex: find all occurrences of a group between two specific words

Question

python version: Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
re version: 2.2.1

I want to get all occurrences of a regex group between two specific words.

First, here are the different expressions my program will encounter ("..." means the expression sits inside a longer text):

"...For colts, geldings and fillies of..."
"...For horses, geldings and mares of..."
"...For colts and geldings of..."
"...For colts and fillies of..."
"...For horses and geldings of..."
"...For colts of..."
"...For fillies of..."
"...For horses of..."
"...For mares of..."

The goal of the program is to get every mention of "colts", "geldings", "fillies", "horses" or "mares" between the words "For" and "of". Concretely, I want 3 groups if there are 3 mentions, 2 groups for 2 mentions, and 1 group for 1 mention.

len(re.search(a_regex_pattern,"...For colts, geldings and fillies of...").groups())
>>> 3 # 3 groups
re.search(a_regex_pattern,"...For colts, geldings and fillies of...").groups()
>>> ['colts','geldings','fillies']

Where I am stuck is finding the right a_regex_pattern to do this.

I tried this:

a_regex_pattern = "For.*?(colts|geldings|fillies){1,3}.*?of"
re.search(a_regex_pattern,"...For colts, geldings and fillies of...").groups()
>>> ['fillies']

My other attempts are worse. How would you do it?

Answer 1

I'd do it in two steps:

In the first step, I search for everything between For ... of.
In the second step, I extract the words from the result of the first step.

import re

tests = [
    "... For colts, geldings and fillies of ...",
    "... For horses, geldings and mares of ...",
    "... For colts and geldings of ...",
    "... For colts and fillies of ...",
    "... For horses and geldings of ...",
    "... For colts of ...",
    "... For fillies of ...",
    "... For horses of ...",
    "... For mares of ...",
]

pat1 = re.compile(r"\bFor\s+(.*?)\s+of\b")
pat2 = re.compile(r",|\band\b")

for t in tests:
    m = pat1.search(t)
    if m:
        print(pat2.sub(" ", m.group(1)).split())

Prints:

['colts', 'geldings', 'fillies']
['horses', 'geldings', 'mares']
['colts', 'geldings']
['colts', 'fillies']
['horses', 'geldings']
['colts']
['fillies']
['horses']
['mares']
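
A possible variation on the second step, if only the five words from the question should be returned: instead of stripping commas and "and", collect the known words with re.findall. This is a minimal sketch, not part of the code above; the names SPAN, WORD and mentions are made up for illustration, and the word list is assumed to be exactly the one given in the question. It also shows why a single re.search cannot give a variable number of groups: the group count is fixed by the pattern, so a quantified group still counts as one group.

```python
import re

# Step 1: capture everything between "For" and "of" (same idea as pat1 above).
SPAN = re.compile(r"\bFor\s+(.*?)\s+of\b")
# Step 2 (assumed word list, taken from the question): match only known words.
WORD = re.compile(r"\b(?:colts|geldings|fillies|horses|mares)\b")

# The number of groups is a property of the pattern, not of the input text.
assert re.compile(r"(colts|geldings|fillies){1,3}").groups == 1


def mentions(text):
    """Return the known words found between 'For' and 'of', in order."""
    m = SPAN.search(text)
    return WORD.findall(m.group(1)) if m else []


print(mentions("... For colts, geldings and fillies of ..."))
print(mentions("... For mares of ..."))
```

Compared with splitting on separators, this ignores any unexpected word that appears between "For" and "of", which may or may not be what you want.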

Solution 1
1 2022-07-08 21:07:58
