Regex to get all characters between certain split words

Question

My string contains AND, OR and NOT keywords. Each of them is always upper case and is preceded and followed by a space.

This is my test string:

X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F

I would like to get:

All blocks connected with AND and separated by either OR, NOT or the beginning/end of the string. For my example I am looking for ZZ AND ZY AND ZZ as well as B AND C. This is what I came up with, which returns Z AND ZY AND ZZ instead of ZZ AND ZY AND ZZ because of the \w, but I cannot come up with any better idea:

import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
and_pairs = re.findall(r"\w AND .+?(?= OR | NOT )", input_string)

I would also need all terms preceded by a NOT, as well as all terms followed by an OR, in separate lists.

I don't want to seem lazy, but regex is driving me crazy (unintended rhyme).

Answer 1

I think this should do the trick:

Result:

>>> t_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
>>> [item.strip() for sublist in [x.split('NOT') for x in t_string.split('OR')] for item in sublist if 'AND' in item]
['Z Z AND ZY AND ZZ', 'B AND C']
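
One caveat with splitting on the bare strings 'OR' and 'NOT': a term that happens to contain those letters (say, FORD or NOTE) would be cut apart as well. Since the question states that each keyword is always surrounded by spaces, a minimal sketch that splits on the space-padded keywords instead might look like this. The helper name and_blocks and the second sample string are made up for illustration and are not part of the original answer.

```python
def and_blocks(text):
    """Return the AND-connected blocks of text (hypothetical helper)."""
    blocks = []
    # Split on the keywords including their surrounding spaces, so that
    # letters inside a term (e.g. the "OR" in "FORD") are left alone.
    for or_part in text.split(" OR "):
        for block in or_part.split(" NOT "):
            if " AND " in block:
                blocks.append(block.strip())
    return blocks


t_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
print(and_blocks(t_string))              # ['Z Z AND ZY AND ZZ', 'B AND C']
print(and_blocks("NOTE AND FORD OR X"))  # ['NOTE AND FORD']
```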

Answer 2

Here's how to find the AND pairs:

import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
matchRegex = r"(.*?)(?:(?: OR | NOT )(\w+))+?"

regexdata = re.findall(matchRegex, input_string)
regexdata = list(sum(regexdata,())) # flatten matches
print(regexdata)

matches = [""]
for idx, data in enumerate(regexdata):  # combine separated matches
    if idx % 2 == 0:
        matches[-1] += data
    else:
        matches.append(data)
print(matches)

matches = list(filter(lambda match: "AND" in match, matches)) # 'and' pairs only
print(matches)

Output:

['X', 'Y', '', 'Z', ' Z AND ZY AND ZZ', 'A', '', 'B', ' AND C', 'E', '', 'F']
['X', 'Y', 'Z Z AND ZY AND ZZ', 'A', 'B AND C', 'E', 'F']
['Z Z AND ZY AND ZZ', 'B AND C']

What this does is first match with the regex, then combine the separated regex groups (index 1 and 2 should be combined, 3 and 4, and so on). Once that is complete, it filters the result and outputs only the AND-connected parts. If you don't need that last part you can remove it.
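
One thing to watch, as far as I can tell: the regex only captures text that is followed by an OR or NOT keyword, so an AND block at the very end of the string (for example "A OR B AND C") loses its tail and gets filtered out. Below is a rough sketch of the same pairing idea that also appends whatever is left after the last match. It uses re.finditer and a simplified pattern rather than the exact regex above, and the function name and_blocks_with_tail is hypothetical.

```python
import re


def and_blocks_with_tail(text):
    """Pair up regex groups as above, but keep the unmatched tail too."""
    blocks = [""]
    end = 0
    # group(1): text before a keyword, group(2): first word after it
    for m in re.finditer(r"(.*?)(?: OR | NOT )(\w+)", text):
        blocks[-1] += m.group(1)   # finish the current block
        blocks.append(m.group(2))  # start the next block
        end = m.end()
    blocks[-1] += text[end:]       # text after the last match, if any
    return [b for b in blocks if " AND " in b]


input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
print(and_blocks_with_tail(input_string))    # ['Z Z AND ZY AND ZZ', 'B AND C']
print(and_blocks_with_tail("A OR B AND C"))  # ['B AND C']
```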

Answer 3

Try with split:

import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
# the capturing group keeps the delimiters in the result; they contain no "AND"
split_pairs = re.split("( OR | NOT )", input_string)
and_pairs = list(filter(lambda part: "AND" in part, split_pairs))
print(and_pairs)

Result:

['Z Z AND ZY AND ZZ', 'B AND C']
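
The question also asks for the terms preceded by a NOT and the terms followed by an OR, in separate lists. A possible sketch along the same re.split lines is shown here. It assumes that a "term" means the whole block between two keywords, which is my reading of the question rather than something the asker confirmed, and the variable names are made up.

```python
import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"

# With a capturing group, re.split keeps the keywords in the result:
# [term, keyword, term, keyword, ..., term]
tokens = re.split(r" (OR|NOT) ", input_string)
terms = tokens[0::2]
keywords = tokens[1::2]

# keywords[i] sits between terms[i] and terms[i + 1]
not_terms = [term for kw, term in zip(keywords, terms[1:]) if kw == "NOT"]
or_terms = [term for kw, term in zip(keywords, terms) if kw == "OR"]

print(not_terms)  # ['E', 'F']
print(or_terms)   # ['X', 'Y', 'Z Z AND ZY AND ZZ', 'A']
```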

3 answers

Answer 1
0 2022-08-03 15:49:20

Answer 2
0 2022-08-03 15:58:30

Answer 3
0 2022-08-03 16:44:29