
Regex to get all characters between certain split-words

My string contains AND, OR and NOT keywords; each of them is always upper case and is prefixed and suffixed with a space.

This is my test string:

X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F

I would like to get:

  • All blocks connected with AND and separated by either OR, NOT or the beginning/end of the string. For my example I am looking for Z Z AND ZY AND ZZ as well as B AND C. This is what I came up with; it returns Z AND ZY AND ZZ instead of Z Z AND ZY AND ZZ because of the \w, but I cannot come up with any better idea:
import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
# \w matches a single word character, so the leading "Z " of "Z Z" is dropped
and_pairs = re.findall(r"\w AND .+?(?= OR | NOT )", input_string)
  • Also, I would need all terms preceded by a NOT, as well as all terms followed by an OR, in separate lists.

I don't want to seem lazy, but regex is driving me crazy (unintended rhyme).

I think this should do the trick.

Result:

>>> t_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
>>> # split on the space-delimited keywords so that terms containing "OR" or "NOT" stay intact
>>> [item.strip() for sublist in [x.split(' NOT ') for x in t_string.split(' OR ')] for item in sublist if ' AND ' in item]
['Z Z AND ZY AND ZZ', 'B AND C']

Here's how to find the AND pairs:

import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
matchRegex = r"(.*?)(?:(?: OR | NOT )(\w+))+?"

regexdata = re.findall(matchRegex, input_string)
regexdata = list(sum(regexdata,())) # flatten matches
print(regexdata)

matches = [""]
for idx, data in enumerate(regexdata):  # combine separated matches
    if idx % 2 == 0:
        matches[-1] += data  # even index: text that continues the current term
    else:
        matches.append(data)  # odd index: first word of a new term
print(matches)

matches = list(filter(lambda match: "AND" in match, matches)) # 'and' pairs only
print(matches)

Output:

['X', 'Y', '', 'Z', ' Z AND ZY AND ZZ', 'A', '', 'B', ' AND C', 'E', '', 'F']
['X', 'Y', 'Z Z AND ZY AND ZZ', 'A', 'B AND C', 'E', 'F']
['Z Z AND ZY AND ZZ', 'B AND C']

What this does: first it matches with the regex, then it combines the separated regex groups (indices 1 and 2 should be combined, 3 and 4, and so on). Once that is complete, it filters the list and outputs only the AND-connected parts. If you don't need that last part you can remove it.
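For comparison, the same blocks can also be pulled out with a single findall call by forbidding the separators inside a block with a negative lookahead. This is only a sketch, not the code from the answer above: the names AND_BLOCK and find_and_blocks are made up for illustration, and it assumes the keywords are always surrounded by single spaces, as stated in the question.

```python
import re

# Hypothetical helper names; the pattern is an illustrative alternative.
AND_BLOCK = re.compile(
    r"(?:^| OR | NOT )"        # a block starts at the string start or after a separator
    r"((?:(?! OR | NOT ).)+?"  # first term: characters that do not begin a separator
    r" AND "                   # at least one AND is required
    r"(?:(?! OR | NOT ).)+)"   # rest of the block, up to the next separator or the end
)


def find_and_blocks(expression):
    """Return every AND-connected block, including terms that contain spaces."""
    return AND_BLOCK.findall(expression)


input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
print(find_and_blocks(input_string))
```

Because the separator is consumed outside the capturing group, findall returns only the blocks themselves, and a block at the very start or end of the string is also found.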

Try with split:

import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
# the capturing group keeps the separators in the result; the filter below drops them
split_pairs = re.split("( OR | NOT )", input_string)
and_pairs = [part for part in split_pairs if " AND " in part]
print(and_pairs)

Result:

['Z Z AND ZY AND ZZ', 'B AND C']
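The second part of the question (terms preceded by NOT and terms followed by OR, in separate lists) can be covered with the same split, because a capturing group makes re.split return the separators between the terms. The sketch below is one possible reading, not part of the original answer: classify_terms is a hypothetical name, and it assumes that an AND block counts as a single term when it sits next to an OR or NOT.

```python
import re


def classify_terms(expression):
    """Split on ' OR ' / ' NOT ' and sort the terms into three lists.

    Hypothetical helper for illustration; assumes keywords are always
    upper case and surrounded by single spaces.
    """
    # With a capturing group, re.split alternates: term, keyword, term, ...
    parts = re.split(r" (OR|NOT) ", expression)
    terms = parts[0::2]       # even positions: the terms
    separators = parts[1::2]  # odd positions: the keyword after each term

    and_blocks = [t for t in terms if " AND " in t]
    # separators[i] sits between terms[i] and terms[i + 1]
    after_not = [t for sep, t in zip(separators, terms[1:]) if sep == "NOT"]
    before_or = [t for t, sep in zip(terms, separators) if sep == "OR"]
    return and_blocks, after_not, before_or


input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
and_blocks, after_not, before_or = classify_terms(input_string)
print(and_blocks)  # ['Z Z AND ZY AND ZZ', 'B AND C']
print(after_not)   # ['E', 'F']
print(before_or)   # ['X', 'Y', 'Z Z AND ZY AND ZZ', 'A']
```

If AND blocks should not appear in the OR list, they can be filtered out of before_or with the same `" AND " in t` test.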
