Regex to get all characters between certain split-words
My string contains AND, OR and NOT keywords. Each of them is always upper case and is both preceded and followed by a space.
This is my test string:
X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F
I would like to get the blocks that are connected by AND and separated by either OR, NOT or the beginning/end of the string. For my example I am looking for ZZ AND ZY AND ZZ as well as B AND C.
This is what I came up with, which returns Z AND ZY AND ZZ instead of ZZ AND ZY AND ZZ because of the \w, but I cannot come up with any better idea:
import re
input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
and_pairs = re.findall(r"\w AND .+?(?= OR | NOT )", input_string)
I would also like to get all terms starting with NOT, as well as all terms followed by an OR, in separate lists. I don't want to seem lazy, but regex is driving me crazy (unintended rhyme).
I think this should do the trick. Result:
>>> t_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
>>> [item.strip() for sublist in [x.split('NOT') for x in t_string.split('OR')] for item in sublist if 'AND' in item]
['Z Z AND ZY AND ZZ', 'B AND C']
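One caveat with the one-liner above: it splits on the bare substrings OR and NOT, so a term that happens to contain those letters would be cut apart. Here is a minimal sketch of a variant that splits on the space-padded keywords instead, relying on the question's statement that the keywords are always surrounded by spaces. The helper name and_chunks and the second sample string are made up for illustration.

```python
def and_chunks(text):
    """Return the chunks containing an AND keyword, splitting on ' OR ' and ' NOT '."""
    chunks = []
    for or_part in text.split(" OR "):
        for part in or_part.split(" NOT "):
            if " AND " in part:
                chunks.append(part.strip())
    return chunks


t_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
print(and_chunks(t_string))  # ['Z Z AND ZY AND ZZ', 'B AND C']

# Hypothetical input whose terms contain the letters "OR" and "NOT"
tricky = "FORD AND NOTE OR A"
print(and_chunks(tricky))  # ['FORD AND NOTE']
```

With the bare split('OR') / split('NOT') version, the second input would come back as ['D AND'], because FORD and NOTE are split in the middle.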
Here's how to find the AND pairs:
import re
input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
matchRegex = r"(.*?)(?:(?: OR | NOT )(\w+))+?"
regexdata = re.findall(matchRegex, input_string)
regexdata = list(sum(regexdata,())) # flatten matches
print(regexdata)
matches = [""]
for idx, data in enumerate(regexdata): # combine separated matches
    if idx % 2 == 0: matches[-1] += data
    else: matches.append(data)
print(matches)
matches = list(filter(lambda match: "AND" in match, matches)) # 'and' pairs only
print(matches)
Output:
['X', 'Y', '', 'Z', ' Z AND ZY AND ZZ', 'A', '', 'B', ' AND C', 'E', '', 'F']
['X', 'Y', 'Z Z AND ZY AND ZZ', 'A', 'B AND C', 'E', 'F']
['Z Z AND ZY AND ZZ', 'B AND C']
What this does: first it matches with the regex, then it combines the separated regex groups (index 1 and 2 should be combined, 3 and 4, and so on). Once that is complete, it filters the list and outputs only the AND-connected parts. If you don't need that last part you can remove it.
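For comparison, the same chunks can probably be pulled out with a single findall call, which avoids the flatten-and-recombine step. This is a sketch of a different technique (lookbehinds for the chunk start plus a "tempered" dot that refuses to cross a separator), not the approach described above. The names START, NOT_SEP and and_chunk_re are illustrative.

```python
import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"

# A chunk may start at the beginning of the string or right after a separator.
START = r"(?:^|(?<= OR )|(?<= NOT ))"
# One character that does not begin a " OR " or " NOT " separator.
NOT_SEP = r"(?:(?! OR | NOT ).)"

# Chunk = text up to the first " AND ", then everything up to the next separator.
and_chunk_re = re.compile(START + NOT_SEP + r"*? AND " + NOT_SEP + r"*")

and_chunks = and_chunk_re.findall(input_string)
print(and_chunks)  # ['Z Z AND ZY AND ZZ', 'B AND C']
```

Because the pattern has no capturing groups, findall returns the whole matches directly.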
Try with split:
import re
input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"
split_pairs = re.split("( OR | NOT )", input_string)  # the captured separators stay in the list
and_pairs = list(filter(lambda part: "AND" in part, split_pairs))  # keep only chunks containing AND
print(and_pairs)
Result:
['Z Z AND ZY AND ZZ', 'B AND C']
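The question also asks for the NOT terms and the OR terms in separate lists, which none of the snippets above cover. A possible sketch, building on the same re.split idea, and assuming that "starting with NOT" means the term directly after a NOT keyword and "followed by an OR" means the term directly before an OR keyword. The variable names are illustrative.

```python
import re

input_string = "X OR Y OR Z Z AND ZY AND ZZ OR A OR B AND C NOT E NOT F"

# The capturing group keeps the keywords, so terms and keywords alternate.
parts = re.split(r" (OR|NOT) ", input_string)
terms = parts[0::2]     # 'X', 'Y', 'Z Z AND ZY AND ZZ', ...
keywords = parts[1::2]  # keywords[i] sits between terms[i] and terms[i + 1]

# Assumption: a "NOT term" is the term right after a NOT keyword.
not_terms = [terms[i + 1] for i, kw in enumerate(keywords) if kw == "NOT"]
# Assumption: an "OR term" is the term right before an OR keyword.
or_terms = [terms[i] for i, kw in enumerate(keywords) if kw == "OR"]

print(not_terms)  # ['E', 'F']
print(or_terms)   # ['X', 'Y', 'Z Z AND ZY AND ZZ', 'A']
```

If a different reading of the requirement is intended (for example, terms on either side of an OR), only the two list comprehensions need to change.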