Extract multiple strings matching a specific pattern using regex

Question

I have a long string like the one below, and I want to extract every item that follows Invalid items, so I expect the regex to return a list like ['abc.def.com', 'bar123', 'hello', 'world', '1212', '5566', 'aaaa'].

I tried this pattern, but it returns one group per match, so a match that holds several items comes back as a single string:

import re
test = 'Valid items: (aaa.com; bbb.com); Invalid items: (abc.def.com;); Valid items: (foo123;); Invalid items: (bar123;); Valid items: (1234; 5678; abcd;); Invalid items: (hello; world; 1212; 5566; aaaa;)'
re.findall(r'Invalid items: \((.+?);\)', test)
# ['abc.def.com', 'bar123', 'hello; world; 1212; 5566; aaaa']

Is there a better way to do this with regex?

Thanks.

Answer 1

If you want to return all the matches individually using only a single findall, you'll need a positive lookbehind, e.g. (?<=foo). Unfortunately, Python's re module only supports fixed-width lookbehind. However, if you're willing to use the outstanding third-party regex module, it can be done.
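To see the fixed-width limitation concretely, here is a minimal sketch that checks whether the standard-library re module accepts a pattern. The helper name compiles_with_re is made up for illustration; the first pattern is a variable-width lookbehind of the kind this answer relies on.

```python
import re

# Variable-width lookbehind: [^)]* can match any number of characters.
VARIABLE_WIDTH = r'(?<=Invalid items: \([^)]*)[^ ;)]+'
# Fixed-width lookbehind: the lookbehind always matches exactly 7 characters.
FIXED_WIDTH = r'(?<=items: )\w+'


def compiles_with_re(pattern):
    """Return True if the standard-library re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles_with_re(VARIABLE_WIDTH))
print(compiles_with_re(FIXED_WIDTH))
```

The first call reports that re rejects the pattern, which is why a different module is needed for the single-findall approach.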

Regex:

(?<=Invalid items: \([^)]*)[^ ;)]+

Demonstration: https://regex101.com/r/p90Z81/1
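In Python, the pattern above could be used roughly as follows. This is only a sketch: it assumes the third-party regex package is installed (pip install regex), and the function name invalid_items is invented for the example. The import is guarded so the snippet still loads when the package is missing.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

TEST = ('Valid items: (aaa.com; bbb.com); Invalid items: (abc.def.com;); '
        'Valid items: (foo123;); Invalid items: (bar123;); '
        'Valid items: (1234; 5678; abcd;); '
        'Invalid items: (hello; world; 1212; 5566; aaaa;)')

# Lookbehind: an "Invalid items: (" with no ")" between it and the match.
PATTERN = r'(?<=Invalid items: \([^)]*)[^ ;)]+'


def invalid_items(text):
    """Return every item listed inside an 'Invalid items: (...)' group."""
    if regex is None:
        raise RuntimeError('the third-party regex package is required')
    return regex.findall(PATTERN, text)


if regex is not None:
    print(invalid_items(TEST))
```

Because the lookbehind does the filtering, a single findall call returns each item as its own match and no post-processing is needed.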

If there can be empty items, a small modification to the regex lets it capture these zero-width matches as well:

(?<=Invalid items: \([^)]*)(?:[^ ;)]+|(?<=\(| ))

Answer 2

Using re, you can capture each parenthesized group and then split it on a semicolon followed by a space:

import re
test = 'Valid items: (aaa.com; bbb.com); Invalid items: (abc.def.com;); Valid items: (foo123;); Invalid items: (bar123;); Valid items: (1234; 5678; abcd;); Invalid items: (hello; world; 1212; 5566; aaaa;)'
results = []
for s in re.findall(r'Invalid items: \((.+?);\)', test):
    # Each captured group looks like 'hello; world; 1212'; split it into items
    results.extend(s.split("; "))

print(results)

Output

['abc.def.com', 'bar123', 'hello', 'world', '1212', '5566', 'aaaa']

See a Python demo.
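If the spacing around the semicolons is not guaranteed, the same two-step idea can be made a little more tolerant. The following is a sketch rather than part of the original answer: the function name extract_invalid_items is hypothetical, and it assumes that empty items should be dropped.

```python
import re


def extract_invalid_items(text):
    """Collect the items from every 'Invalid items: (...)' group in text."""
    items = []
    # Step 1: capture everything between the parentheses of each group.
    for group in re.findall(r'Invalid items: \(([^)]*)\)', text):
        # Step 2: split on ';', trim whitespace, and skip empty pieces
        # such as the one produced by the trailing semicolon.
        for part in group.split(';'):
            part = part.strip()
            if part:
                items.append(part)
    return items


test = ('Valid items: (aaa.com; bbb.com); Invalid items: (abc.def.com;); '
        'Valid items: (foo123;); Invalid items: (bar123;); '
        'Valid items: (1234; 5678; abcd;); '
        'Invalid items: (hello; world; 1212; 5566; aaaa;)')
print(extract_invalid_items(test))
```

Splitting on a bare ';' and stripping each piece means 'a; b', 'a;b', and 'a ; b' are all handled the same way.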

Answer 3

This picks out only the items listed under Invalid items, whether or not the semicolons have spaces around them:

import re
test = 'Valid items: (abc.h; bac.h); Invalid items: (aaa.123;); Valid items: (aaa H;bbbb H;); Invalid items: (abc;bac;)'
results = []
for s in re.findall(r'Invalid items: \((.+?);\)', test):
    # Split on ';' and drop any whitespace around it
    results.extend(re.split(r'\s*;\s*', s))

print(results)

Answer 1
2 accepted 2021-02-20 03:48:22

Answer 2
1 2021-02-20 09:06:17

Answer 3
0 2022-12-30 02:50:58
