Python: Getting contents between current and next occurrence of pattern in a string
I want to implement the following in Python:
(1) Search for a pattern in a string.
(2) Get the content up to the next occurrence of the same pattern in the same string.
Repeat (1) and (2) until the end of the string.
I searched the available answers but none of them helped.
Thanks in advance.
You can use something like this:
re.findall(r"pattern.*?(?=pattern|$)",test_Str)
Here we search for pattern and, with a lookahead, make sure the match runs lazily until just before the next pattern or the end of the string.
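As a minimal sketch of the lookahead approach above, the snippet below wraps it in a helper. The name chunks_from_each_match and the sample string are made up for illustration, and the helper assumes the pattern contains no capturing groups (otherwise re.findall returns the groups instead of the whole match).

```python
import re

def chunks_from_each_match(pattern, text):
    # Hypothetical helper: each chunk starts at an occurrence of `pattern`
    # and extends lazily up to (not including) the next occurrence,
    # or to the end of the string.
    # Assumption: `pattern` has no capturing groups.
    regex = r"(?:%s).*?(?=(?:%s)|$)" % (pattern, pattern)
    return re.findall(regex, text)

print(chunks_from_each_match("ab", "ab1ab22ab333"))
```

Note that each chunk includes the matched pattern itself, and that "." does not cross newlines unless re.DOTALL is passed.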
As mentioned by Blckknght in the comment, you can achieve this with re.split. re.split keeps every piece, including empty strings, that lies a) between the beginning of the string and the first match, b) between the last match and the end of the string, and c) between different matches:
>>> re.split('abc', 'abcabcabcabc')
['', '', '', '', '']
>>> re.split('bca', 'abcabcabcabc')
['a', '', '', 'bc']
>>> re.split('c', 'abcabcabcabc')
['ab', 'ab', 'ab', 'ab', '']
>>> re.split('a', 'abcabcabcabc')
['', 'bc', 'bc', 'bc', 'bc']
If you want to retain only c), the strings between two matches of the pattern, just slice the resulting list with [1:-1].
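The slicing step could be sketched like this. The function name between_matches is hypothetical, and the sketch inherits the re.split caveats that apply to this approach.

```python
import re

def between_matches(pattern, text):
    # re.split returns: [before first match, ..., after last match].
    # Dropping the first and last pieces leaves only the text that
    # lies between two consecutive matches.
    return re.split(pattern, text)[1:-1]

print(between_matches("a", "abcabcabcabc"))
print(between_matches("bca", "abcabcabcabc"))
```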
Do note that there are two caveats with this method:
Before Python 3.7, re.split doesn't split on an empty-string match (Python 3.7 and later do split on empty matches). The output below is from an older version:
>>> re.split('', 'abcabc')
['abcabc']
Content in capturing groups will be included in the resulting list.
>>> re.split(r'(.)(?!\1)', 'aaaaaakkkkkkbbbbbsssss')
['aaaaa', 'a', 'kkkkk', 'k', 'bbbb', 'b', 'ssss', 's', '']
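When the group is only there for grouping, one way around the capturing-group caveat is a non-capturing group (?:...). The delimiters and sample string below are invented for illustration. This does not help when the pattern needs a back-reference such as \1, since that requires a real capturing group.

```python
import re

text = "a,b;c"

# A capturing group makes re.split return the delimiters as well.
with_group = re.split(r"(,|;)", text)

# A non-capturing group splits on the same delimiters but drops them.
without_group = re.split(r"(?:,|;)", text)

print(with_group)
print(without_group)
```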
You have to write your own function with finditer if you need to handle those use cases.
This is the variant where only case c) is matched.
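import re

def findbetween(pattern, input):
    out = []
    start = None  # end of the previous match; None until the first match is seen
    for m in re.finditer(pattern, input):
        if start is not None:
            # keep only the text between the previous match and this one
            out.append(input[start:m.start()])
        start = m.end()
    return out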
Sample run:
>>> findbetween('abc', 'abcabcabcabc')
['', '', '']
>>> findbetween(r'', 'abcdef')
['a', 'b', 'c', 'd', 'e', 'f']
>>> findbetween(r'ab', 'abcabcabc')
['c', 'c']
>>> findbetween(r'b', 'abcabcabc')
['ca', 'ca']
>>> findbetween(r'(?<=(.))(?!\1)', 'aaaaaaaaaaaabbbbbbbbbbbbkkkkkkk')
['bbbbbbbbbbbb', 'kkkkkkk']
(In the last example, (?<=(.))(?!\1) matches the empty string at the end of the string, so 'kkkkkkk' is included in the list of results.)
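If cases a) and b) are wanted as well, the same finditer loop can be extended to behave like re.split without either caveat. This is only a sketch under that assumption, and the name split_all is hypothetical rather than part of the original answer.

```python
import re

def split_all(pattern, text):
    # Hypothetical variant: keeps a) the text before the first match,
    # c) the text between matches, and b) the text after the last match.
    # Captured groups are never added to the result.
    pieces = []
    start = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[start:m.start()])
        start = m.end()
    pieces.append(text[start:])  # tail after the last match
    return pieces

print(split_all("a", "abcabcabcabc"))
print(split_all(r"(.)(?!\1)", "aabb"))
```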