Python: Getting contents between current and next occurrence of pattern in a string

I want to implement the following in Python:

(1) Search for a pattern in a string.

(2) Get the content up to the next occurrence of the same pattern in the same string.

Repeat (1) and (2) until the end of the string.

I searched the available answers, but none of them helped.

Thanks in advance.

You can use something like this:

re.findall(r"pattern.*?(?=pattern|$)",test_Str)

Here we match pattern and use a lookahead to make sure the match extends up to the next pattern or the end of the string. Note that . does not match newlines unless re.DOTALL is passed.
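To make the lookahead approach concrete, here is a minimal sketch. The helper name chunks_from_pattern, the marker ID= and the sample string are all made up for illustration and are not from the question. It uses finditer rather than findall so that capturing groups inside the pattern do not change what is returned, and it assumes you want content to span newlines (hence re.DOTALL) and to run to the true end of the string (hence \Z instead of $).

```python
import re

def chunks_from_pattern(pattern, text):
    """Return each match of `pattern` plus the text up to the next match or the end."""
    # (?:...) keeps the user's pattern grouped without capturing it.
    # \Z matches only at the very end of the string.
    regex = rf"(?:{pattern}).*?(?=(?:{pattern})|\Z)"
    # re.DOTALL lets "." cross newlines (an assumption about the desired behavior).
    return [m.group(0) for m in re.finditer(regex, text, flags=re.DOTALL)]

# Hypothetical sample data, not from the original question.
sample = "ID=1 alpha ID=2 beta\ngamma ID=3"
print(chunks_from_pattern(r"ID=", sample))
# ['ID=1 alpha ', 'ID=2 beta\ngamma ', 'ID=3']
```

The pattern is treated as a regular expression here. If it is literal text that may contain special characters, passing re.escape(your_text) instead would probably be safer.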

As mentioned by Blckknght in the comments, you can achieve this with re.split. re.split retains all empty strings a) between the beginning of the string and the first match, b) between the last match and the end of the string, and c) between different matches:

>>> re.split('abc', 'abcabcabcabc')
['', '', '', '', '']
>>> re.split('bca', 'abcabcabcabc')
['a', '', '', 'bc']
>>> re.split('c', 'abcabcabcabc')
['ab', 'ab', 'ab', 'ab', '']
>>> re.split('a', 'abcabcabcabc')
['', 'bc', 'bc', 'bc', 'bc']

If you want to retain only c), the strings between two matches of the pattern, just slice the resulting list with [1:-1].
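As a small sketch of the slicing idea, the hypothetical helper below (the name between_matches is mine, not from the answer) wraps re.split and drops the first and last pieces. It assumes the pattern has no capturing groups and cannot match the empty string.

```python
import re

def between_matches(pattern, text):
    """Return only the pieces that lie between two matches of `pattern`."""
    parts = re.split(pattern, text)
    # parts[0] is the text before the first match, parts[-1] the text after the last.
    return parts[1:-1]

print(between_matches('c', 'abcabcabcabc'))    # ['ab', 'ab', 'ab']
print(between_matches('bca', 'abcabcabcabc'))  # ['', '']
print(between_matches('x', 'abc'))             # [] (no match at all)
```

With fewer than two matches there is nothing "between" matches, so the slice comes back empty.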

Do note that there are two caveats with this method:

  1. In older Python versions, re.split doesn't split on an empty-string match (the output below is from such a version). Since Python 3.7, empty matches do split the string.

     >>> re.split('', 'abcabc')
     ['abcabc']
  2. Content in capturing groups will be included in the resulting list.

     >>> re.split(r'(.)(?!\1)', 'aaaaaakkkkkkbbbbbsssss')
     ['aaaaa', 'a', 'kkkkk', 'k', 'bbbb', 'b', 'ssss', 's', '']
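For the second caveat, when a group is only there for alternation or grouping, a non-capturing group (?:...) avoids the extra items. The separators and sample string below are made up for illustration.

```python
import re

text = "a,b;c"

# Capturing group: the separators themselves show up in the result.
with_group = re.split(r"(,|;)", text)

# Non-capturing group: only the pieces between separators are returned.
without_group = re.split(r"(?:,|;)", text)

print(with_group)     # ['a', ',', 'b', ';', 'c']
print(without_group)  # ['a', 'b', 'c']
```

This does not help when the pattern needs a backreference such as \1, because a backreference requires a real capturing group.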

You have to write your own function with finditer if you need to handle those use cases.

This is the variant where only case c) is matched.

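import re

def findbetween(pattern, text):
    """Return the substrings that lie between consecutive matches of pattern."""
    out = []
    start = 0  # end of the previous match
    for m in re.finditer(pattern, text):
        out.append(text[start:m.start()])
        start = m.end()
    # The first slice is the text before the first match (case a), so drop it.
    return out[1:]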

Sample run:

>>> findbetween('abc', 'abcabcabcabc')
['', '', '']
>>> findbetween(r'', 'abcdef')
['a', 'b', 'c', 'd', 'e', 'f']
>>> findbetween(r'ab', 'abcabcabc')
['c', 'c']
>>> findbetween(r'b', 'abcabcabc')
['ca', 'ca']
>>> findbetween(r'(?<=(.))(?!\1)', 'aaaaaaaaaaaabbbbbbbbbbbbkkkkkkk')
['bbbbbbbbbbbb', 'kkkkkkk']

(In the last example, (?<=(.))(?!\1) matches the empty string at the end of the string, so 'kkkkkkk' is included in the list of results.)
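If you want all three cases a), b) and c) but without the captured groups that re.split would add, a finditer loop can be adapted along these lines. This is only a sketch, and the name split_no_groups is hypothetical.

```python
import re

def split_no_groups(pattern, text):
    """Split `text` on `pattern` like re.split, but never emit capturing-group content."""
    out = []
    start = 0  # end of the previous match
    for m in re.finditer(pattern, text):
        out.append(text[start:m.start()])
        start = m.end()
    # Keep the tail after the last match (case b).
    out.append(text[start:])
    return out

print(split_no_groups(r'(.)(?!\1)', 'aaaaaakkkkkkbbbbbsssss'))
# ['aaaaa', 'kkkkk', 'bbbb', 'ssss', '']
print(split_no_groups('abc', 'abcabcabcabc'))
# ['', '', '', '', '']
```

Slicing the result with [1:-1] then leaves only the pieces between two matches.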
