
Python Regex for matching all numbers inside pipe characters

Given the following string:

string = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

I want to match all the numbers that sit inside a pair of pipe characters.
So in this case I want the final value of matches to equal [0, 62, 0, 0, 0, 12]:

So far I have tried the following regex, which only returns [0, 0, 0]:

import re

matches = re.findall(r"\|(\d+)\|", string)

If I replace + with {1,}, it still returns only [0, 0, 0], but when I replace + with {2,} it returns [62, 12].

So I don't really understand what I'm doing wrong. Thanks for the help!

The problem is that once your expression matches |0|, it cannot reuse that same closing | as the opening | for the next number.

Try using this regular expression: '\|(\d+)(?=\|)'. Here, the '(?=...)' part is called a positive lookahead. The match succeeds only if the regex inside it matches at that point, but the engine does not consume any characters, so the closing | is left in place to open the next match.
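A minimal sketch of the lookahead version with re.findall. Note that findall returns strings; the int conversion at the end is an extra step, added here only to get the integer list the question asks for:

```python
import re

s = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

# The capturing group holds the digits; the lookahead (?=\|) only checks that a
# pipe follows, so that pipe is not consumed and can open the next match.
digits = re.findall(r"\|(\d+)(?=\|)", s)
print(digits)   # ['0', '62', '0', '0', '0', '12']

# findall returns strings; convert them if integers are wanted.
numbers = [int(d) for d in digits]
print(numbers)  # [0, 62, 0, 0, 0, 12]
```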

(?<=\|)\d+(?=\|)

Breaking that down:

  • (?<=\|) is a positive lookbehind that asserts the match must be preceded by a | symbol.
  • \d+ matches only digits. The + tells it to keep matching digits until it reaches a non-digit.
  • (?=\|) is, finally, a positive lookahead that asserts the match must be followed by a |, so the number sits between the pipes.
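To see each assertion at work, a small sketch using the pattern from this answer on a few made-up inputs (the short test strings are illustrative, not from the question). Because the pattern has no capturing group, findall returns the whole match, which is just the digits:

```python
import re

# Lookbehind and lookahead on either side of the digits; neither consumes a pipe.
pattern = re.compile(r"(?<=\|)\d+(?=\|)")

print(pattern.findall("|399|"))  # ['399']: pipes on both sides
print(pattern.findall(",399|"))  # []: the lookbehind fails, preceded by ','
print(pattern.findall("|1*"))    # []: the lookahead fails, followed by '*'

s = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"
print(pattern.findall(s))        # ['0', '62', '0', '0', '0', '12']
```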

Here's some boilerplate code from regex101:

import re

regex = r"(?<=\|)\d+(?=\|)"

test_str = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

# re.MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern
matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print(f"Match {matchNum} was found at {match.start()}-{match.end()}: {match.group()}")

    # The pattern has no capturing groups, so this inner loop never runs here
    for groupNum in range(1, len(match.groups()) + 1):
        print(f"Group {groupNum} found at {match.start(groupNum)}-{match.end(groupNum)}: {match.group(groupNum)}")

Here's the output:

Match 1 was found at 22-23: 0
Match 2 was found at 24-26: 62
Match 3 was found at 27-28: 0
Match 4 was found at 29-30: 0
Match 5 was found at 31-32: 0
Match 6 was found at 33-35: 12

With findall, one pipe character "|" cannot belong to both the number before it and the number after it at the same time (well, except with a lookahead).

Take for example the string "|0|62|0|". The first part, "|0|", matches the pattern and is added to the results. Pattern matching then continues with the rest of the string, i.e. with 62|0|. In this substring a second match is found: |0|. The middle number 62 is never found this way.
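A small sketch that makes the consumed pipe visible: it prints the span of each match of the asker's original pattern on the substring from this example. The first span ends after index 2, so the pipe at index 2 is already used up and the search resumes at "6", which is not a pipe:

```python
import re

sub = "|0|62|0|"

# The original pattern consumes both the opening and the closing pipe.
matches = [(m.group(1), m.span()) for m in re.finditer(r"\|(\d+)\|", sub)]
print(matches)  # [('0', (0, 3)), ('0', (5, 8))]
```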

I would suggest splitting the string and disregarding the first and last items, because they are not between two pipe characters. Then check whether each remaining item matches "\d+". You can do it with a one-liner, but here it is divided into steps:

import re

s1 = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"
s2 = s1.split('|')
# ['123', '1*[123;abc;3;52m;', '0', '62', '0', '0', '0', '12', ',399', ';abc']
s3 = s2[1:-1]  # drop the first and last items: they are not enclosed by two pipes
s4 = [s for s in s3 if re.fullmatch(r'\d+', s)]
# ['0', '62', '0', '0', '0', '12']
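The same steps can be collapsed into the one-liner mentioned above. This is a sketch; the int conversion is an addition here, used to produce the integer list from the question:

```python
import re

s1 = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

# Split on pipes, drop the first and last pieces (not enclosed by two pipes),
# keep only the pieces made entirely of digits, and convert them to int.
nums = [int(s) for s in s1.split('|')[1:-1] if re.fullmatch(r'\d+', s)]
print(nums)  # [0, 62, 0, 0, 0, 12]
```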

I think it's because, once it has found the pattern, it skips the neighboring number that shares the | in between. Here is a workaround:

def get_nums(s):
    items = s.split('|')
    found = []
    # Skip the first and last items: only the middle ones sit between two pipes
    for item in items[1:-1]:
        if item.strip().isdigit():
            found.append(item.strip())
    return found
