
conditional regex on multiline string in python

This question is similar to my original post.

Unable to use conditional regex to test my string in python

I am posting a new question because the requirement here differs slightly from the original one.

If the given string is processed line by line, the original answer is good enough. However, that answer does not cover a multiline string. See below.

Each test case lists the test string and the expected value from bool(re.match(...)).

Test case 1: Naive match
Test string:
xxxx
xxxx
board add 0/1 aaa
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#
Expected value: True

Test case 2: Bad model name
Test string:
xxxx
xxxx
board add 0/1 xxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 aaa
board add 0/5 bbb
#
Expected value: False

Test case 3: Missing model
Test string:
xxxx
xxxx
board add 0/2 aaa
board add 0/3 bbb
board add 0/4 bbb
board add 0/5 aaa
#
Expected value: True

I tried several regexes, but each of them fails on test case (2) or (3).

Tried regex                                              Failed on test
(board add 0/1)? (?(1) (aaa|bbb))                        2
^(?:(?.board add 0/1)?)*$|board add 0/1 (::aaa|bbb)      2
board add 0/1 (aaa|bbb)                                  3
(?=board add 0/1 )(?:board add 0/1 (aaa|bbb))            3

Is it possible to write a regex that makes all of the above test cases pass?

You can check them at the following URL:

https://regex101.com/r/2l2Qd4/1

NOTE:

  • I just want to catch a particular board add 0/1 instead of board add 0/\d+
    • In my actual use case, interfaces may need different models. That is why I am trying to work out a regex for board add 0/1 specifically. I can then extend it to board add 0/2 through board add 0/21 one by one.
  • Requirements of a valid string
    • If board add 0/1 exists in the string, it must be followed by (aaa|bbb). Otherwise, the string is invalid.
    • If board add 0/1 does not exist in the string, the string is valid.
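
Read literally, the two requirements can also be checked without a conditional regex. The following is only a minimal sketch of that reading and is not part of the original question; the helper name is_valid and the short sample strings are illustrative assumptions.

```python
import re

# Lines that add interface 0/1; (?!\d) keeps 0/10, 0/11, ... out.
# The rest of the line is captured so the model can be checked separately.
BOARD_0_1 = re.compile(r"^[ \t]*board add 0/1(?!\d)(.*)$", re.M)


def is_valid(text: str) -> bool:
    """Hypothetical helper: True if every 'board add 0/1' line names aaa or bbb."""
    rests = BOARD_0_1.findall(text)
    # all() over an empty list is True, which covers the "missing" case.
    return all(re.match(r" (?:aaa|bbb)", rest) for rest in rests)


print(is_valid("xxxx\nboard add 0/1 aaa\nboard add 0/2 aaa\n#"))  # True
print(is_valid("xxxx\nboard add 0/1 xxx\nboard add 0/2 aaa\n#"))  # False
print(is_valid("xxxx\nboard add 0/2 aaa\n#"))                     # True
```

Splitting the check into "find the line" and "inspect the model" avoids having to express the if/else inside a single pattern.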

In that case, you can use this regex

board add 0/\d+ (?!aaa|bbb)

If the regex matches then the string is invalid.

Python Example

import re


strings = [
    """xxxx
xxxx
 board add 0/1 aaa
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 bbb
 board add 0/5 aaa
#""",
    """xxxx
xxxx
 board add 0/1 xxx
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 aaa
 board add 0/5 bbb
#""",
    """xxxx
xxxx
 board add 0/2 aaa
 board add 0/3 bbb
 board add 0/4 bbb
 board add 0/5 aaa
#"""
]

for string in strings:
    print(not bool(re.search(r"board add 0/\d+ (?!aaa|bbb)", string)))

Output

True
False
True

Explanation

re.search returns a match object for the first position where the pattern matches, or None if there is no match. The solution works by detecting invalid strings: if board add 0/\d+ is followed by anything other than aaa or bbb, the string is invalid. Everything else passes, as you described in your previous question. So if re.search returns anything other than None, not bool(...) converts the result to the expected value.

NOTE: I'm using not bool(...) as the string is valid if it does not contain the pattern.
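
Since the question mentions that different interfaces may need different models, the same negative-lookahead idea could be generated from a table. This is a rough sketch under assumptions: the ALLOWED mapping, its contents, and the helper names are hypothetical and do not come from the question.

```python
import re

# Hypothetical mapping from interface to the models allowed on it.
ALLOWED = {"0/1": ("aaa", "bbb"), "0/2": ("aaa",)}


def build_invalid_pattern(allowed):
    # One alternative per interface: the interface, a space, and a negative
    # lookahead that fails when an allowed model follows.
    parts = []
    for iface, models in allowed.items():
        alternatives = "|".join(re.escape(m) for m in models)
        parts.append(rf"{re.escape(iface)} (?!{alternatives})")
    return re.compile(r"board add (?:" + "|".join(parts) + ")")


INVALID = build_invalid_pattern(ALLOWED)


def is_valid(text):
    # The string is valid when the "invalid" pattern is found nowhere.
    return INVALID.search(text) is None


print(is_valid("xxxx\n board add 0/1 bbb\n board add 0/2 aaa\n#"))  # True
print(is_valid("xxxx\n board add 0/1 aaa\n board add 0/2 bbb\n#"))  # False
```

Interfaces that are absent from the mapping are ignored, so a line such as board add 0/21 aaa does not affect the result here.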

We can focus on board add 0/1 and ignore the other board add 0/x lines in this question. In fact, despite the negation, your current solution fits my needs. I am just wondering why the negation is needed, and why my attempts do not work.

I cannot tell what you expected the first one, (board add 0/1)? (?(1) (aaa|bbb)), to match. The second regex is similar to my answer to your previous question. The third one is closer to the answer.

I adapted the regex I suggested for your previous question.
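
One way to see why the first attempt lets test case 2 through is to inspect what it actually matches when searched. The sketch below assumes the pattern is used with re.search and without the verbose flag; the shortened sample text is an assumption for illustration.

```python
import re

# Shortened version of test case 2 (bad model name on 0/1).
text = "xxxx\nxxxx\nboard add 0/1 xxx\nboard add 0/2 aaa\n#"

m = re.search(r"(board add 0/1)? (?(1) (aaa|bbb))", text)

# The optional group is skipped, the literal space matches a space somewhere
# in the text, and the conditional then has nothing left to require.
print(m is not None)   # True
print(m.group(1))      # None
print(repr(m.group(0)))  # ' '
```

Because group 1 is optional, the engine is free to give it up and match a lone space, so the condition never constrains anything. As written, the yes-branch also begins with its own space, so the pattern would need two spaces between 0/1 and the model for that branch to succeed.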

^(?:(?!board add 0\/1).)*$|^.*?board add 0\/1 (?:aaa|bbb).*$

Now you can use re.match instead of re.search

...

for string in strings:
    print(bool(re.match(r"^(?:(?!board add 0\/1).)*$|^.*?board add 0\/1 (?:aaa|bbb).*$", string, re.S)))

Output

True
False
True

NOTE: The re.S flag (re.DOTALL, called "single line" on regex101) is also used, so that . matches newline characters as well.
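
The effect of the flag can be checked on a short valid string. This is a small sketch; the sample text is an assumption and simply contains no board add 0/1 line.

```python
import re

PATTERN = r"^(?:(?!board add 0\/1).)*$|^.*?board add 0\/1 (?:aaa|bbb).*$"

# Valid by the question's rules: there is no "board add 0/1" line at all.
text = "xxxx\nxxxx\n board add 0/2 aaa\n#"

# Without re.S the dot stops at the first newline and $ cannot match there,
# so the valid string is rejected.
without_flag = bool(re.match(PATTERN, text))
# With re.S the dot crosses newlines and the first alternative spans the text.
with_flag = bool(re.match(PATTERN, text, re.S))

print(without_flag, with_flag)  # False True
```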

It seems you want to match a run of indented board lines, each ending with either aaa or bbb, enclosed between lines that start with a non-whitespace character.

To prevent partial matches, you would need to identify the part before and after the repeating board part.

^\S.*(?:\n[^\S\n]+board add 0/\d+ (?:aaa|bbb))+\n\S

Explanation

  • ^ Start of string
  • \S.* Match a non whitespace char and the rest of the line
  • (?: Non capture group to repeat as a whole part
    • \n[^\S\n]+ Match a newline followed by 1+ whitespace chars other than a newline
    • board add 0/\d+ (?:aaa|bbb) Match the board pattern where \d+ matches 1+ digits
  • )+ Close the non capture group and repeat 1+ times to match at least a single line
  • \n\S Match a newline and a non whitespace char
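
A possible way to try this pattern in Python is sketched below. It rests on assumptions not stated in the answer: re.search is used with the re.M flag so that ^ can match at the start of the second xxxx line, and the board lines are indented as in the earlier Python example. The sample strings are shortened for illustration.

```python
import re

# Assumption: re.M lets ^ anchor at any line start, not only at position 0.
BLOCK = re.compile(
    r"^\S.*(?:\n[^\S\n]+board add 0/\d+ (?:aaa|bbb))+\n\S", re.M
)

good = "xxxx\nxxxx\n board add 0/1 aaa\n board add 0/2 bbb\n#"
bad = "xxxx\nxxxx\n board add 0/1 xxx\n board add 0/2 bbb\n#"

print(bool(BLOCK.search(good)))  # True
print(bool(BLOCK.search(bad)))   # False
```

Note that this pattern validates every board add 0/\d+ line in the block, not only board add 0/1, and that it finds no match when the board lines are not indented.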



 