How to use multi-line DOTALL with a character exception in Python

I have to find a multi-line pattern in Python, so I am using the DOTALL flag from the re module, but it matches more than I need.

Sample file:

if(condition_1)
{
....
some text
some text

if ((condition_1== condition_2)   ||
                 (condition_3== condition_4) ||
           (condition_6== condition_5)  ||
     (condition_7== condition_8)   ) // XYZ_variable
{
...

My Python regex is as follows:

re.compile(r'(if\s*?\()(.*?)(\/\/\s*?)(XYZ_variable)', re.DOTALL)

This matches from the first if condition all the way to XYZ_variable, but I need only the second if condition, the one where XYZ_variable is present.

So I changed my regex as follows, which does not work:

re.compile(r'(if\s*?\()([^\{].*?)(\/\/\s*?)(XYZ_variable)', re.DOTALL)

My final output should look like this:

if(condition_1)
    {
    ....
    some text
    some text

    if (((condition_1== condition_2)   ||
                     (condition_3== condition_4) ||
               (condition_6== condition_5)  ||
         (condition_7== condition_8)   ) || XYZ_variable )
    {
    ...

But my regex produces something like this:

if ((condition_1)
        {
        ....
        some text
        some text

        if ((condition_1== condition_2)   ||
                         (condition_3== condition_4) ||
                   (condition_6== condition_5)  ||
             (condition_7== condition_8)   ) || XYZ_variable )
        {
        ...

You may use

re.sub(r'(?m)^(\s*if\s*)(\(.*(?:\n(?!\s*if\s*\().*)*)//\s*(\w+)\s*$', r'\1(\2 || \3)', s)

See the regex demo.

Details

  • (?m) - the inline re.M (re.MULTILINE) flag, so that ^ and $ match at the start and end of each line
  • ^ - start of a line
  • (\s*if\s*) - Group 1: if surrounded by 0+ whitespace characters
  • (\(.*(?:\n(?!\s*if\s*\().*)*) - Group 2:
    • \( - a (
    • .* - the rest of the line
    • (?:\n(?!\s*if\s*\().*)* - 0 or more repetitions of
      • \n(?!\s*if\s*\() - a newline (LF) that is not followed by if surrounded by 0+ whitespace characters and then a (
      • .* - the rest of the line
  • //\s* - // and 0+ whitespace characters
  • (\w+) - Group 3: 1 or more word characters
  • \s*$ - 0+ whitespace characters and the end of the line.
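
The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming the re.VERBOSE flag is acceptable in your code; it replaces the inline (?m) with re.MULTILINE, and the names PATTERN, text and result as well as the shortened input are illustrative only, not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each piece can carry a comment.
PATTERN = re.compile(
    r"""
    ^(\s*if\s*)                  # Group 1: "if" with optional whitespace around it
    (                            # Group 2: the whole condition
        \(.*                     #   an opening paren and the rest of that line
        (?:\n(?!\s*if\s*\().*)*  #   following lines, unless one starts a new "if ("
    )
    //\s*                        # the comment marker
    (\w+)                        # Group 3: the identifier in the comment
    \s*$                         # trailing whitespace up to the end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical, shortened input in the same shape as the sample file.
text = "if (a)\n{\nx;\n\nif ((b == c) ||\n    (d == e)) // FLAG\n{\n"

result = PATTERN.sub(r"\1(\2 || \3)", text)
print(result)
```

Because \s also matches a newline, Group 1 can absorb a blank line that precedes the if; the replacement writes \1 back unchanged, so that blank line is preserved.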

Python demo:

import re
s = """if(condition_1)
{
....
some text
some text

if ((condition_1== condition_2)   ||
                 (condition_3== condition_4) ||
           (condition_6== condition_5)  ||
     (condition_7== condition_8)   ) // XYZ_variable
{
..."""
print(re.sub(r'(?m)^(\s*if\s*)(\(.*(?:\n(?!\s*if\s*\().*)*)//\s*(\w+)\s*$', r'\1(\2 || \3)', s))

Output:

if(condition_1)
{
....
some text
some text

if (((condition_1== condition_2)   ||
                 (condition_3== condition_4) ||
           (condition_6== condition_5)  ||
     (condition_7== condition_8)   )  || XYZ_variable)
{
...

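A regular expression search returns the leftmost match: the engine tries each starting position from left to right and stops at the first one from which the whole pattern can match. That is why your match always starts at the first if.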

Consider the following minimal example, where the non-greedy ? does not modify the output:

>>> re.compile(r"if(.*?)XYZ").search("if a if b if c XYZ").group(1)
' a if b if c '

But here, the non-greedy ? does change the output:

>>> re.compile(r"if(.*?)XYZ").search("if a XYZ if b if c XYZ").group(1)
' a '

The non-greedy ? only affects where the match ends (its right side); it never moves the start of the match further to the right.
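
One possible way around this, shown here only as a sketch and not as the approach from the answer above, is to forbid the scanned text from containing another if, so that an attempt starting at an earlier if fails and the search moves on to the next one. The names lazy, tempered, first and last are illustrative only.

```python
import re

text = "if a if b if c XYZ"

# Plain lazy match: starts at the leftmost "if" and only then looks for the nearest XYZ.
lazy = re.compile(r"\bif\b(.*?)XYZ", re.DOTALL)

# Tempered match: each consumed character must not begin another "if",
# so an attempt that starts at an earlier "if" fails and the search moves on.
tempered = re.compile(r"\bif\b((?:(?!\bif\b).)*?)XYZ", re.DOTALL)

first = lazy.search(text).group(1)
last = tempered.search(text).group(1)
print(repr(first))
print(repr(last))
```

The lookahead runs before every consumed character, so this can be slow on large inputs. It also treats any standalone word if as a boundary, including one inside a comment or a string.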
