
Python re.sub multiline on string

I am trying to use the re.MULTILINE flag.

I have read these posts: Bug in Python Regex? (re.sub with re.MULTILINE) and Python re.sub MULTILINE caret match, but it still doesn't work. The code:

import re
if __name__ == '__main__':

    txt = "\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"
    new_txt = re.sub(r'\/\*[.\n]*?\*\/', '', txt, flags=re.MULTILINE)
    print("\n=========== TXT ============")
    print(txt)
    print("\n=========== NEW TXT ============")
    print(new_txt)

The output:

=========== TXT ============

<?php
/* Multi-line
comment */
$var = 1;


=========== NEW TXT ============

<?php
/* Multi-line
comment */
$var = 1;

But new_txt should not contain the multi-line comment. I want to get txt without the multi-line comment. Do you have any idea?

You need to replace re.MULTILINE with re.DOTALL (or its alias re.S) and move the period out of the character class, because inside a character class the dot matches only a literal . character.

Note that re.MULTILINE only redefines the behavior of ^ and $, making them match at the start/end of each line rather than only at the start/end of the whole string. The re.DOTALL flag redefines the behavior of . in the pattern, but only outside a character class: with this flag, the dot matches a newline as well.
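To make the difference between the two flags concrete, here is a small sketch. The sample string and variable names are made up for illustration; only the standard re module is used.

```python
import re

sample = "a.b\nc"

# Inside a character class, "." is a literal period, even with re.DOTALL.
literal_dot = re.findall(r'[.]', sample, flags=re.DOTALL)

# Without re.DOTALL, "." does not match the newline between "b" and "c".
dot_default = re.findall(r'b.c', sample)

# With re.DOTALL, "." matches the newline too.
dot_dotall = re.findall(r'b.c', sample, flags=re.DOTALL)

# re.MULTILINE only affects the anchors: "^" now matches at each line start.
caret_default = re.findall(r'^\w', sample)
caret_multiline = re.findall(r'^\w', sample, flags=re.MULTILINE)

print(literal_dot)      # ['.']
print(dot_default)      # []
print(dot_dotall)       # ['b\nc']
print(caret_default)    # ['a']
print(caret_multiline)  # ['a', 'c']
```

This is why [.\n]*? in the question never matches the comment body: the class only allows periods and newlines, and re.MULTILINE has no effect on it.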

So, for the current example you could use the regex /\*.*?\*/ . It matches a literal /* with /\* , then .*? matches as few characters as possible up to the closing */ , which is matched with \*/ .

See the code demo:

import re

txt = """\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"""
# re.S (re.DOTALL) lets "." match newlines, so .*? can span the whole comment
new_txt = re.sub(r'/\*.*?\*/', '', txt, flags=re.S)
print("\n=========== TXT ============")
print(txt)
print("\n=========== NEW TXT ============")
print(new_txt)

See IDEONE demo

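However, this is not the most efficient solution. The lazy .*? expands one character at a time and checks for */ after each step, which tends to get slow on long comments, and multiline comments are often long. An unrolling-the-loop pattern usually performs better, and it does not need re.DOTALL because [^*] already matches newlines. The regex above can be "unrolled" like this: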

/\*[^*]*(?:\*(?!/)[^*]*)*\*/

See the regex demo
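As a quick sanity check, the sketch below applies both the lazy and the unrolled pattern to a sample with stray asterisks inside the comment and compares the results. The sample text and the names LAZY and UNROLLED are made up for illustration.

```python
import re

# Lazy version: needs re.S so that "." can cross line breaks.
LAZY = re.compile(r'/\*.*?\*/', flags=re.S)

# Unrolled version: [^*] already matches newlines, so no flag is needed.
# \*(?!/) consumes a "*" only when it does not start the closing "*/".
UNROLLED = re.compile(r'/\*[^*]*(?:\*(?!/)[^*]*)*\*/')

txt = "\n<?php\n/* Multi-line\ncomment ** with * stars */\n$var = 1; /* second */\n"

lazy_result = LAZY.sub('', txt)
unrolled_result = UNROLLED.sub('', txt)

print(unrolled_result)
print(lazy_result == unrolled_result)  # True
```

Both patterns remove the same comments here. Any speed difference depends on the input, so it is worth timing both on your own data (for example with timeit) before choosing one.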
