Regex: stripping multi-line comments while keeping a line break, and single-line comments at the start of a line

The input text file is as follows

int 1; //integer
//float 1; //floating point number
int m; //integer
/*if a==b
begin*/
print 23 /* 1, 2, 3*/
end
float/* ty;
int yu;*/

Expected output is as follows

int 1; //integer
int m; //integer
print 23 
end
float

Here is a two-step replacement which seems to work:

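import re

inp = """int 1; //integer
//float 1; //floating point number
int m; //integer
/*if a==b
begin*/
print 23 /* 1, 2, 3*/
end
float/* ty;
int yu;*/"""

# Step 1: remove whole lines that start with a // comment.
output = re.sub(r'^\s*//.*?\n', '', inp, flags=re.M)
# Step 2: remove /* ... */ comments, keeping at most the newline that follows.
output = re.sub(r'\n?/\*.*?\*/(\n?)', r'\1', output, flags=re.M|re.S)
print(output)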

This prints:

int 1; //integer
int m; //integer
print 23 
end
float

The first call to re.sub removes every line that starts with a // comment (optionally preceded by whitespace). The second call to re.sub removes the C-style /* */ comments. Its pattern optionally matches a newline both before and after the comment, and the replacement \1 puts back only the newline captured after the comment, if there was one. Each comment, together with any adjacent newlines, is therefore replaced by at most a single newline.
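To make the role of the captured trailing newline concrete, here is a minimal sketch that wraps the same two substitutions in a helper and runs it on three small inputs. The function name strip_comments is made up for illustration and is not part of the answer above.

```python
import re

def strip_comments(text):
    # Hypothetical helper name; the two substitutions are the ones from the answer.
    # Step 1: drop whole lines that start (after optional whitespace) with //.
    text = re.sub(r'^\s*//.*?\n', '', text, flags=re.M)
    # Step 2: drop /* ... */ comments, putting back only the newline after them.
    return re.sub(r'\n?/\*.*?\*/(\n?)', r'\1', text, flags=re.M | re.S)

# Comment on its own lines: the newline before it is consumed, the one after is kept.
own_lines = strip_comments("a\n/* x\ny */\nb")
print(repr(own_lines))      # 'a\nb'

# Inline comment followed by a newline: the code before it and the newline survive.
inline = strip_comments("print 23 /* 1 */\nend")
print(repr(inline))         # 'print 23 \nend'

# Comment at the very end of the input: no newline follows, so nothing is put back.
at_end = strip_comments("float/* ty;\nint yu;*/")
print(repr(at_end))         # 'float'
```

The three cases correspond to the three /* */ comments in the sample input.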

You can replace matches of the following regular expression with empty strings.

(?m)^\/\/.*\r?\n|\/\/.*|^\/\*[\s\S]*?\*\/\r?\n|\/\*[\s\S]*?\*\/

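Note that the order of the alternatives matters: the second alternation element must follow the first, and the fourth must follow the third. Alternatives are tried from left to right, so the variants that also consume the line terminator have to be tried before the more general ones.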
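A small sketch of why the order matters, using only the two // alternatives in their anchored form (^ with the multiline flag, as in the breakdown of the expression). The variable names are illustrative, and the forward slashes are left unescaped because Python does not require escaping them.

```python
import re

text = "// full-line comment\nint m;\n"

# Anchored, newline-consuming alternative first (the order the answer asks for).
ordered = r'(?m)^//.*\r?\n|//.*'
# Same two alternatives, swapped.
swapped = r'(?m)//.*|^//.*\r?\n'

ordered_result = re.sub(ordered, '', text)
swapped_result = re.sub(swapped, '', text)

print(repr(ordered_result))  # 'int m;\n'
print(repr(swapped_result))  # '\nint m;\n'
```

With the alternatives swapped, the general form wins at the start of the line and stops before the line terminator, so the comment line is left behind as an empty line.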

The regular expression can be broken down as follows.

(?m)       # set multiline flag 
  ^\/\/    # match '//' at beginning of line
  .*\r?\n  # match 0+ chars other than line
           # terminators then match line terminator
|          # or
  \/\/     # match '//'
  .*       # match the remainder of the line
|          # or
  ^\/\*    # match '/*' at the beginning of a line
  [\s\S]*? # match 0+ characters including line
           # terminators, lazily
  \*\/     # match '*/'
  \r?\n    # match line terminator
|          # or
  \/\*     # match '/*'
  [\s\S]*? # match 0+ characters including line
           # terminators, lazily
  \*\/     # match '*/'
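As a rough check, the four alternatives from the breakdown can be applied to the question's input in Python. This is a sketch under a few assumptions: the name COMMENT_RE is made up, the pattern is written with re.VERBOSE so it can carry its own comments, and the forward slashes are left unescaped because Python does not need them escaped.

```python
import re

# Hypothetical name; the alternatives follow the breakdown above.
COMMENT_RE = re.compile(r"""
      ^//.*\r?\n              # full-line // comment plus its line terminator
    | //.*                    # trailing // comment, up to the end of the line
    | ^/\*[\s\S]*?\*/\r?\n    # block comment starting a line, plus the line terminator
    | /\*[\s\S]*?\*/          # any other block comment
""", re.M | re.X)

inp = """int 1; //integer
//float 1; //floating point number
int m; //integer
/*if a==b
begin*/
print 23 /* 1, 2, 3*/
end
float/* ty;
int yu;*/"""

result = COMMENT_RE.sub('', inp)
print(result)
```

Because the second alternative matches // anywhere on a line, this pattern also strips the trailing //integer comments, so its output differs from the question's expected output on the first two lines (they become "int 1; " and "int m; ", each with a trailing space).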
