
How to exclude comment lines when searching with regular expression?

I need to exclude blocks matched by a regular expression when they are preceded by # and any number of spaces. Here is an example file:

&START   A=23  ... more data ...
                  B=24    &END
#   &START   A=34  ... more data ...
                  B=24    &END
&START   .... block 3 of data across multiple lines ....  &END
&START   .... block 4 of data across multiple lines ....  &END

The following regular expression does not exclude the commented entry as I expected:

(?!#\s*)&START(.+?)&END 

The goal is to walk through the entries in the file and process each one. Here is the Python code for this (it works well, except that commented entries make it through):

import re

# filename is the path to the data file shown above
with open(filename) as f:
    data = f.read()

pattern = re.compile(r'(?!#\s*)&START(.+?)&END', re.DOTALL)
get_entries = pattern.findall

for entry in get_entries(data):
    # process the entry
    print(entry)

This is likely a basic oversight, as I am green when it comes to regular expressions. Many thanks to anyone who can make a suggestion.

Skip the commented line altogether while looping over the file line by line:

if line.lstrip().startswith('#'):
  continue
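
To apply that check to a file that has already been read into one string, a minimal sketch might drop every line whose first non-blank character is # and then run the block pattern on what is left. The helper name strip_comment_lines and the SAMPLE text are hypothetical, used only for illustration. The continuation line of the commented block (the one ending in &END) is left behind, but with no &START before it, it cannot start a match.

```python
import re

# Sample data modelled on the example file in the question.
SAMPLE = """\
&START   A=23  ... more data ...
                  B=24    &END
#   &START   A=34  ... more data ...
                  B=24    &END
&START   .... block 3 of data across multiple lines ....  &END
&START   .... block 4 of data across multiple lines ....  &END
"""

# DOTALL lets '.' match newlines so a block may span several lines.
BLOCK = re.compile(r'&START(.+?)&END', re.DOTALL)


def strip_comment_lines(text):
    """Return text without lines whose first non-blank character is '#'."""
    kept = [line for line in text.splitlines(True)
            if not line.lstrip().startswith('#')]
    return ''.join(kept)


entries = BLOCK.findall(strip_comment_lines(SAMPLE))
for entry in entries:
    print(entry)
```

This only removes the line that carries the # itself, so it assumes that a commented block always has its &START on the commented line.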

This seems to work:

import re

target="""
&START   A=23  ... more data ...
                  B=24    &END
#   &START   A=C34  ... more data ...
                  B=C24    &END
&START   .... block 3 of data across multiple lines ....  &END
&START   .... block 4 of data across multiple lines ....  &END
"""

# ^ (with re.MULTILINE) anchors &START to the start of a line, so "#   &START" is skipped
regex = re.compile(r"^(?!#)&START (.*?)&END", re.MULTILINE | re.DOTALL)

for s in regex.findall(target):
    print(s)

Returns:

  A=23  ... more data ...
                  B=24    
  .... block 3 of data across multiple lines ....  
  .... block 4 of data across multiple lines ....  

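This is best worked into a generator. The re.MULTILINE flag makes ^ match at the start of every line, so a &START that is preceded by # on its line no longer matches. The re.DOTALL flag lets . match newlines, so the search continues onto the following lines until it finds your &END tag.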
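
A generator version could look something like the sketch below. It uses re.finditer so that entries are produced one at a time; the name iter_entries and the shortened sample text are assumptions made for illustration, not part of the original answer.

```python
import re

# ^ with MULTILINE requires &START at the start of a line;
# DOTALL lets the lazy group run across line breaks up to the next &END.
BLOCK = re.compile(r'^&START(.*?)&END', re.MULTILINE | re.DOTALL)


def iter_entries(text):
    """Yield the body of each uncommented &START ... &END block in turn."""
    for match in BLOCK.finditer(text):
        yield match.group(1)


sample = (
    "&START   A=23\n"
    "      B=24    &END\n"
    "#   &START   A=34\n"
    "      B=24    &END\n"
    "&START  block 3  &END\n"
)

result = list(iter_entries(sample))
for entry in result:
    print(entry.strip())
```

As written, the pattern also skips an uncommented &START that is indented. If such lines can occur in the file, a variant such as ^[ \t]*&START may be closer to what is wanted.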


 