简体   繁体   中英

Regular expression multi-line replacement in Python

I would like to search and replace a block of text which contains new line characters.

In the example below when the DOTALL flag is specified, findall behaves as expected and '.' matches any character including a newline. But when calling sub, the DOTALL flag doesn't seem to do anything and no matches are found. I just want to confirm that I can't use '.' with sub to replace text that contains new line characters or if I'm not calling the function correctly.

Code

import re
text = """
some example text...
START
bla bla
bla bla
END
"""
print 'this works:', re.findall('START.*END', text, re.DOTALL)
print 'this fails:', re.sub('START.*END', 'NEWTEXT', text, re.DOTALL)

Output

this works: ['START\nbla bla\nbla bla\nEND']
this fails:
some example text...
START
bla bla
bla bla
END

I'm not exactly sure why, but you have to specify flags= in re.sub (the docs uses it).

print 'this works:', re.sub('START.*END', 'NEWTEXT', text, flags=re.DOTALL)

It might be because of the optional count argument.

EDIT:

I think that's because of the count argument after all, since this works as well:

print 'this works:', re.sub('START.*END', 'NEWTEXT', text, 0, re.DOTALL)

0 meaning replacing all.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM