I want to split a body of text if there is a line which contains only "----". I am using the re.split(..)
method but it's not behaving as expected. What am I missing?
import re
s = """width:5
----
This is a test sentence to test the width thing"""
print re.split('^----$', s)
This simply prints:
['width:5\n----\nThis is a test sentence to test the width thing']
You are missing the re.MULTILINE flag:
print re.split(r'^----$', s, flags=re.MULTILINE)
Without it, ^ and $ anchor only at the start and end of the whole string s, not at the start and end of each line in it. From the documentation:
re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline).
Demo:
>>> import re
>>>
>>> s = """width:5
... ----
... This is a test sentence to test the width thing"""
>>>
>>> print re.split(r'^----$', s, flags=re.MULTILINE)
['width:5\n', '\nThis is a test sentence to test the width thing']
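The demo output still carries the newline on either side of the separator line. As a minimal sketch, assuming you want the pieces without those newlines, the result can be post-processed with str.strip; the names unsplit, raw_parts and parts are illustrative only, and print(...) is written with parentheses so the snippet runs under both Python 2 and Python 3:

```python
import re

s = """width:5
----
This is a test sentence to test the width thing"""

# Without the flag, ^ and $ anchor only to the whole string, so nothing matches
# and the result is a one-element list holding the original string.
unsplit = re.split(r"^----$", s)

# With re.MULTILINE the separator line matches; the surrounding newlines
# stay attached to the neighbouring pieces.
raw_parts = re.split(r"^----$", s, flags=re.MULTILINE)

# Trim the leftover newlines from each piece.
parts = [part.strip("\n") for part in raw_parts]

print(len(unsplit))
print(parts)
```

Note that strip("\n") also removes any other blank lines at the edges of a piece, which may or may not be what you want.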
Alternatively, you can drop ^ and $ altogether. Without re.MULTILINE they anchor the match to the start and end of the whole string, so instead use positive look-arounds that require a \n on each side of ---- while keeping those \n characters in the results:
>>> print re.split('(?<=\n)----(?=\n)', s)
['width:5\n', '\nThis is a test sentence to test the width thing']
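One difference between the two patterns is worth keeping in mind: the look-around version needs a newline on both sides of the separator, so it should not match a ---- line that is the very first (or very last) line of the text, whereas ^----$ with re.MULTILINE does. A small sketch, using a made-up input string named leading:

```python
import re

# Hypothetical input where the separator is the first line of the text.
leading = "----\nfirst line after a leading separator"

# No newline precedes the dashes, so the look-behind fails and nothing is split.
lookaround_parts = re.split(r"(?<=\n)----(?=\n)", leading)

# ^ matches at the start of the string, so the separator line is found.
multiline_parts = re.split(r"^----$", leading, flags=re.MULTILINE)

print(lookaround_parts)
print(multiline_parts)
```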
Another way to split, without using a regular expression:
s.split("\n----\n")
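A short sketch of how the plain str.split approach behaves with several separators; the sample text is invented for illustration. Because the literal "\n----\n" must match exactly, a line of five dashes is left alone, and the newlines around each separator are consumed:

```python
# Hypothetical input: two real separator lines and one line of five dashes.
text = "a\n----\nb\n-----\nc\n----\nd"

# Only lines that are exactly "----" (with a newline on each side) split the text.
pieces = text.split("\n----\n")

print(pieces)
```

As with the look-around pattern, a separator on the very first or very last line has no newline on one side and is therefore not matched.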
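A shorter pattern also gives the expected output for this input. Note that it splits on any run of newlines or hyphens, not only on a ---- line, and that no flag is needed because the pattern contains no ^ or $ (the third positional argument of re.split is maxsplit, not flags, so passing re.MULTILINE there would be read as a split limit):
IN:
re.split('[\n-]+', s)
OUT:
['width:5', 'This is a test sentence to test the width thing']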
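To see where the character-class pattern can surprise you, here is a small sketch with an invented input that contains hyphens inside ordinary words. The names loose and strict are illustrative only:

```python
import re

# Hypothetical input with hyphenated words around the separator line.
text = "key-value\n----\nwell-known"

# The character class splits on every run of newlines or hyphens,
# including the hyphens inside the words.
loose = re.split(r"[\n-]+", text)

# Matching the whole separator line leaves the hyphenated words intact.
strict = re.split(r"\n----\n", text)

print(loose)
print(strict)
```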
Have you tried:
result = re.split("^----$", subject_text, 0, re.MULTILINE)