
Split a string in Python by a specific line in text

I want to split a body of text if there is a line which contains only "----". I am using the re.split(..) method but it's not behaving as expected. What am I missing?

import re

s = """width:5
----
This is a test sentence to test the width thing"""

print(re.split('^----$', s))

This simply prints:

['width:5\n----\nThis is a test sentence to test the width thing']

You are missing the MULTILINE flag:

print(re.split(r'^----$', s, flags=re.MULTILINE))

Without it, ^ and $ are applied to the whole string s, not to each line in it:

re.MULTILINE

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline).

Demo:

>>> import re
>>> 
>>> s = """width:5
... ----
... This is a test sentence to test the width thing"""
>>> 
>>> print(re.split(r'^----$', s, flags=re.MULTILINE))
['width:5\n', '\nThis is a test sentence to test the width thing']
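The two pieces above still carry the newlines that surrounded the separator line. As a minimal sketch (written for Python 3; the names no_flag, with_flag and parts are only illustrative), the following compares the call with and without the flag and then strips those leftover newlines:

```python
import re

s = """width:5
----
This is a test sentence to test the width thing"""

# Without the flag, ^ and $ anchor only to the start and end of the whole
# string, so the pattern never matches and the text comes back unsplit.
no_flag = re.split(r'^----$', s)

# With re.MULTILINE, ^ and $ also match at every line boundary.
with_flag = re.split(r'^----$', s, flags=re.MULTILINE)

# Remove the newlines left on either side of the separator line.
parts = [part.strip('\n') for part in with_flag]

print(no_flag)
print(with_flag)
print(parts)
```

Whether to strip the newlines depends on what you do with the pieces afterwards; the split itself is the same either way.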

Alternatively, you can drop ^ and $ (without MULTILINE they anchor only to the start and end of the whole string) and use look-around assertions that require a newline on each side of the dashes. The newlines are not consumed, so they stay in the results:

>>> print(re.split('(?<=\n)----(?=\n)', s))
['width:5\n', '\nThis is a test sentence to test the width thing']
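One difference between the look-around pattern and the MULTILINE version shows up when the separator is the very first (or very last) line, because then there is no newline on that side to satisfy the assertion. A small sketch, assuming a made-up input string text that starts with a separator:

```python
import re

# Hypothetical input: the separator also appears on the first line.
text = "----\nfirst\n----\nsecond"

# The look-behind needs a newline before the dashes, so the leading
# separator is not matched and stays in the first piece.
lookaround = re.split('(?<=\n)----(?=\n)', text)

# ^ with re.MULTILINE also matches at the very start of the string,
# so both separators are matched here.
anchored = re.split(r'^----$', text, flags=re.MULTILINE)

print(lookaround)  # ['----\nfirst\n', '\nsecond']
print(anchored)    # ['', '\nfirst\n', '\nsecond']
```

If separators can never sit at the edges of the text, the two patterns behave the same.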

Another way to split, without using a regular expression:

s.split("\n----\n")
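As a quick illustration of what the plain str.split call returns (a sketch only; the second input is a made-up example), note that the newlines around the separator are consumed, and that only a line of exactly four dashes is treated as a separator:

```python
s = """width:5
----
This is a test sentence to test the width thing"""

# The surrounding newlines are part of the delimiter, so they are removed.
parts = s.split("\n----\n")
print(parts)  # ['width:5', 'This is a test sentence to test the width thing']

# Hypothetical input with six dashes: "\n----\n" does not occur in it,
# so the string is returned unsplit.
longer = "a\n------\nb".split("\n----\n")
print(longer)  # ['a\n------\nb']
```

Like the look-around pattern, this needs a newline on both sides, so a separator on the first or last line of the text is not split on.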

A shorter pattern also produces the expected output for this input (no flag is needed, since the pattern uses neither ^ nor $):

IN:

re.split('[\n-]+', s)

OUT:

['width:5', 'This is a test sentence to test the width thing']
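Keep in mind that the character class [\n-]+ matches any run of newlines and hyphens, not only a line consisting of "----". A minimal sketch with a made-up input containing a hyphenated word shows how that differs from the anchored pattern:

```python
import re

# Hypothetical input: the first line contains a hyphen of its own.
text = "well-known value\n----\nsecond part"

# Splits on every hyphen and every newline, including the one in "well-known".
char_class = re.split('[\n-]+', text)

# Splits only on a line that is exactly "----".
exact = re.split(r'^----$', text, flags=re.MULTILINE)

print(char_class)  # ['well', 'known value', 'second part']
print(exact)       # ['well-known value\n', '\nsecond part']
```

So the short pattern is fine when the text is known to contain no other hyphens and no other line breaks, and the anchored pattern is the safer choice otherwise.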

Have you tried:

result = re.split(r"^----$", subject_text, flags=re.MULTILINE)
