I am wondering if there is a better approach than what I am currently taking to parse this file. I have a string that is in the general format of:
[Chunk of text]
--------------------
[Another chunk of text]
(There can be multiple chunks of text with the same separator between them)
I am trying to parse the chunks of text into elements of a list, which I can do with data.split('-'*20) in this case. However, if there are not exactly 20 hyphens, the split will not work as intended. I have been playing around with regex but am currently unsure of a proper pattern to use.
Are there any better methods that I should use in this situation, or is there a regex I should use as opposed to the .split() method?
I would try re.split() with the regex --+, which means:
-  : one hyphen
-+ : one or more hyphens
This way it will not match a single hyphen, only runs of two or more. Alternatively, you could use -{2,}, which means two or more hyphens.
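A minimal sketch of that approach, assuming the text is held in a variable called data (the sample string below is made up for illustration). Splitting on the hyphen run alone leaves the surrounding newlines attached to each chunk, so this version strips them afterwards:

```python
import re

# Hypothetical sample input: the separators have different lengths.
data = "first chunk\n--------------------\nsecond chunk\n-----\nthird chunk"

# Split on any run of two or more hyphens, then trim the newlines
# left over on either side of each separator.
chunks = [part.strip() for part in re.split(r"-{2,}", data)]

print(chunks)
```

Note that this pattern also splits on a double hyphen inside the text itself, so it only suits input where "--" never appears in a chunk.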
You want a regex split. I'm not Python-literate, but I found the function in the official 2.7.10 documentation and adapted it to your case:
>>> re.split(r'\n-{4,}\n', data)
4 is the minimum number of dashes you want to match. The \n on each side matches the newlines before and after the separator; you probably don't want those in your text.
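As a rough illustration of why the newlines matter, here is a short sketch with a made-up sample string. Because the separator has to sit on a line of its own, hyphens inside a chunk are left alone, and the newlines are consumed by the split instead of ending up in the results:

```python
import re

# Hypothetical sample input: the first chunk contains hyphens of its own.
data = "well-known text -- with dashes\n----------\nsecond chunk\n------\nthird chunk"

# Require a newline on both sides of a run of four or more hyphens.
chunks = re.split(r"\n-{4,}\n", data)

print(chunks)
```

The raw string (r"...") keeps the backslashes intact for the regex engine; the hyphen needs no escaping outside a character class.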