Split text on lines of hyphens

I am wondering if there is a better approach than what I am currently taking to parse this file. I have a string that is in the general format of:

[Chunk of text]
--------------------
[Another chunk of text]

(There can be multiple chunks of text with the same separator between them)

I am trying to parse the chunks of text into elements of a list, which I can do with data.split('-'*20) in this case. However, if the separator is not exactly 20 hyphens, the split will not work as intended. I have been experimenting with regex but am unsure which pattern would be appropriate.

Are there any better methods that I should use in this situation, or is there a regex I should use as opposed to the .split() method?

I would try re.split() with the regex --+, which breaks down as:

  1. - : one hyphen
  2. -+ : one or more hyphens

This way it does not match a single hyphen, only runs of two or more. Alternatively, you could use -{2,}, which also means two or more.
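As a minimal sketch of that suggestion, assuming the whole file is already loaded into a string (the `data` value below is made up for illustration), the split could look like this. Each chunk keeps the newlines that surrounded the separator, so the sketch strips them afterwards.

```python
import re

# Hypothetical sample input: separators of different lengths.
data = "first chunk\n--------------------\nsecond chunk\n-----\nthird chunk"

# Split on any run of two or more hyphens, then trim the leftover
# newlines around each chunk.
chunks = [part.strip() for part in re.split(r"-{2,}", data)]

print(chunks)  # ['first chunk', 'second chunk', 'third chunk']
```

Note that this pattern also splits on a double hyphen inside ordinary text, so it fits best when the chunks themselves never contain "--".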

You want a regex split. I'm not Python-literate, but I found the function in the official 2.7.10 documentation and adapted it to your case:

>>> re.split(r'\n-{4,}\n', data)
  • 4 is the minimum number of dashes you want to match.
  • \n matches the newlines before and after the dashes. You probably don't want those in your text.
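A short sketch of the newline-bounded split, under the assumption that every separator sits on its own line with text both before and after it. The helper name `split_on_separator_lines`, its `min_dashes` parameter, and the sample string are hypothetical and only serve the illustration.

```python
import re

def split_on_separator_lines(text, min_dashes=4):
    """Split text on lines made up of at least `min_dashes` hyphens."""
    # The surrounding \n are part of the match, so they are removed
    # from the resulting chunks along with the hyphens.
    pattern = r"\n-{%d,}\n" % min_dashes
    return re.split(pattern, text)

# Hypothetical sample: the "--" inside the first chunk is not a separator.
sample = "alpha -- beta\n------\ngamma\n--------------------\ndelta"
parts = split_on_separator_lines(sample)

print(parts)  # ['alpha -- beta', 'gamma', 'delta']
```

Because the pattern requires a newline on both sides, hyphens inside a line of text are left alone. A separator on the very first or last line of the string, with no newline on one side, would not match.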
