Regex to match lines in-between two specific lines, in Python

I am trying to use a regex to parse out some lines from text read in from a file. I know this could be done by reading the file line by line, but I like the elegance of capturing all the relevant bits of info in a single regex match.

The example file contents:

---
title: a title
layout: page
---

here's some text
================

this will be blog post content.

I am trying to produce a regex match that returns 2 groups: the data in between the "---" lines, and all of the data after the 2nd "---" line. Here is the regex I have come up with, and I am having an issue with it:

re.match('---\n(.*?)\n---\n(.*)', content, re.S)

This seems to work well, except when dealing with Unix vs. Windows line endings. Is there a way to allow this regex to match a \r too, if it's present? It works with Unix line endings, which are just \n, I believe.

Also, if you think this regex could be improved, I'm open to suggestions.

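End-of-line markers count as whitespace, so you can use the construct \s+ to match line endings (and any other whitespace) in a platform-independent way.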
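As a rough sketch of the \s+ suggestion applied to the regex from the question: the function name split_front_matter and the sample strings below are made up for illustration and are not from the original post. Note that \s+ is greedy, so it also swallows blank lines and leading spaces next to the "---" markers, which may or may not be what you want.

```python
import re

# The question's pattern with each literal "\n" replaced by "\s+".
# re.S lets "." match newlines so both groups can span several lines.
PATTERN = re.compile(r'---\s+(.*?)\s+---\s+(.*)', re.S)


def split_front_matter(content):
    """Return (header, body), or None if the text has no "---" block.

    Hypothetical helper, named here only for the example.
    """
    m = PATTERN.match(content)
    if m is None:
        return None
    return m.group(1), m.group(2)


# Sample input modelled on the file in the question.
unix_text = "---\ntitle: a title\nlayout: page\n---\n\nhere's some text\n"
windows_text = unix_text.replace("\n", "\r\n")

print(split_front_matter(unix_text))
print(split_front_matter(windows_text))
```

With this pattern the blank line after the second "---" is consumed by \s+, so the body group starts directly at the first non-whitespace character.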

The sequence (\r\n|\r|\n) will match all of the "common" line endings (Windows, old Mac, and *nix, respectively).
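A minimal sketch of how that alternation might be dropped into the regex from the question. The names EOL, FRONT_MATTER, and parse, and the sample text, are assumptions for the example only. The alternation is written as a non-capturing group, (?:...), so the two capturing groups keep the numbers 1 and 2.

```python
import re

# One line ending: Windows (\r\n), old Mac (\r), or *nix (\n).
# \r\n is listed first so a Windows ending is consumed as a single unit.
EOL = r'(?:\r\n|\r|\n)'

# Same shape as the question's regex, with each "\n" swapped for EOL.
FRONT_MATTER = re.compile(r'---' + EOL + r'(.*?)' + EOL + r'---' + EOL + r'(.*)', re.S)


def parse(content):
    """Return (header, body), or None when the "---" block is missing."""
    m = FRONT_MATTER.match(content)
    return (m.group(1), m.group(2)) if m else None


# Sample input modelled on the file in the question.
unix_text = "---\ntitle: a title\nlayout: page\n---\n\nhere's some text\n"
windows_text = unix_text.replace("\n", "\r\n")
old_mac_text = unix_text.replace("\n", "\r")

for text in (unix_text, windows_text, old_mac_text):
    header, body = parse(text)
    print(repr(header), repr(body))
```

Unlike \s+, this version consumes exactly one line ending at each position, so the blank line after the second "---" stays at the start of the body group.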
