
Match between lines while skipping pattern with Regex

I've been trying to match between lines while skipping a pattern. I'm using the re.DOTALL regex flag.

What I need to extract is:

CHINTHAPUDI<br/>
CHINTHAPUDI<br/>

from between Elector's Name and Father's Name.

The regex I have come up with so far is:

(?:^Elector\'s Name:.*?<br/>)(.*?)^(?:Husband|Father)

But it also matches the other Elector's Name lines beneath the first match.

Link to my regex101

Here's the document I want to match against:

Elector's Name: ANANTH CHINTAPUDI<br/>
Elector's Name: THIRUPATHI <br/>
Elector's Name: SRINIVASH <br/>
CHINTHAPUDI<br/>
CHINTHAPUDI<br/>
Father's Name: POSHANNA <br/>
Father's Name: SHANKAR <br/>
Father's Name: SHANKAR <br/>
CHINTAPUDDI<br/>
CHINTHAPUDI<br/>
CHINTHAPUDI<br/>
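
For reference, the behaviour described above can be reproduced in Python with a minimal sketch. It assumes re.MULTILINE is set alongside re.DOTALL (otherwise ^ would only match at the very start of the string); the variable names are illustrative.

```python
import re

# The sample document from the question, one entry per line.
doc = "\n".join([
    "Elector's Name: ANANTH CHINTAPUDI<br/>",
    "Elector's Name: THIRUPATHI <br/>",
    "Elector's Name: SRINIVASH <br/>",
    "CHINTHAPUDI<br/>",
    "CHINTHAPUDI<br/>",
    "Father's Name: POSHANNA <br/>",
    "Father's Name: SHANKAR <br/>",
    "Father's Name: SHANKAR <br/>",
    "CHINTAPUDDI<br/>",
    "CHINTHAPUDI<br/>",
    "CHINTHAPUDI<br/>",
])

# Assumption: MULTILINE is enabled so that ^ matches at each line start.
attempt = re.compile(
    r"(?:^Elector\'s Name:.*?<br/>)(.*?)^(?:Husband|Father)",
    re.DOTALL | re.MULTILINE,
)

m = attempt.search(doc)
captured = m.group(1) if m else ""
# The capture begins right after the FIRST Elector's Name line, so the
# remaining Elector's Name lines end up inside the group.
print(repr(captured))
```

The non-capturing prefix consumes only one Elector's Name line, which is why the other two leak into the captured group.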

How could I go about matching from the last Elector's Name line up to Father's Name?

Here's an option which works for your provided input:

(?:Elector\'s Name:.*?<br/>\r?\n)+(.*?)(?:Husband|Father)
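
As a rough sketch of how this pattern might be used from Python (the question mentions re.DOTALL), the snippet below applies it to the sample document. The names doc and pattern are illustrative, and the backslashes are written singly, as they would be in a raw string.

```python
import re

# The sample document from the question, one entry per line.
doc = "\n".join([
    "Elector's Name: ANANTH CHINTAPUDI<br/>",
    "Elector's Name: THIRUPATHI <br/>",
    "Elector's Name: SRINIVASH <br/>",
    "CHINTHAPUDI<br/>",
    "CHINTHAPUDI<br/>",
    "Father's Name: POSHANNA <br/>",
    "Father's Name: SHANKAR <br/>",
    "Father's Name: SHANKAR <br/>",
    "CHINTAPUDDI<br/>",
    "CHINTHAPUDI<br/>",
    "CHINTHAPUDI<br/>",
])

# The repeated non-capturing group swallows every consecutive
# Elector's Name line, so the capture starts after the last one.
pattern = re.compile(
    r"(?:Elector's Name:.*?<br/>\r?\n)+(.*?)(?:Husband|Father)",
    re.DOTALL,
)

m = pattern.search(doc)
captured = m.group(1) if m else ""
print(captured)
```

Note that the captured text keeps its trailing newline; call captured.strip() if that is unwanted.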

There is one potential issue that you should consider if you use this: if an Elector's Name block appears earlier in the document, the first set will be used. See demo.
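
To illustrate that caveat, here is a small sketch using a made-up two-block input (the names FIRST, SECOND, ALPHA and BETA are hypothetical, not from the question). re.search returns only the earliest block, while re.finditer yields every block, so the last one can be picked explicitly if that is what is needed.

```python
import re

# Hypothetical input with two separate Elector's Name blocks.
two_blocks = "\n".join([
    "Elector's Name: FIRST <br/>",
    "ALPHA<br/>",
    "Father's Name: ONE <br/>",
    "Elector's Name: SECOND <br/>",
    "BETA<br/>",
    "Father's Name: TWO <br/>",
])

pattern = re.compile(
    r"(?:Elector's Name:.*?<br/>\r?\n)+(.*?)(?:Husband|Father)",
    re.DOTALL,
)

# search() stops at the earliest block in the document.
first = pattern.search(two_blocks).group(1)

# finditer() walks all non-overlapping matches in order.
all_captures = [m.group(1) for m in pattern.finditer(two_blocks)]
last = all_captures[-1]

print(repr(first))
print(repr(last))
```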

Additionally, as your regex attempt required that Elector's Name and Husband or Father be at the beginning of a line, here's a version which maintains that requirement. If possible, I would avoid this, as it results in a much slower (30x) check.

(?:\r?\nElector\'s Name:.*?<br/>)+\r?\n(.*?)\r?\n(?=Husband|Father)
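
A comparable sketch for the line-anchored version, again with illustrative variable names. Because each Elector's Name line must be preceded by a newline, the very first line of the document is not part of the match here, but the captured text is the same apart from the trailing newline.

```python
import re

# The sample document from the question, one entry per line.
doc = "\n".join([
    "Elector's Name: ANANTH CHINTAPUDI<br/>",
    "Elector's Name: THIRUPATHI <br/>",
    "Elector's Name: SRINIVASH <br/>",
    "CHINTHAPUDI<br/>",
    "CHINTHAPUDI<br/>",
    "Father's Name: POSHANNA <br/>",
    "Father's Name: SHANKAR <br/>",
    "Father's Name: SHANKAR <br/>",
    "CHINTAPUDDI<br/>",
    "CHINTHAPUDI<br/>",
    "CHINTHAPUDI<br/>",
])

# Newlines act as the line anchors; the lookahead checks for
# Husband/Father at the start of the next line without consuming it.
anchored = re.compile(
    r"(?:\r?\nElector's Name:.*?<br/>)+\r?\n(.*?)\r?\n(?=Husband|Father)",
    re.DOTALL,
)

m = anchored.search(doc)
captured = m.group(1) if m else ""
print(captured)
```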
