
Regex to find the second occurrence of an entry

I am no expert in regex, so this is beyond my skill set. Consider the following text:

[SectionTitle0]
...
Name: NameOfTechC
...

[SectionTitle1]
...
Name: NameOfZoneC
...

I am interested in extracting the names of Tech-C and Zone-C using regex. This looks like a config-file section, so I might normally use a config-parsing library, but this extract is part of an even bigger file, so config parsers do not work here.

Currently, I extract the name with Name:\s?(.+). Using re.findall in Python returns a list containing both names. Is there a way to use something like

TechC_name: regex1
ZoneC_name: regex2

that returns the list for either the Tech-C name or the Zone-C name?

[Update]
I want to clarify some points. The position of 'Name:' is not fixed, so other lines may appear both before and after the entry. I updated my question accordingly.

I noticed that sometimes SectionTitle0 (formerly 'Tech-C') and SectionTitle1 (formerly 'Zone-C') are identical. That makes it a little more complicated. Maybe there is a way to build one regex that matches the first occurrence of 'Name:' and another that matches the second (or n-th) occurrence of 'Name:'.
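Since the section titles can be identical, the occurrence number is the only reliable key. A minimal sketch of two options, assuming every name sits on its own line that starts with "Name:": keep the existing pattern and index the findall result, or build a single regex per occurrence by repeating a non-capturing group n-1 times. The helpers nth_name_regex and nth_name are hypothetical names for illustration, not part of any library.

```python
import re

# Sample input modelled on the question; both section titles are deliberately
# identical to show that neither option depends on the title text.
TEXT = """[SectionTitle0]
...
Name: NameOfTechC
...

[SectionTitle0]
...
Name: NameOfZoneC
...
"""

# Option 1: keep the question's pattern and pick the occurrence by index.
names = re.findall(r"Name:\s?(.+)", TEXT)
techc_name, zonec_name = names[0], names[1]

def nth_name_regex(n):
    """Hypothetical helper: regex whose group 1 is the value of the n-th 'Name:' line (n >= 1)."""
    # (?s): .*? may cross newlines; (?m): ^ matches at every line start.
    # \A anchors at the start of the text, the repeated group skips the
    # first n-1 'Name:' lines, and the final part captures the n-th value.
    return re.compile(r"(?sm)\A(?:.*?^Name:){%d}.*?^Name:[ \t]*([^\n]*)" % (n - 1))

def nth_name(text, n):
    """Hypothetical helper: the n-th name, or None if there are fewer than n."""
    m = nth_name_regex(n).search(text)
    return m.group(1) if m else None

print(techc_name, zonec_name)                                    # NameOfTechC NameOfZoneC
print(nth_name(TEXT, 1), nth_name(TEXT, 2), nth_name(TEXT, 3))   # NameOfTechC NameOfZoneC None
```

Indexing the findall list is simpler and usually enough; the per-occurrence regex only helps if the surrounding code insists on one pattern per field, as in the TechC_name/ZoneC_name mapping above.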

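The two regexes you are looking for are (they assume the original version of the question, where the headers are [Tech-C] and [Zone-C] and the Name: line directly follows the header):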

TechC_name:

re.findall(r"\[Tech-C\]\nName: (.*?)\n", s)

ZoneC_name:

re.findall(r"\[Zone-C\]\nName: (.*?)\n", s)
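These section-anchored regexes stop matching once other lines sit between the header and Name:, which the update says can happen. A sketch of a hypothetical helper, names_in_section, that relaxes this, assuming no line inside a section contains a "[" character (the lazy skip stops at the next "[" so it cannot run into the following section). Note that when two sections share a title, it returns the names from both.

```python
import re

def names_in_section(text, section):
    """Hypothetical helper: every 'Name:' value in sections titled `section`."""
    # ^\[ ... \]  the header line; re.escape makes '-' and similar characters literal
    # [^\[]*?     lazily skip any lines in between without crossing the next '[' header
    # ^Name:      the Name line (^ matches at each line start because of re.M)
    pattern = r"^\[" + re.escape(section) + r"\][^\[]*?^Name:[ \t]*(.*)"
    return re.findall(pattern, text, re.M)

sample = "[Tech-C]\nFoo: bar\nName: NameOfTechC\n\n[Zone-C]\nName: NameOfZoneC\n"
print(names_in_section(sample, "Tech-C"))  # ['NameOfTechC']
print(names_in_section(sample, "Zone-C"))  # ['NameOfZoneC']
```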

You can easily get the output in the following format:

[(section1, name1), (section2, name2), ...]

with the following regex:

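import re
t = "[Tech-C]\nName: NameOfTechC\n\n[Zone-C]\nName: NameOfZoneC\n"  # sample in the original question's format
print(re.findall(r"\[(\S+)\]\nName: (\w+)", t))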

Output:

[('Tech-C', 'NameOfTechC'), ('Zone-C', 'NameOfZoneC')]
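Because the result is a list of (section, name) tuples in document order, it can be grouped by title afterwards. A sketch, assuming repeated titles (as described in the update) should keep every name rather than overwrite one another; the sample text is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample in the original question's layout, but with a repeated title.
t = "[Tech-C]\nName: NameOfTechC\n\n[Tech-C]\nName: NameOfZoneC\n"

pairs = re.findall(r"\[(\S+)\]\nName: (\w+)", t)

# dict(pairs) would keep only the last name per title, so collect lists instead.
by_section = defaultdict(list)
for section, name in pairs:
    by_section[section].append(name)

print(pairs)             # [('Tech-C', 'NameOfTechC'), ('Tech-C', 'NameOfZoneC')]
print(dict(by_section))  # {'Tech-C': ['NameOfTechC', 'NameOfZoneC']}
```

Note that \w+ only captures letters, digits and underscores; a name containing spaces or hyphens would need (.*) instead.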

Regex:

\[([^\]]*)\][\r\n]+(?:(?!Name:).*[\r\n]+)*?Name:\s*(.*)

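This captures the section title into group \1 and the name into group \2. The lazy group (?:(?!Name:).*[\r\n]+)*? skips any lines between the header and the first Name: line, so Name: does not have to directly follow the header.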

Implementation

import re

data  = """[SectionTitle0]
...
Name: NameOfTechC
...

[SectionTitle1]
...
Name: NameOfZoneC
...
"""

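# group 1 = section title, group 2 = the value after 'Name:'
regexStr = r'\[([^\]]*)\][\r\n]+(?:(?!Name:).*[\r\n]+)*?Name:\s*(.*)'
regex    = re.compile(regexStr)
print(regex.findall(data))  # [('SectionTitle0', 'NameOfTechC'), ('SectionTitle1', 'NameOfZoneC')]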
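For readability, the same pattern as in the implementation above can be written with re.VERBOSE so each part carries its own comment. In verbose mode unescaped whitespace outside character classes is ignored and "#" starts a comment, so this compiles to an equivalent regex. SECTION_NAME is just an illustrative name.

```python
import re

# Same pattern as the implementation above, one commented piece per line.
SECTION_NAME = re.compile(r"""
    \[([^\]]*)\]     # group 1: the section title between the brackets
    [\r\n]+          # line break(s) ending the header line
    (?:              # then as few whole lines as possible,
        (?!Name:)    # each of which does not start with 'Name:'
        .*[\r\n]+
    )*?
    Name:\s*(.*)     # group 2: everything after 'Name:' up to the end of the line
""", re.VERBOSE)

data = "[SectionTitle0]\n...\nName: NameOfTechC\n...\n\n[SectionTitle1]\n...\nName: NameOfZoneC\n...\n"
print(SECTION_NAME.findall(data))
# [('SectionTitle0', 'NameOfTechC'), ('SectionTitle1', 'NameOfZoneC')]
```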
