Single Regex to Retrieve Multiple "inner" Text (or Substrings)

I'm looking for an elegant way to apply a regex to content that has multiple "headers", each followed by several repeated "inner keys". I'm wondering whether this is possible in one pass, or whether what I'm doing now (i.e. pulling out each group and then processing each inner data set) is the best method.

I've been able to do the "brute force" (a.k.a. two-regex) method: one regex pass gets the header content and the block of key data, then a second regex runs over each key data block to pull out the details. Is there a way to write a single regex that produces one match per key but also includes the header data?

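// The lookahead (?=HEADER|\z) ends a block at the next HEADER (or end of input) without
// consuming it; a trailing literal HEADER would swallow the start of the following block.
var headerRegex = new Regex(@"HEADER.+?name = (?<name>[\w\s]+?)detail(?<keyData>.+?)(?=HEADER|\z)", RegexOptions.Singleline);
var keyRegex = new Regex(@"KEY.+?name = (?<name>[\w\s]+?)type", RegexOptions.Singleline);
foreach (Match headerMatch in headerRegex.Matches(input))
{
   // Regex is not callable; the second pass has to go through Matches(...)
   foreach (Match keyMatch in keyRegex.Matches(headerMatch.Groups["keyData"].Value))
   {
     // here I have header and key data
     // goal: to have a single foreach with a regex that has head and key data
   }
}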

The content I'm working with has the following form (note: it is not JSON, more "JSON-esque"):

HEADER
{
  name = name content
  detail = detail content
  sample = sample content
  KEY
  {
    name = name content
    type = type content
    value = value content
  }
  KEY
  {
    name = name content
  }
  KEY
  {
    name = name content
  }
  additional = additional content
  more = more content
}
HEADER
{
  name = name content
  detail = detail content
  sample = sample content
  KEY
  {
    name = name content
    type = type content
    value = value content
  }
  KEY
  {
    name = name content
  }
  KEY
  {
    name = name content
  }
  additional = additional content
  more = more content
}

Based on your current regex, it looks like you intend to match only the values of the name keys.

Option 1: The simplest option is https://regex101.com/r/jRRwsv/1 .

Pattern: (?:HEADER|KEY).+?name = (?<name>[\w\s]+?)\n

Explanation: Each match starts at either HEADER or KEY and runs to the end of the first name line that follows. This does not differentiate between the name values in the HEADER and KEY sections.
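
As a rough illustration of Option 1 in C#, the sketch below collects every name value into one flat list. The class and method names (AllNamesDemo, Names) are made up for this example, RegexOptions.Singleline is carried over from the question's code so that .+? can cross line breaks, and the input is assumed to end each name line with \n (a stray \r from Windows line endings is trimmed).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AllNamesDemo
{
    // Hypothetical helper: returns HEADER and KEY names together, in document order.
    public static List<string> Names(string input)
    {
        var regex = new Regex(
            @"(?:HEADER|KEY).+?name = (?<name>[\w\s]+?)\n",
            RegexOptions.Singleline); // assumption: same option as in the question

        var names = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            // Trim removes a trailing '\r' if the input uses \r\n line endings.
            names.Add(m.Groups["name"].Value.Trim());
        }
        return names;
    }

    public static void Main()
    {
        string input = "HEADER\n{\n  name = h1\n  KEY\n  {\n    name = k1\n  }\n}\n";
        Console.WriteLine(string.Join(", ", Names(input))); // prints "h1, k1"
    }
}
```

Nothing in the result says which names came from a HEADER and which from a KEY, which is the limitation noted above.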


Option 2: To capture the HEADER and KEY names separately, one option is to expand the regex into two alternatives with distinct group names: (?:HEADER.+?name = (?<hname>[\w\s]+?)\n)|(?:KEY.+?name = (?<kname>[\w\s]+?)\n) .

Refer to https://regex101.com/r/jRRwsv/2
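
One way the Option 2 pattern might be used to reach the question's goal of a single foreach is sketched below. Each match fills either hname or kname, so the loop remembers the most recent header and pairs it with every key that follows. The names HeaderKeyDemo and Pair are hypothetical, RegexOptions.Singleline is assumed from the question's code, and the sketch assumes every HEADER and KEY block contains a name line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderKeyDemo
{
    // Option 2 pattern: exactly one of hname / kname succeeds per match.
    private static readonly Regex NameRegex = new Regex(
        @"(?:HEADER.+?name = (?<hname>[\w\s]+?)\n)|(?:KEY.+?name = (?<kname>[\w\s]+?)\n)",
        RegexOptions.Singleline); // assumption: same option as in the question

    // Hypothetical helper: one (header, key) pair per KEY block.
    public static List<(string Header, string Key)> Pair(string input)
    {
        var pairs = new List<(string Header, string Key)>();
        string currentHeader = null; // stays null until the first HEADER is seen

        foreach (Match m in NameRegex.Matches(input))
        {
            if (m.Groups["hname"].Success)
            {
                // A new header starts; remember its name for the keys that follow.
                currentHeader = m.Groups["hname"].Value.Trim();
            }
            else if (m.Groups["kname"].Success)
            {
                pairs.Add((currentHeader, m.Groups["kname"].Value.Trim()));
            }
        }
        return pairs;
    }

    public static void Main()
    {
        string input =
            "HEADER\n{\n  name = first header\n  detail = d1\n" +
            "  KEY\n  {\n    name = key a\n    type = t\n  }\n" +
            "  KEY\n  {\n    name = key b\n  }\n}\n" +
            "HEADER\n{\n  name = second header\n" +
            "  KEY\n  {\n    name = key c\n  }\n}\n";

        foreach (var (header, key) in Pair(input))
        {
            Console.WriteLine($"{header} -> {key}");
        }
        // first header -> key a
        // first header -> key b
        // second header -> key c
    }
}
```

The header data still arrives in a separate match from the key data, so this is a single pass with a little state rather than one match that holds both.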

