Capture opening tags but not closing tags

Question

I want to split a parent block into segments, capturing the nested tag along with each segment's text:

(?<tag>.)(?: href="(?<url>.+?)")?>(?<text>.+?)<

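It works, but I want "tag" to be empty for segments that are not wrapped in a tag. With the current regex, those segments capture the previous segment's closing tag instead... :(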

Live example: https://regex101.com/r/UEZAaw/3/
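To see the failure concretely, here is a minimal Python sketch that runs the question's pattern over a sample string. The SAMPLE string is an assumption reconstructed from the expected results below (the real test string behind the regex101 link may differ), and segments() is a hypothetical helper. Python's re module spells named groups (?P<name>...) instead of (?<name>...), so the pattern is transliterated accordingly.

```python
import re

# Hypothetical sample input, reconstructed from the question's expected results;
# the real test string behind the regex101 link may differ.
SAMPLE = (
    '<p>The <a href="https://www.legislation.gov.uk/ukpga/2010/23/contents">UK Bribery Act</a>'
    ' (“the Act”) received Royal Assent in April 2010 and came into ... '
    '<a href="http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf">'
    'OECD anti-bribery Convention</a>'
    '. The Act outlined four prime offences, including the introduction ... '
    '<b>rest is history</b></p>'
)

# The question's pattern, with named groups written in Python's (?P<name>...) syntax.
ORIGINAL = re.compile(r'(?P<tag>.)(?: href="(?P<url>.+?)")?>(?P<text>.+?)<')


def segments(pattern, html):
    """Hypothetical helper: (tag, url, text) per match, None for groups that did not take part."""
    return [(m["tag"], m["url"], m["text"]) for m in pattern.finditer(html)]


for seg in segments(ORIGINAL, SAMPLE):
    print(seg)
```

This prints six tuples whose tags are p, a, a, a, a, b. The plain-text segments get tag "a" because at "</a>" the "." first takes the "/", no ">" follows it, so the engine retries one position to the right, where "." takes the "a", which is followed by ">".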

This is the result set I want; note that "tag" should be null for items 2 and 4:

{
   "0":{
      match: "p>The <",
      tag: "p",
      url: null,
      text: "The "
   },
   "1":{
      match: "a href=\"https://www.legislation.gov.uk/ukpga/2010/23/contents\">UK Bribery Act<",
      tag: "a",
      url: "https://www.legislation.gov.uk/ukpga/2010/23/contents",
      text: "UK Bribery Act"
   },
   "2":{
      match: "/a> (“the Act”) received Royal Assent in April 2010 and came into ... <",
      tag: null,
      url: null,
      text: " (“the Act”) received Royal Assent in April 2010 and came into ... "
   },
   "3":{
      match: "a href=\"http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf\">OECD anti-bribery Convention<",
      tag: "a",
      url: "http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf",
      text: "OECD anti-bribery Convention"
   },
   "4":{
      match: "/a>. The Act outlined four prime offences, including the introduction ... <",
      tag: null,
      url: null,
      text: ". The Act outlined four prime offences, including the introduction ... "
   },
   "5":{
      match: "b>rest is history<",
      tag: "b",
      url: null,
      text: "rest is history"
   },
   ...
}

I've spent hours on this and still haven't figured it out. Any advice is much appreciated.

Answer 1

I think this works, based on what I see in the MATCH INFORMATION panel on regex101:

/(?:(?<tag>(?<!\/).)|(?:\/.))(?: href="(?<url>.+?)")?>(?<text>.+?)</gm
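As a quick check outside regex101, here is a minimal Python sketch of the same pattern producing the result set from the question. It assumes a sample string reconstructed from the expected results (the real regex101 input may differ); segments() is a hypothetical helper, and None stands in for null. In Python syntax, (?<name>...) becomes (?P<name>...), "\/" can be written as a plain "/", and the /.../gm wrapper is dropped: finditer already returns every match, and the m flag has no effect because the pattern uses no ^ or $.

```python
import re

# Hypothetical sample input, reconstructed from the question's expected results;
# the real test string behind the regex101 link may differ.
SAMPLE = (
    '<p>The <a href="https://www.legislation.gov.uk/ukpga/2010/23/contents">UK Bribery Act</a>'
    ' (“the Act”) received Royal Assent in April 2010 and came into ... '
    '<a href="http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf">'
    'OECD anti-bribery Convention</a>'
    '. The Act outlined four prime offences, including the introduction ... '
    '<b>rest is history</b></p>'
)

# The answer's pattern, transliterated to Python's named-group syntax.
FIXED = re.compile(
    r'(?:(?P<tag>(?<!/).)|/.)'     # tag: one char not right after "/"; otherwise consume "/x" uncaptured
    r'(?: href="(?P<url>.+?)")?'   # optional href attribute
    r'>(?P<text>.+?)<'             # segment text up to the next "<"
)


def segments(html):
    """Hypothetical helper mirroring the question's result set (None plays the role of null)."""
    return [{"match": m.group(0), **m.groupdict()} for m in FIXED.finditer(html)]


for seg in segments(SAMPLE):
    print(seg)
```

The second branch lets a match start at the "/" of "</a>" and consume "/a" without capturing it, so the engine no longer falls through to the "a"; the (?<!/) lookbehind additionally stops a letter directly after "/" from being captured as a tag. Two limits carry over from the original pattern: the tag group is a single character (a multi-letter tag such as <em> would be reported as "m"), and ".+?" needs at least one character, so back-to-back tags like "</b></p>" yield no segment.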


Answer 1 (2 votes, accepted, 2020-02-15 16:41:39)
