
Capture opening but not the closing tag

I want to split a parent block into segments, capturing the nested tag along with each segment's text:

(?<tag>.)(?: href="(?<url>.+?)")?>(?<text>.+?)<

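It works, but I want "tag" to be empty for segments that are not wrapped in a tag. With the current regex, those segments capture the closing tag of the previous segment instead... :(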
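To make the problem concrete, here is a minimal JavaScript sketch that runs the pattern above against a shortened, hypothetical input. The `naiveInput` string and its example.com URL are assumptions for illustration, not the text behind the regex101 link. Because `(?<tag>.)` accepts any single character in front of `>`, the `a` of `</a>` ends up as the tag of the unwrapped segment.

```javascript
// Hypothetical, shortened input (an assumption, not the original text).
const naiveInput =
  '<p>The <a href="https://example.com/act">UK Bribery Act</a> received Royal Assent. <b>rest is history</b></p>';

// The pattern from the question; the g flag lets matchAll iterate over all matches.
const naivePattern = /(?<tag>.)(?: href="(?<url>.+?)")?>(?<text>.+?)</g;

// Collect the named groups of every match; unmatched groups become null.
const naiveSegments = [...naiveInput.matchAll(naivePattern)].map((m) => ({
  match: m[0],
  tag: m.groups.tag ?? null,
  url: m.groups.url ?? null,
  text: m.groups.text,
}));

console.log(naiveSegments.map((s) => s.tag));
// → [ 'p', 'a', 'a', 'b' ]  (the second 'a' is taken from the closing </a>)
```

The third segment, " received Royal Assent. ", is plain text, yet it reports `a` as its tag, which is exactly the unwanted behaviour described above.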

Live example: https://regex101.com/r/UEZAaw/3/

This is the result set I would like to get. Note that the tag for items 2 and 4 should be null:

{
   "0":{
      match: "p>The <",
      tag: "p",
      url: null,
      text: "The "
   },
   "1":{
      match: "a href=\"https://www.legislation.gov.uk/ukpga/2010/23/contents\">UK Bribery Act<",
      tag: "a",
      url: "https://www.legislation.gov.uk/ukpga/2010/23/contents",
      text: "UK Bribery Act"
   },
   "2":{
      match: "/a> (“the Act”) received Royal Assent in April 2010 and came into ... <",
      tag: null,
      url: null,
      text: " (“the Act”) received Royal Assent in April 2010 and came into ... "
   },
   "3":{
      match: "a href=\"http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf\">OECD anti-bribery Convention<",
      tag: "a",
      url: "http://www.oecd.org/daf/anti-bribery/ConvCombatBribery_ENG.pdf",
      text: "OECD anti-bribery Convention"
   },
   "4":{
      match: "/a>. The Act outlined four prime offences, including the introduction ... <",
      tag: null,
      url: null,
      text: ". The Act outlined four prime offences, including the introduction ... "
   },
   "5":{
      match: "b>rest is history<",
      tag: "b",
      url: null,
      text: "rest is history"
   }
   ...
}

I have spent a few hours on this and still haven't figured it out. Any advice would be much appreciated.

I think this works, based on what I see in the regex101 MATCH INFORMATION box:

/(?:(?<tag>(?<!\/).)|(?:\/.))(?: href="(?<url>.+?)")?>(?<text>.+?)</gm
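As a rough check outside regex101, the pattern above can be run in JavaScript against a shortened, hypothetical input. The `fixedInput` string and its example.com URL are assumptions for illustration. The lookbehind `(?<!\/)` stops `tag` from capturing a character that directly follows a slash, and the `\/.` alternative consumes the closing tag's name without capturing it, so `tag` stays unset for segments that are not wrapped in a tag. Like the original, this sketch still assumes single-character tag names.

```javascript
// Hypothetical, shortened input (an assumption, not the original text).
const fixedInput =
  '<p>The <a href="https://example.com/act">UK Bribery Act</a> received Royal Assent. <b>rest is history</b></p>';

// The pattern from the answer: capture the tag only when it is not preceded by "/".
const fixedPattern =
  /(?:(?<tag>(?<!\/).)|(?:\/.))(?: href="(?<url>.+?)")?>(?<text>.+?)</gm;

// Unmatched named groups are undefined in JavaScript; map them to null.
const fixedSegments = [...fixedInput.matchAll(fixedPattern)].map((m) => ({
  match: m[0],
  tag: m.groups.tag ?? null,
  url: m.groups.url ?? null,
  text: m.groups.text,
}));

console.log(fixedSegments.map((s) => s.tag));
// → [ 'p', 'a', null, 'b' ]
```

Here the unwrapped segment matches as "/a> received Royal Assent. <" with a null tag, which mirrors the shape of the desired result set.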

