Find and replace text not already contained in an <a> tag - RegEx .NET

Question

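I'm working in .NET with XML data from the Federal Register (.fed), which contains a large number of references to executive orders and to chapters of the U.S. Code.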

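I'd like to hyperlink these references unless they are already inside an <a> tag (that tag is dictated by the XML and is usually a link within the document itself).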

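The pattern I wrote matches and removes the leading and trailing characters, and they don't show up in the output even when I include the boundary characters in the replacement string: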

[?!<a href="#(.*)">]([0-9]{1,2})[ ]{0,1}(U\.S\.C\.|USC)[\s]{0,1}([0-9]{1,5})(\b)[^</a>]

Sample of the initial XML:

<p>The Regulatory Flexibility Act of 1980 (RFA), 5 U.S.C. 604(b), as amended, requires Federal agencies to consider the potential impact of regulations on small entities during rulemaking.</p>
<p>Small entities include small businesses, small not-for-profit organizations, and small governmental jurisdictions.</p>
<p>Section 605 of the RFA allows an agency to certify a rule, in lieu of preparing an analysis, if the rulemaking is not expected to have a significant economic impact on a substantial number of small entities. Reference: <a href="#1">13 USC 401</a></p>
  <ul>
      <li><em>Related laws from 14USC301-345 do not apply.</em></li>
      <li><a href="#2">14 USC 301</a> does apply.</li>
  </ul>

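As you can see, some references cover a range of U.S. Code sections (e.g. 14 USC 301-345) or point to a specific subsection (e.g. 5 USC 604(b)). I only want to link the first reference in the range, so the link should end at the - or the (.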

Answer 1

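If I understand you correctly, I think the following should work. Note that the square brackets in your pattern define character classes rather than lookarounds, so each one consumes a single character on either side of the citation; a negative lookahead, (?!...), checks what follows without consuming it.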

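// requires: using System.Text.RegularExpressions;
// text is assumed to hold the XML fragment shown in the question
var re = new Regex(@"\d{1,2}\s?U\.?S\.?C\.?\s?\d{1,5}\b(?!</a>)");
var matches = re.Matches(text);

// matches[0].Value == "5 U.S.C. 604"
// matches[1].Value == "14USC301"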

You could even simplify the regex to \d+\s?U\.?S\.?C\.?\s?\d+\b(?!</a>); I'm not sure the upper limits of 2 and 5 matter.
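
To go from finding the citations to actually linking them, the same pattern can be passed to Regex.Replace with a match evaluator. The following is only a sketch: the UscLinker class, the Link method, and the /usc/{title}/{section} href format are hypothetical names chosen for illustration, and the two capture groups are an addition to the pattern above. It also assumes, as in the sample XML, that an existing link closes with </a> immediately after the citation; a citation followed by other text inside the same <a> element would still be matched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UscLinker
{
    // Same pattern as in the answer, with capture groups added for the
    // title number (group 1) and the section number (group 2).
    private static readonly Regex Citation = new Regex(
        @"(\d{1,2})\s?U\.?S\.?C\.?\s?(\d{1,5})\b(?!</a>)");

    // Wraps each unlinked citation in an <a> element.
    // The href format is a placeholder assumption, not a real URL scheme.
    public static string Link(string text)
    {
        return Citation.Replace(text, m =>
            $"<a href=\"/usc/{m.Groups[1].Value}/{m.Groups[2].Value}\">{m.Value}</a>");
    }

    public static void Main()
    {
        // A range: only the first section is linked, the "-345" stays outside.
        Console.WriteLine(Link("Related laws from 14USC301-345 do not apply."));
        // prints: Related laws from <a href="/usc/14/301">14USC301</a>-345 do not apply.

        // A subsection: the link stops before "(b)".
        Console.WriteLine(Link("(RFA), 5 U.S.C. 604(b), as amended"));
        // prints: (RFA), <a href="/usc/5/604">5 U.S.C. 604</a>(b), as amended

        // Already linked: left untouched.
        Console.WriteLine(Link("<a href=\"#2\">14 USC 301</a> does apply."));
        // prints: <a href="#2">14 USC 301</a> does apply.
    }
}
```

Because the lookahead is zero-width, the characters around the citation (the space before it, the - or ( after it) are never part of the match, so nothing needs to be restored in the replacement string.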
