
Highlight words in HTML using regex in C#

I found this question on Stack Overflow:

highlight words in html using regex & javascript - almost there

Using the approach from that question, I am trying to highlight HTML text on the server using C#. The code is shown below:

string replacePattern = "$1<span style=\"background-color:yellow\">$2</span>";
string searchPattern = String.Format("(?<=^|>)(.*?)({0})(?=.*?<|$)", searchString.Trim());
content = Regex.Replace(content, searchPattern, replacePattern, RegexOptions.IgnoreCase);

The code seems to work well, except when the word to highlight also appears in an image source:

Search Keyword:

ABC

Search Text:

<div><img src="/site/folder/ABC.PNG" /><br />ABC</div>

The result highlights both the visible text and the image file name inside the src attribute.

Any help would be greatly appreciated.

I'll offer up a solution, but I agree that parsing HTML with regex alone can end up not being worth the effort. That said, you know your problem space better than the rest of us, so if the HTML you're highlighting is under your control, you may be able to test enough of your domain to achieve what you want with regexes.

My solution changes the regex you've supplied to take this approach:

  1. Match the > char followed by a non-greedy run of chars not in the set [<>], and capture it into $1
  2. Match and capture your keyword into $2
  3. Match a non-greedy run of chars not in the set [<>] followed by the < char, and capture it into $3

Caveats:

  1. Well-formed HTML works best. If this HTML is user-generated content (UGC), then good luck; you should have used an HTML parser :)
  2. This would highlight content within <textarea>...</textarea>
  3. This would highlight content within <script>...</script>

Note that you could expand the capture on the left-hand side to capture the tag name, and then conditionally skip the replacement for a set of tags such as textarea and script.
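One way that note could be sketched, purely as an illustration: capture the tag immediately before each text node and use a MatchEvaluator to decide whether to highlight. The class name TagAwareHighlighter, the method Highlight, and the skip list are hypothetical names introduced here, not part of the original answer. The sketch only looks at the tag directly preceding the text, and it assumes the skipped elements contain no < or > characters of their own, so it is no substitute for an HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagAwareHighlighter
{
    // Hypothetical skip list: text that directly follows one of these opening tags is left alone.
    private static readonly HashSet<string> SkippedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "script", "style", "textarea" };

    public static string Highlight(string html, string keyword)
    {
        if (string.IsNullOrWhiteSpace(keyword))
            return html;

        // Escape the keyword so characters like '.' or '(' are matched literally.
        string keywordPattern = Regex.Escape(keyword.Trim());

        // A tag (its name captured as "tag") followed by a non-empty text node
        // that runs up to the next '<'.
        const string nodePattern =
            @"(?<open><(?<tag>/?[a-z][a-z0-9]*)[^<>]*>)(?<text>[^<>]+)(?=<)";

        return Regex.Replace(html, nodePattern, m =>
        {
            // Leave the whole match untouched when the preceding tag is on the skip list.
            if (SkippedTags.Contains(m.Groups["tag"].Value))
                return m.Value;

            // Otherwise wrap every occurrence of the keyword inside this text node.
            string text = Regex.Replace(
                m.Groups["text"].Value,
                keywordPattern,
                "<span style=\"background-color:yellow\">$0</span>",
                RegexOptions.IgnoreCase);

            return m.Groups["open"].Value + text;
        }, RegexOptions.IgnoreCase);
    }
}
```

Calling TagAwareHighlighter.Highlight(content, "ABC") on markup that contains <script>var ABC = 1;</script> should leave the script body alone while still wrapping ABC inside ordinary elements.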

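using System;
using System.Text.RegularExpressions;

string searchString = "ABC";
string content = "<div><img src='/site/folder/ABC.PNG' /><br />ABC</div>";
// $1 = text from the preceding '>' up to the keyword, $2 = keyword, $3 = text up to the next '<'
string replacePattern = "$1<span style=\"background-color:yellow\">$2</span>$3";
// Regex.Escape keeps characters such as '.' or '(' in the keyword from being read as regex syntax
string searchPattern = String.Format("(>[^<>]*?)({0})([^<>]*?<)", Regex.Escape(searchString.Trim()));
content = Regex.Replace(content, searchPattern, replacePattern, RegexOptions.IgnoreCase);
Console.WriteLine(content);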
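Because the three capture groups consume the surrounding > and < characters, a text node that contains the keyword more than once gets only one occurrence wrapped. A possible variant, offered as a sketch rather than as the original answer's method, moves the same [^<>] checks into lookarounds so that only the keyword itself is consumed. It relies on .NET's support for variable-length lookbehind; the class name LookaroundHighlighter and the method Highlight are hypothetical. The same caveats about malformed HTML, textarea, and script apply.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundHighlighter
{
    public static string Highlight(string html, string keyword)
    {
        if (string.IsNullOrWhiteSpace(keyword))
            return html;

        // (?<=>[^<>]*)  the keyword must be preceded by '>' with no '<' or '>' in between
        // (?=[^<>]*<)   and followed by '<' with no '<' or '>' in between
        string pattern = "(?<=>[^<>]*)" + Regex.Escape(keyword.Trim()) + "(?=[^<>]*<)";

        // $0 re-inserts the matched text, so the original casing is preserved.
        return Regex.Replace(
            html,
            pattern,
            "<span style=\"background-color:yellow\">$0</span>",
            RegexOptions.IgnoreCase);
    }
}
```

With this variant, LookaroundHighlighter.Highlight("<p>abc and ABC</p>", "ABC") wraps both occurrences, while a keyword inside an attribute value such as src is still skipped because a < sits between it and the nearest preceding >.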
