

Regex match expression to find text by excluding HTML tags

var TextToFind  = "The Term is Fixed";
var TexToSearch = "<head>The</head> Term is Fixed";

The expression currently used is (?mi)\bThe Term is Fixed\b.
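A minimal sketch of the problem, reusing the question's two variables: the current pattern only allows literal single spaces between the words, so the `</head>` tag sitting between "The" and "Term" prevents any match in the tagged text.

```csharp
using System;
using System.Text.RegularExpressions;

var TextToFind  = "The Term is Fixed";
var TexToSearch = "<head>The</head> Term is Fixed";

// The current pattern: words separated by literal single spaces only
var pattern = @"(?mi)\bThe Term is Fixed\b";

bool inPlain  = Regex.IsMatch(TextToFind, pattern);  // plain text: matches
bool inTagged = Regex.IsMatch(TexToSearch, pattern); // tag between words: no match

Console.WriteLine($"plain: {inPlain}, tagged: {inTagged}");
```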

How can this existing pattern be modified so that it still finds the text when HTML tags appear between the words?

You can strip the tags first and then search the remaining text:

using System.Text.RegularExpressions;
string str = "<head>The</head> Term is Fixed";
// Remove every tag: '<', then any chars other than '>', then '>'
string textWithoutTags = Regex.Replace(str, "<[^>]*>", string.Empty);
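After stripping, the question's original pattern can be run unchanged on the cleaned string. This is a minimal sketch that assumes tags never contain a `>` inside an attribute value. Note that any match index refers to the stripped string, not to positions in the original HTML.

```csharp
using System;
using System.Text.RegularExpressions;

string str = "<head>The</head> Term is Fixed";

// Remove every "<...>" tag, then search the remaining plain text
string textWithoutTags = Regex.Replace(str, "<[^>]*>", string.Empty);
Match m = Regex.Match(textWithoutTags, @"(?mi)\bThe Term is Fixed\b");

// m.Index is an offset into textWithoutTags, not into str
Console.WriteLine($"{m.Success} at {m.Index}: {m.Value}");
```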

To match, in the original text, all the substrings that have tags or whitespace between the words, you can dynamically construct a regex like

The(?>\s*<[^>]*>\s*|\s+)Term(?>\s*<[^>]*>\s*|\s+)is(?>\s*<[^>]*>\s*|\s+)Fixed

where each space is replaced with the atomic group (?>\s*<[^>]*>\s*|\s+), which matches either

  • \s*<[^>]*>\s* - <, then 0 or more chars other than >, then >, surrounded by 0 or more whitespace characters
  • | - or
  • \s+ - 1 or more whitespace characters.
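To see what the separator group accepts on its own, the sketch below anchors it with ^...$ so it must match a whole gap between two words. The anchoring and the sample gaps are only for illustration and are not part of the answer's pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// The separator placed between words, anchored so it must match the entire input
const string Separator = @"(?>\s*<[^>]*>\s*|\s+)";
var whole = new Regex("^" + Separator + "$");

// Candidate gaps that might appear between two words
string[] samples = { " ", "</head> ", " <span> ", "", "word", "</a><b>" };
foreach (var s in samples)
    Console.WriteLine($"[{s}] -> {whole.IsMatch(s)}");
```

The last sample shows that the group as written accepts only one tag per gap: two adjacent tags such as `</a><b>` are not matched.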


C# demo:

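using System;
using System.Linq;
using System.Text.RegularExpressions;
var TextToFind  = "The Term is Fixed";
var TexToSearch = "<head>The</head> Term is Fixed\n<head>The</head> Term <span>is</span> Fixed";
// Escape each word, then join the words with "a tag (with optional spaces) or 1+ whitespace"
var regex = string.Join(@"(?>\s*<[^>]*>\s*|\s+)", TextToFind.Split().Select(Regex.Escape));
var result = Regex.Matches(TexToSearch, regex).Cast<Match>().Select(x => x.Value);
foreach (var s in result)
    Console.WriteLine(s);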

Output:

The</head> Term is Fixed
The</head> Term <span>is</span> Fixed
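The demo drops the (?mi) options and the \b word boundaries from the question's original pattern. If those are still wanted, they can be put back around the joined pattern, as in the sketch below. RegexOptions.IgnoreCase | RegexOptions.Multiline mirrors the inline (?mi), and the sample input is the demo's with the first "The" lowercased to exercise case-insensitivity. It also prints each match's Index, which, unlike in the tag-stripping approach, points into the original HTML.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var TextToFind  = "The Term is Fixed";
var TexToSearch = "<head>the</head> Term is Fixed\n<head>The</head> Term <span>is</span> Fixed";

// Escape each word so characters like '.' or '(' are matched literally
var words = TextToFind.Split().Select(Regex.Escape);
// Same separator as the answer, plus the \b boundaries from the original pattern
var pattern = @"\b" + string.Join(@"(?>\s*<[^>]*>\s*|\s+)", words) + @"\b";

// IgnoreCase | Multiline corresponds to the inline (?mi) in the question
var matches = Regex.Matches(TexToSearch, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
foreach (Match m in matches)
    Console.WriteLine($"{m.Index}: {m.Value}");
```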

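Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.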

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM