Regex match expression to find text while excluding HTML tags
var TextToFind = "The Term is Fixed";
var TexToSearch = "<head>The</head> Term is Fixed";
The expression used now is (?mi)\bThe Term is Fixed\b.
How can we modify this existing pattern so that it still finds the text when tags appear between the words?
You can strip the tags first, as follows:
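using System.Text.RegularExpressions;
// Remove every "<...>" tag: '<', then any characters other than '>', then '>'
string str = "< head > The </ head > Term is Fixed";
string textWithoutTags = Regex.Replace(str, "<[^>]*>", string.Empty);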
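If you only need to know whether the phrase occurs (and not where it sits in the original markup), the stripped text can then be searched with the question's pattern unchanged. The sketch below assumes that use case. The whitespace-collapsing step is an addition: removing "< head >" and "</ head >" from the sample leaves " The  Term is Fixed" (leading space, double space after "The"), which the original pattern would not match.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the answer, with spaces inside and around the tags
var textToSearch = "< head > The </ head > Term is Fixed";

// Remove anything that looks like a tag: '<', then any non-'>' chars, then '>'
var withoutTags = Regex.Replace(textToSearch, "<[^>]*>", string.Empty);

// Collapse whitespace runs left behind by the removed tags, then trim the ends
// (assumption: single spaces between words are what the search expects)
var normalized = Regex.Replace(withoutTags, @"\s+", " ").Trim();

// The pattern from the question, applied as-is to the cleaned text
var found = Regex.IsMatch(normalized, @"(?mi)\bThe Term is Fixed\b");
Console.WriteLine(found); // True
```

The trade-off is that match positions refer to the cleaned string, not to the original markup; the dynamic-pattern approach below keeps the tags inside the match instead.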
To match all the substrings that have tags or whitespace between the words, you may dynamically construct a regex like
The(?>\s*<[^>]*>\s*|\s+)Term(?>\s*<[^>]*>\s*|\s+)is(?>\s*<[^>]*>\s*|\s+)Fixed
where each space is replaced with the atomic group (?>\s*<[^>]*>\s*|\s+), which matches either
- \s*<[^>]*>\s* - <, then 0 or more chars other than >, and then >, enclosed with 0 or more whitespaces
- | - or
- \s+ - 1 or more whitespaces.
See the regex demo.
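The separator can be checked on its own by anchoring it with ^ and $, which shows exactly which strings it accepts between two words. A small sketch; the sample strings are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The separator from the answer: one tag with optional surrounding whitespace,
// or one or more whitespace characters, wrapped in an atomic group
const string Separator = @"(?>\s*<[^>]*>\s*|\s+)";

// Anchored only so we can test whole separator strings in isolation
var separatorOnly = new Regex("^" + Separator + "$");

string[] samples = { " ", "</head> ", " <span>", "\t\n", "", "x", "</b><i>" };
bool[] accepted = samples.Select(s => separatorOnly.IsMatch(s)).ToArray();

for (int i = 0; i < samples.Length; i++)
    Console.WriteLine($"{Regex.Escape(samples[i]),-12} accepted: {accepted[i]}");
```

Note the last sample: because the group is not repeated, two tags in a row between words (such as "</b><i>") are rejected. Applying a + quantifier to the group would allow any number of consecutive tags.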
using System;
using System.Linq;
using System.Text.RegularExpressions;

var TextToFind = "The Term is Fixed";
var TexToSearch = "<head>The</head> Term is Fixed\n<head>The</head> Term <span>is</span> Fixed";
// Split the phrase on whitespace and rejoin the words with the "tag or whitespace" separator
var regex = string.Join(@"(?>\s*<[^>]*>\s*|\s+)", TextToFind.Split());
var result = Regex.Matches(TexToSearch, regex).Cast<Match>().Select(x => x.Value);
foreach (var s in result)
    Console.WriteLine(s);
Output:
The</head> Term is Fixed
The</head> Term <span>is</span> Fixed
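The code above drops the (?mi) and \b parts of the question's pattern and passes the words to the regex unescaped. A hardened variant might look like the sketch below. BuildTagTolerantPattern is a hypothetical name, and the Regex.Escape, (?i) and \b additions are assumptions about what you want, not part of the original answer. (?m) is left out because it only changes ^ and $, which the pattern does not use.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: joins the words of `phrase` with the answer's separator.
// Assumed additions: Regex.Escape on each word (so '.', '(', '?' etc. are literal),
// plus (?i) and \b at both ends to keep the intent of the question's pattern.
static string BuildTagTolerantPattern(string phrase)
{
    const string separator = @"(?>\s*<[^>]*>\s*|\s+)";
    // An empty separator array means "split on whitespace"
    var words = phrase.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
                      .Select(Regex.Escape);
    return @"(?i)\b" + string.Join(separator, words) + @"\b";
}

var pattern = BuildTagTolerantPattern("The Term is Fixed");
var input = "<head>the</head> Term <span>is</span> Fixed.\nThe Term is Fixedness";

// The first line matches case-insensitively; "Fixedness" fails the trailing \b
var hits = Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToList();

foreach (var h in hits)
    Console.WriteLine(h);
```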