Regex match expression to find text while excluding HTML tags
var TextToFind = "The Term is Fixed";
var TexToSearch = "<head>The</head> Term is Fixed";
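The expression currently in use is (?mi)\bThe Term is Fixed\b.
How can we modify this existing pattern so that it still finds the text when tags are mixed in?
You can do this as follows: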
string str = "<head>The</head> Term is Fixed";
string textWithoutTags = Regex.Replace(str, "<[^>]*>", string.Empty);
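A minimal sketch of how this tag-stripping step could be combined with the original word-bounded, case-insensitive pattern. The class and method names (TagStripper, StripTags, ContainsPhrase) are hypothetical and only serve the illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: removes every substring that looks like <...>.
    public static string StripTags(string html) =>
        Regex.Replace(html, "<[^>]*>", string.Empty);

    // Hypothetical helper: strips the tags first, then looks for the phrase
    // as a whole-word, case-insensitive match in the remaining plain text.
    public static bool ContainsPhrase(string html, string phrase)
    {
        var plain = StripTags(html);
        var pattern = @"\b" + Regex.Escape(phrase) + @"\b";
        return Regex.IsMatch(plain, pattern, RegexOptions.IgnoreCase);
    }
}
```

Note that this only tells you whether the phrase occurs. Any match position refers to the stripped text, not to the original markup, and two words separated only by a tag (no whitespace) end up glued together after stripping.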
To match every substring that has tags or whitespace between the words, you can build a regex dynamically, such as
The(?>\s*<[^>]*>\s*|\s+)Term(?>\s*<[^>]*>\s*|\s+)is(?>\s*<[^>]*>\s*|\s+)Fixed
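where each space is replaced with (?>\s*<[^>]*>\s*|\s+).
Pattern details:
(?>...) - an atomic group: once it has matched, the regex engine does not backtrack into it
\s*<[^>]*>\s* - a <, then 0 or more characters other than >, then a >, enclosed with 0 or more whitespace characters
| - or
\s+ - 1 or more whitespace characters.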
See the C# demo:
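using System;
using System.Linq;
using System.Text.RegularExpressions;

var TextToFind = "The Term is Fixed";
var TexToSearch = "<head>The</head> Term is Fixed\n<head>The</head> Term <span>is</span> Fixed";
// Join the words with the separator pattern: a tag with optional whitespace around it, or plain whitespace
var regex = string.Join(@"(?>\s*<[^>]*>\s*|\s+)", TextToFind.Split());
var result = Regex.Matches(TexToSearch, regex).Cast<Match>().Select(x => x.Value);
foreach (var s in result)
    Console.WriteLine(s);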
Output:
The</head> Term is Fixed
The</head> Term <span>is</span> Fixed
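The demo above drops the \b word boundaries and the case-insensitive flag from the original expression, and it does not escape the search words. The following is a sketch of one way to put those back, assuming the phrase starts and ends with a word character. TagTolerantSearch, Build and FindAll are hypothetical names introduced for this example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagTolerantSearch
{
    // Separator placed between words: a single tag with optional
    // surrounding whitespace, or plain whitespace.
    private const string Separator = @"(?>\s*<[^>]*>\s*|\s+)";

    // Hypothetical helper: builds the tag-tolerant regex for a phrase.
    public static Regex Build(string phrase)
    {
        var words = phrase
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // split on any whitespace
            .Select(w => Regex.Escape(w));                              // treat each word literally
        var pattern = @"\b" + string.Join(Separator, words) + @"\b";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // Hypothetical helper: returns the text of every match.
    public static string[] FindAll(string text, string phrase) =>
        Build(phrase).Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Escaping each word keeps characters such as . or ( in the search phrase from being read as regex syntax. As in the pattern above, the separator accepts at most one tag between two words.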