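Regex to find missing space after HTML tags
I have a text of more than 10,000 lines, and I need to find every string in which the space after one of a set of HTML tags is missing. The set of HTML tags is limited and is listed below.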
<b> </b>, <em> </em>, <span style="text-decoration: underline;" data-mce-style="text-decoration: underline;"> </span>, <sub> </sub>, <sup> </sup>, <ul> </ul>, <li> </li>, <ol> </ol>
After running the regex, a string like the following should appear in the results.
Hi <b>all</b>good morning.
In this case, the space after the closing bold tag is missing.
Assuming C#:
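using System.Collections.Specialized;   // StringCollection
using System.Text.RegularExpressions;   // Regex, Match

// subjectString holds the full text to scan.
StringCollection resultList = new StringCollection();
Regex regexObj = new Regex("^.*<(?:/?b|/?em|/?su[pb]|/?[ou]l|/?li|span style=\"text-decoration: underline;\" data-mce-style=\"text-decoration: underline;\"|/span)>(?! ).*$", RegexOptions.Multiline);
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    // Each match is one whole line containing a listed tag that is not followed by a space.
    resultList.Add(matchResult.Value);
    matchResult = matchResult.NextMatch();
}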
This returns every line in the file in which at least one of the listed tags is not followed by a space.
Input:
This </b> is <b> OK
This <b> is </b>not OK
Neither <b>is </b> this.
Output:
This <b> is </b>not OK
Neither <b>is </b> this.
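To check the input and output above, the snippet can be wrapped in a small console program. This is only a sketch: the class name MissingSpaceDemo and the helper FindLines are made up for illustration, and the input is assumed to use "\n" line endings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MissingSpaceDemo
{
    // Same pattern as in the answer, written as a verbatim string ("" is an escaped quote).
    private static readonly Regex TagWithoutSpace = new Regex(
        @"^.*<(?:/?b|/?em|/?su[pb]|/?[ou]l|/?li|span style=""text-decoration: underline;"" data-mce-style=""text-decoration: underline;""|/span)>(?! ).*$",
        RegexOptions.Multiline);

    // Hypothetical helper: returns every line that contains a listed tag
    // which is not followed by a space.
    public static List<string> FindLines(string text)
    {
        var result = new List<string>();
        foreach (Match m in TagWithoutSpace.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string input = "This </b> is <b> OK\n" +
                       "This <b> is </b>not OK\n" +
                       "Neither <b>is </b> this.";

        // Prints the second and third input lines.
        foreach (string line in FindLines(input))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that a tag at the very end of a line also counts as a match, because (?! ) succeeds when nothing at all follows the tag. If the text uses "\r\n" line endings, each returned line keeps its trailing "\r", so trimming the results may be needed.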
Explanation:
^ # Start of line
.* # Match any number of characters except newlines
< # Match a <
(?: # Either match a...
/?b # b or /b
| # or
/?em # em or /em
|... # etc. etc.
) # End of alternation
> # Match a >
(?! ) # Assert that no space follows
.* # Match any number of characters until...
$ # End of line
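The commented breakdown above can also be kept inside the code itself by using RegexOptions.IgnorePatternWhitespace. The following is a sketch rather than part of the original answer, and the class name CommentedPatternDemo is made up. One detail matters here: in this mode unescaped whitespace in the pattern is ignored, so every literal space, including the one in the lookahead, is written as [ ].

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPatternDemo
{
    // Free-spacing version of the pattern. Whitespace outside character classes
    // is ignored, and # starts a comment that runs to the end of the line.
    private const string Pattern = @"
        ^                       # Start of line
        .*                      # Any number of characters except newlines
        <                       # A literal <
        (?:                     # One of the listed tags:
            /?b                 #   b or /b
          | /?em                #   em or /em
          | /?su[pb]            #   sub, sup, /sub or /sup
          | /?[ou]l             #   ol, ul, /ol or /ul
          | /?li                #   li or /li
          | span[ ]style=""text-decoration:[ ]underline;""[ ]data-mce-style=""text-decoration:[ ]underline;""
          | /span               #   /span
        )                       # End of alternation
        >                       # A literal >
        (?![ ])                 # Assert that no space follows
        .*                      # Rest of the line
        $                       # End of line";

    public static readonly Regex TagWithoutSpace = new Regex(
        Pattern, RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        // The example from the question: no space after </b>.
        Console.WriteLine(TagWithoutSpace.IsMatch("Hi <b>all</b>good morning."));    // True
        // Every tag is followed by a space, so nothing is reported.
        Console.WriteLine(TagWithoutSpace.IsMatch("Hi <b> all </b> good morning.")); // False
    }
}
```

Writing the pattern this way keeps the explanation next to the expression it describes, at the cost of having to escape each literal space.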