Regex matching a multiline, mixed case string with whitespace in C#

I'm trying to guarantee that a field in our CMS contains only an unordered list. For example,

<ul>
    <li>
        This is our first bullet point
    </li>
</ul>

I'm using the following to match this:

String pattern = "^<ul>(<li>.*</li>)+</ul>$";
Regex rgx = new Regex(@pattern, 
    RegexOptions.IgnorePatternWhitespace 
    | RegexOptions.Multiline 
    | RegexOptions.IgnoreCase);
if(rgx.IsMatch(controlValidationValue)) { ... }

This works when the HTML is all on one line, but fails as soon as it contains line breaks or whitespace, which may happen because our CMS uses a rich text plugin to create the HTML.

I've tried using a bitwise AND (instead of OR) and played with RegexOptions.Singleline, but can't get to the bottom of the problem.

Any/all help appreciated!

In general, I would use HtmlAgilityPack to parse HTML instead of a regex.

string html = @"<ul>
    <li>
        This is our first bullet point
    </li>
</ul>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html.Trim());  // Trim to remove leading or trailing spaces if that's possible
bool valid = doc.DocumentNode.ChildNodes.Count == 1 
          && doc.DocumentNode.ChildNodes[0].Name == "ul";
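
If a regex is still preferred, the original pattern fails for reasons that can be addressed directly: RegexOptions.IgnorePatternWhitespace ignores whitespace in the pattern, not in the input; RegexOptions.Multiline only changes where ^ and $ match; and . does not match a newline unless RegexOptions.Singleline is set. The following is a minimal sketch under some assumptions, not a definitive validator: the class and method names (UlValidator, IsOnlyUnorderedList) are made up for illustration, tags are assumed to carry no attributes, and nested lists are deliberately rejected.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class UlValidator
{
    // \A and \z anchor to the start and end of the whole string.
    // \s* tolerates whitespace and line breaks between tags.
    // The inner (?:(?!</?(?:ul|li)\b).)* consumes item content one character
    // at a time, but stops before any <ul>, </ul>, <li> or </li> tag, so an
    // item cannot swallow the end of the list or a second list.
    // Singleline lets '.' match newlines inside an item.
    private static readonly Regex UlOnly = new Regex(
        @"\A\s*<ul>\s*(?:<li>(?:(?!</?(?:ul|li)\b).)*</li>\s*)+</ul>\s*\z",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static bool IsOnlyUnorderedList(string value)
    {
        return value != null && UlOnly.IsMatch(value);
    }

    public static void Main()
    {
        string html = "<ul>\n    <li>\n        This is our first bullet point\n    </li>\n</ul>";
        Console.WriteLine(IsOnlyUnorderedList(html));           // True
        Console.WriteLine(IsOnlyUnorderedList("<p>text</p>"));  // False
    }
}
```

This sketch will not accept markup such as <ul class="x"> or nested lists, which a rich text plugin may well produce; that fragility is the main reason an HTML parser is the more robust choice here.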
