
How to write a Multi-line RegEx Expression

I have a VB.NET class that cleans some HTML before emailing the results.

Here is a sample of the HTML I need to remove:

    <div class="RemoveThis">
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      Blah blah blah<br /> 
      <br /> 
    </div>

I am already using RegEx to do most of my work now. What would the RegEx expression look like to replace the block above with nothing?

I tried the following, but something is wrong:

    'html has all of my text
    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase)

Thanks.

Add the Singleline option:

    html = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "", RegexOptions.IgnoreCase Or RegexOptions.Singleline)

From MSDN:

Singleline: Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
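
To make the difference concrete, here is a minimal sketch that runs the pattern from the question with and without the option. The sample string and the module name are made up for illustration; only `Regex.Replace` and the `RegexOptions` values come from the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SinglelineDemo
    Sub Main()
        ' Hypothetical sample input: the block to strip spans several lines.
        Dim html As String =
            "<p>Keep me</p>" & vbLf &
            "<div class=""RemoveThis"">" & vbLf &
            "  Blah blah blah<br />" & vbLf &
            "</div>" & vbLf &
            "<p>Keep me too</p>"

        Dim pattern As String = "<div.*?class=""RemoveThis"">.*?</div>"

        ' Without Singleline the dot cannot cross a newline, so the pattern
        ' never reaches </div> and the input comes back unchanged.
        Dim unchanged As String = Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase)
        Console.WriteLine(unchanged = html)

        ' With Singleline the dot also matches \n, so the whole block is removed.
        Dim cleaned As String = Regex.Replace(html, pattern, "",
                                              RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Console.WriteLine(cleaned)
    End Sub
End Module
```

The first `WriteLine` prints `True` because nothing was replaced; the second prints the two paragraphs with the div block gone.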

PS: Parsing HTML with regular expressions is discouraged. Your code will fail on something like this, because the lazy .*? stops at the first </div> and leaves the rest of the block behind:

    <div class="RemoveThis">
        <div>bla</div>
        <div>bla</div>
    </div>
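
A short sketch of that failure, assuming the nested block carries the question's `RemoveThis` class. The module name is hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NestedDivDemo
    Sub Main()
        ' A target div that itself contains two inner divs.
        Dim html As String =
            "<div class=""RemoveThis"">" & vbLf &
            "    <div>bla</div>" & vbLf &
            "    <div>bla</div>" & vbLf &
            "</div>"

        Dim result As String = Regex.Replace(html, "<div.*?class=""RemoveThis"">.*?</div>", "",
                                             RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        ' The lazy .*? ends the match at the first </div>, so the second inner
        ' div and the outer closing tag are left behind in the output.
        Console.WriteLine(result)
    End Sub
End Module
```

Making the quantifier greedy does not help either: it would then run to the last </div> in the whole document and remove unrelated content.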

Try:

    RegexOptions.IgnoreCase Or RegexOptions.Singleline

The RegexOptions.Singleline option changes the meaning of the dot from 'match anything except a newline' to 'match anything'.

Also, you should consider using an HTML parser instead of regular expressions if you need to parse HTML.
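
As a rough sketch of the parser route: neither answer names a library, so the example below assumes the third-party HtmlAgilityPack NuGet package is installed. The sample string and module name are hypothetical.

```vbnet
Imports System
Imports HtmlAgilityPack   ' assumption: third-party NuGet package, not named in the answers

Module ParserDemo
    Sub Main()
        ' Hypothetical input with nested divs inside the block to remove.
        Dim html As String =
            "<p>Keep me</p>" &
            "<div class=""RemoveThis""><div>bla</div><div>bla</div></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when no node matches the XPath.
        Dim nodes = doc.DocumentNode.SelectNodes("//div[@class='RemoveThis']")
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                node.Remove()   ' removes the element together with its nested children
            Next
        End If

        Console.WriteLine(doc.DocumentNode.OuterHtml)
    End Sub
End Module
```

Because the parser works on the element tree rather than on text, nested divs inside the target are removed with it, which is the case the regex cannot handle. Note that the XPath here only matches elements whose class attribute is exactly `RemoveThis`.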
