Replace HTML inside a pre tag using a regex

How can I replace the HTML inside a pre tag? I would prefer to do it with a regex.

<html>
<head></head>
<body>
<div>
<pre>

    <html>
    <body>
    -----> hello! ----< 
    </body>
    </html

</pre>
</div>
</body>

EDIT: As another answer points out, regular expressions cannot fully parse HTML or XHTML, so you will be better off using an HTML parser instead. I'm leaving my answer here for reference, though.
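Here is a minimal sketch of the parser route in PHP, using the DOM extension that ships with most PHP builds. The function name replacePreContents is hypothetical. The sketch assumes reasonably well-formed input, because libxml's error recovery on badly broken markup (such as an unterminated end tag) varies between versions, so check the output on your own data. The new content is assigned through textContent, so any <, > or & in it is escaped when the document is saved instead of being parsed as markup.

```php
<?php
// Hypothetical helper: replace the contents of every <pre> element with
// $replacement, using PHP's DOM extension instead of a regex.
function replacePreContents(string $html, string $replacement): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them;
    // real-world HTML often triggers warnings that are safe to ignore.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('pre') as $pre) {
        // Assigning textContent drops all child nodes of the <pre> and
        // inserts a single text node; special characters are escaped on output.
        $pre->textContent = $replacement;
    }

    return $doc->saveHTML();
}

echo replacePreContents(
    '<div><pre class="some-css-class"><b>old</b> text</pre><p>after</p></div>',
    '(pre tag content was here)'
);
```

When the input is a fragment, saveHTML() on the whole document typically adds a doctype and <html><body> wrappers around it.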

What do you want to replace the content inside the pre-tags with?

I'm not familiar with the specific C# syntax, but provided C# uses Perl-style regexes, the following PHP snippet might be helpful. The code below replaces the content inside the pre tags with the string "(pre tag content was here)" (just tested with the command-line PHP client):

<?php
$html = "<html><head></head><body><div><pre class=\"some-css-class\">
         <html><body>
         -----> hello! ----< 
         </body></html
         </pre></div></body>"; // Compacting things here, for brevity

// Capture the opening <pre ...> tag and the closing </pre> tag and replace
// only what lies between them; preg_replace() handles every <pre> block.
$newHTML = preg_replace('/(<pre[^<>]*>).*?(<\/pre>)/s', '$1(pre tag content was here)$2', $html);
echo $newHTML;
?>

The ? makes .* non-greedy, so each match stops at the first </pre> after the opening tag instead of running on to the last one in the document. The s modifier ("dotall", single-line mode in Perl terms) makes . match newlines as well, which matters because the content of a pre block usually spans several lines. The [^<>]* part allows attributes on the opening tag, such as <pre class="some-css-class">, by matching any run of characters other than < or >. Do not add PHP's U modifier here: it means "ungreedy" and inverts greediness, turning .*? greedy (Unicode mode is the lowercase u).

UPDATE: As Martinho Fernandes pointed out in the comments below, the C# equivalent should be something like:

var regex = new Regex(@"(<pre[^<>]*>).*?(</pre>)", RegexOptions.Singleline);
string newHtml = regex.Replace(html, "$1(pre tag content was here)$2");
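A quick PHP check of why the non-greedy ? matters, using a hypothetical string with two pre blocks. The second and third calls are deliberately wrong. They show what happens when .* is greedy, either because the ? is missing or because PHP's U (ungreedy) modifier flips .*? back to greedy.

```php
<?php
// Two <pre> blocks separated by other markup.
$html = '<pre>one</pre><p>between</p><pre>two</pre>';

// Non-greedy: each match stops at the first </pre>, so both blocks are
// replaced and the <p> in between survives.
$lazy = preg_replace('/(<pre[^<>]*>).*?(<\/pre>)/s', '$1X$2', $html);

// Greedy: .* runs to the last </pre>, swallowing the <p> and the second block.
$greedy = preg_replace('/(<pre[^<>]*>).*(<\/pre>)/s', '$1X$2', $html);

// With the U modifier PHP inverts greediness, so .*? behaves like .* above.
$ungreedy = preg_replace('/(<pre[^<>]*>).*?(<\/pre>)/Us', '$1X$2', $html);

echo $lazy, "\n";     // <pre>X</pre><p>between</p><pre>X</pre>
echo $greedy, "\n";   // <pre>X</pre>
echo $ungreedy, "\n"; // <pre>X</pre>
```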

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

 