
PHP function to replace an HTML tag (e.g. meta description) using preg_replace

Can someone help me get this function to work? The function should accept $HTMLstr -- a whole page of HTML stuffed into a string that already contains a meta description in the form of:

<meta name="description" content="This will be replaced"/>

along with $content which is the string that should replace "This will be replaced". I thought I was close with this function, but it doesn't quite work.

function HTML_set_meta_description ($HTMLstr, $content) {
$newHTML = preg_replace('/<meta name="description"(.*)"\/>/is', "<meta name=\"description\" content=\"$content\"/>", $HTMLstr);
return ($newHTML);
}

Thanks for any help!

Edit: Here's the working function.

function HTML_set_meta_description ($HTMLstr, $content) {
// assumes meta format is exactly <meta name="description" content="This will be replaced"/>
$newHTML = preg_replace('/<meta name="description" content="(.*)"\/>/i','<meta name="description" content="' . $content . '" />', $HTMLstr);
return ($newHTML);

}
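One thing to watch with the function above: $content is inserted verbatim, so a double quote in it would break the attribute, and a sequence such as $1 would be read by preg_replace as a backreference. Here is a minimal sketch of a more defensive variant, still assuming the exact tag format stated in the comment. The name HTML_set_meta_description_safe is made up for illustration.

```php
<?php
// Hypothetical variant of the function above; the name is illustrative only.
// Assumes the tag has the form <meta name="description" content="..."/>,
// optionally with whitespace before the closing slash.
function HTML_set_meta_description_safe($HTMLstr, $content) {
    // Escape quotes, ampersands and angle brackets so $content stays inside the attribute.
    $escaped = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
    $replacement = '<meta name="description" content="' . $escaped . '"/>';
    // [^"]* stops at the first closing quote. The callback returns the
    // replacement literally, so "$1" or "\1" inside $content is not
    // treated as a backreference.
    return preg_replace_callback(
        '/<meta name="description" content="[^"]*"\s*\/>/i',
        function ($m) use ($replacement) { return $replacement; },
        $HTMLstr
    );
}

$html = '<head><meta name="description" content="This will be replaced"/><link rel="x"/></head>';
echo HTML_set_meta_description_safe($html, 'Price: $10 & "free" shipping'), "\n";
```

Because the replacement is written back in the same "/> form the pattern expects, running the function a second time on its own output still finds the tag.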

Unless you know that the <meta> will be provided in a consistent format (which is difficult to know unless you actually have control over the HTML) you will have a very tough time constructing a working regex. Take these examples:

<meta content="content" name="description">
<meta content = 'content' name = 'description' />
<meta name= 'description' content ="content"/>

These are all valid, but the regex that would handle them would be very complex. Something like:

@<meta\s+name\s*=\s*('|")description\1\s+content\s*=\s*('|")(.*?)\2\s*/?>@

...and that doesn't even account for the attributes being in a different order. There may have been something else I didn't think of as well.
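To make the attribute-order limitation concrete, here is a small sketch that runs a pattern of this shape against the three example tags. It assumes the pattern is meant to have an "=" after content and to allow optional whitespace before the closing slash; the i flag is added only for this illustration.

```php
<?php
// Assumed reading of the pattern above, with "=" after "content" and
// optional whitespace before the closing slash. The i flag is an addition.
$pattern = '@<meta\s+name\s*=\s*(\'|")description\1\s+content\s*=\s*(\'|")(.*?)\2\s*/?>@i';

// The three example tags from the answer.
$variants = array(
    '<meta content="content" name="description">',
    "<meta content = 'content' name = 'description' />",
    "<meta name= 'description' content =\"content\"/>",
);

foreach ($variants as $tag) {
    $matched = preg_match($pattern, $tag, $m) === 1;
    echo $tag, ' => ', $matched ? 'match, content: ' . $m[3] : 'no match', "\n";
}
```

Under these assumptions only the third variant matches, because the first two put content before name.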

On the other hand, using a parser such as DOMDocument may be very expensive, especially if your HTML is large. If you can depend on a consistent format for the <meta>, use .*? instead of .* to capture the content. .*? makes the match reluctant (lazy), so it stops at the first closing quote rather than the last -- there are likely to be many other quotes throughout the HTML document.
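A short sketch of the difference between .* and .*? on a line that holds more than one tag. The sample HTML string is invented for illustration.

```php
<?php
// Two tags on one line: greedy .* runs to the LAST quote-slash-bracket it can find.
$html = '<meta name="description" content="old"/><link rel="stylesheet" href="a.css"/>';

$greedy = preg_replace('/content="(.*)"\/>/', 'content="new"/>', $html);
$lazy   = preg_replace('/content="(.*?)"\/>/', 'content="new"/>', $html);

echo $greedy, "\n"; // the <link> tag has been swallowed
echo $lazy, "\n";   // only the description content changed
```

With the greedy version the match extends through the end of the <link> tag, so that tag disappears from the result. The reluctant version stops at the first closing quote followed by />.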

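$dom = new DOMDocument;
$dom->loadHTML($HTMLstr);
foreach ($dom->getElementsByTagName("meta") as $tag) {
    // Compare the whole attribute value, case-insensitively, so that names
    // such as "twitter:description" are not matched by accident.
    if (strcasecmp($tag->getAttribute("name"), "description") === 0) {
        $tag->setAttribute("content", $content);
    }
}
return $dom->saveHTML();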
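For comparison with the regex approach, here is a sketch that wraps the DOMDocument loop in a function and runs it on the three differently formatted tags shown earlier. The function name set_meta_description_dom and the surrounding page skeleton are assumptions made for this example. It also silences libxml parser warnings, which loadHTML may emit on imperfect markup.

```php
<?php
// Hypothetical wrapper around the DOMDocument loop; the function name is illustrative.
function set_meta_description_dom($HTMLstr, $content) {
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($HTMLstr);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    foreach ($dom->getElementsByTagName('meta') as $tag) {
        // Exact, case-insensitive comparison of the name attribute.
        if (strcasecmp($tag->getAttribute('name'), 'description') === 0) {
            $tag->setAttribute('content', $content);
        }
    }
    return $dom->saveHTML();
}

// The three formatting variants from the earlier answer.
$variants = array(
    '<meta content="content" name="description">',
    "<meta content = 'content' name = 'description' />",
    "<meta name= 'description' content =\"content\"/>",
);

foreach ($variants as $tag) {
    $page = '<html><head>' . $tag . '</head><body></body></html>';
    echo set_meta_description_dom($page, 'New text'), "\n";
}
```

All three variants are updated regardless of attribute order or quoting. Note that saveHTML() re-serializes the whole document, so quoting and whitespace in the output may differ from the input.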

Using DOMDocument is the recommended approach, as another answer already says. However, if you're struggling with the regular expression, this may help. You might try this instead:

return preg_replace('/<meta name="description" content="(.*)"\/>/i','<meta name="description" content="Something replaced" />', $HTMLstr);

I know you asked about preg_replace and I'm late to answer, but take a look at this. Is it what you are looking for?

<?php
function meta_desc( $content = null ){
    $desc = 'This will be replaced';
    if( $content ){
        $desc = $content;
    }
    return '<meta name="description" content="' . $desc . '"/>';
}
?>

Trust me, it's faster than that. I think you should use this function.


 