PHP: How to insert a string into a matched regex pattern (adding rel="nofollow" to anchor links)

I am writing a commenting system for my website, using PHP.

I want to do the following:

  1. Detect all external links in a comment (i.e. anchor tags whose href does NOT contain the string mywebsite.com).
  2. Add the attribute rel="nofollow" to the anchor tags identified in step 1.
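Step 1 comes down to deciding whether a link's host is mywebsite.com (or a subdomain of it). Here is a minimal sketch of that test using PHP's built-in parse_url(). It assumes that links without a host (relative links such as "/page") are internal. The function name is_external_url() is hypothetical.

```php
<?php
// Hypothetical helper: true when $url points to a host other than $site.
// Subdomains such as www.mywebsite.com count as internal.
function is_external_url(string $url, string $site = 'mywebsite.com'): bool
{
    // PHP_URL_HOST gives null for relative links like "/page" and false for badly malformed URLs
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host: assume the link stays on the site
    }
    $host = strtolower($host);
    $site = strtolower($site);
    // Internal if the host is exactly $site or ends with ".$site"
    return $host !== $site && substr($host, -strlen($site) - 1) !== '.' . $site;
}
```

Comparing the host, rather than searching the whole URL, keeps links like http://mywebsite.com.evil.com/ or http://example.org/?ref=mywebsite.com classified as external.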

I have an idea for such a function, but I'd like help from more experienced PHP developers to make sure I'm doing things the right way. Here is my first attempt:

<?php

function process_comment($comment)
{

    $external_url_pattern = "href=[^mywebsite.com]"; //this regex is probably wrong (Help!)

    //are there any matches
    $matches = array();
    preg_match_all($external_url_pattern, $comment, $matches);

    foreach($matches as $match)
    {
       // how do we insert the 'rel="nofollow"' string?
    }

}


?>
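For reference, the pattern in this first attempt cannot express "does not contain mywebsite.com", for two reasons. First, preg_* patterns need delimiters; without them, preg_match_all() emits a warning and returns false. Second, [^mywebsite.com] is a character class. It matches exactly one character that is not one of m, y, w, e, b, s, i, t, ., c or o. Here is a small demonstration with "/" delimiters added:

```php
<?php
// The pattern from the first attempt, with "/" delimiters added so PCRE accepts it
$pattern = '/href=[^mywebsite.com]/';

$internal = '<a href="http://mywebsite.com/page">ours</a>';
$external = '<a href="http://example.org/">theirs</a>';

// Both match: the class only inspects the single character after "href=",
// which is the opening quote in both strings
$internalMatches = preg_match($pattern, $internal); // 1
$externalMatches = preg_match($pattern, $external); // 1
```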

I would appreciate any comments, pointers, or tips to help me complete this function. Thanks.
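The question in the title, inserting a string into each regex match, maps naturally onto PHP's preg_replace_callback(). The callback receives each match and returns the replacement text. Below is a minimal regex-only sketch. It assumes comments contain simple <a ...> tags with the href in double quotes and no existing rel attribute. The function name add_nofollow() is hypothetical, and regexes over HTML remain fragile compared with a real parser.

```php
<?php
// Hypothetical: add rel="nofollow" to every <a> tag whose href points off $site
function add_nofollow(string $comment, string $site = 'mywebsite.com'): string
{
    $result = preg_replace_callback(
        // Group 1: "<a" plus attributes up to and including href="..."; group 2: the URL itself
        '/(<a\s[^>]*?href="([^"]*)")/i',
        function (array $m) use ($site): string {
            $host = parse_url($m[2], PHP_URL_HOST);
            $host = is_string($host) ? strtolower($host) : '';
            $site = strtolower($site);
            // No host (relative link), the site itself, or a subdomain of it: internal
            $internal = $host === '' || $host === $site
                || substr($host, -strlen($site) - 1) === '.' . $site;
            // Insert the attribute right after href="..."; leave internal links untouched
            return $internal ? $m[1] : $m[1] . ' rel="nofollow"';
        },
        $comment
    );
    // preg_replace_callback() returns null on a PCRE error; fall back to the input
    return $result ?? $comment;
}

$out = add_nofollow('See <a href="http://example.org/">this</a> and <a href="/faq">our FAQ</a>.');
echo $out;
// See <a href="http://example.org/" rel="nofollow">this</a> and <a href="/faq">our FAQ</a>.
```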

I don't know whether this is appropriate for your case, but instead of a regex you could also use DOMDocument:

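$dom = new DOMDocument();
// User comments are rarely well-formed HTML; collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html); // $html holds the comment text
libxml_clear_errors();

//Evaluate Anchor tag in HTML
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");

for ($i = 0; $i < $hrefs->length; $i++) {
        $href = $hrefs->item($i);
        $url = $href->getAttribute('href');

        // Host part of the link; null for relative links such as "/page"
        $host = parse_url($url, PHP_URL_HOST);

        // Only links to another host (not mywebsite.com or one of its subdomains) get rel="nofollow"
        if ($host && !preg_match('/(^|\.)mywebsite\.com$/i', $host)) {
             $href->setAttribute("rel", "nofollow");
        }
}

// save html (note: this returns a full document, including the <html><body> wrapper)
$html = $dom->saveHTML();

echo $html;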

Hope it helps.
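One practical detail: for a fragment such as a comment, loadHTML() wraps the markup in <html><body>. Called with no argument, saveHTML() returns that whole document, including a doctype. Below is a minimal sketch that returns only the comment's own markup by serializing each child of <body> with saveHTML($node). The function name nofollow_dom() is hypothetical, and the sketch requires PHP's dom extension. It also assumes ASCII or entity-encoded input, because loadHTML() treats input without a charset declaration as ISO-8859-1.

```php
<?php
// Hypothetical wrapper around the DOMDocument approach that returns only the fragment
function nofollow_dom(string $comment, string $site = 'mywebsite.com'): string
{
    if (trim($comment) === '') {
        return $comment; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate malformed comment HTML
    $dom->loadHTML($comment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $site = strtolower($site);
    foreach ($dom->getElementsByTagName('a') as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        $host = is_string($host) ? strtolower($host) : '';
        // External: has a host that is neither $site nor a subdomain of it
        $external = $host !== '' && $host !== $site
            && substr($host, -strlen($site) - 1) !== '.' . $site;
        if ($external) {
            $a->setAttribute('rel', 'nofollow');
        }
    }

    // Serialize the children of <body> only, dropping the implied <html><body> wrapper
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $node) {
            $out .= $dom->saveHTML($node);
        }
    }
    return $out;
}

$out = nofollow_dom('<p>See <a href="http://example.org/">this</a> and <a href="/faq">our FAQ</a>.</p>');
```

Unlike the regex versions, this version does not care about attribute order, quoting style, or letter case in the markup, because the parser handles all of that.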

This is a bit tricky, but it will do the job.

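function process_comment($str, $mysite = 'mywebsite.com')
{
    //escape the host name so its dots are matched literally
    $site = preg_quote($mysite, '@');

    //parses href attribute values into $match
    //([^"]* stops at the closing quote instead of running on to the last quote on the line)
    if(preg_match_all('/href="([^"]*)"/',$str,$match))
    {
        //array_unique: a link that appears twice must be rewritten only once
        foreach(array_unique($match[1]) as $v)
        {
            //check matched value starts with your site as host name
            //(optionally after http:// or https:// and a subdomain such as www.)
            //if not, adds rel="nofollow" right after that href attribute
            if(!preg_match('@^(?:https?://)?(?:[\w-]+\.)*'.$site.'(?:[/:?#]|$)@i',$v))
            {
                $str = str_replace('href="'.$v.'"', 'href="'.$v.'" rel="nofollow"', $str);
            }
        }
    }

    return $str;
}

$comment = process_comment($comment);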

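You could simply use strstr() instead of the second preg_match(). I used preg_match() because some URLs may contain your domain somewhere other than the host, for example "http://www.external.com/url.php?v=www.mysite.com". A plain strstr() check would treat that link as internal, while the anchored regex only matches your domain at the start of the URL.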
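To see why the anchored check matters, here is a small comparison on the URL above, with mysite.com as the site. A substring test with strstr() finds the site inside the query string. Comparing the host returned by parse_url() does not. Note that parse_url() is an alternative I am swapping in here; it is not the answer's own second preg_match().

```php
<?php
$site = 'mysite.com';
$url  = 'http://www.external.com/url.php?v=www.mysite.com';

// Substring test: true, because "mysite.com" appears in the query string
$looksInternalBySubstring = strstr($url, $site) !== false;

// Host test: the host is www.external.com, so the link is correctly seen as external
$host = strtolower((string) parse_url($url, PHP_URL_HOST));
$isInternalByHost = $host === $site || substr($host, -strlen($site) - 1) === '.' . $site;
```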

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 