
How to append querystring to every URL in a string using PHP and regexp

I am using PHP 5.6.40-0+deb8u5 on Linux.

I want to add a query string to every URL in a text string. It NEARLY works, but it never handles the last URL. What am I missing?

I tried the approach from "How to append to all urls in a string?", but it never handles the very last URL in the string.

<?php
    $message = '<h4>Hello there AGAIN . visit  <br />         
    href="http://www.my-domain.com/another-link/" ' ; 
    $message .= ' <br /> or href="http://sub-domain.my-domain.com/subdir/sub-sub-dir/" ';
    $message .= ' <br /> or href="https://www.my-domain.com?uid=hello" ';
    $message .= ' <br /> or href="http://my-domain.com" ';
    $message .= ' <br /> or href="https://my-domain.com" ';
    $message .= ' <br /> or href="http://my-domain.com/" ';
    $message .= ' <br /> or href="https://my-domain.com/" ';
    $message .= ' <br /> or href="http://subdomain.my-domain.com/" ';
    $message .= ' <br /> or href="https://subdomain.my-domain.com" ';
    $message .= ' <br /> or href="http://subdomain.my-domain.com/more-page" ';
    $message .= ' <br /> or "https://subdomain.my-domain.com/"  with no href at the beginning';
    $message .= ' <br /> or href="http://subdomain.my-domain.com/one-more-page/sub-page"  with some more text after it.  ';
    $message .= ' <br /> or href="http://last-one.my-domain.com/one-more-page/sub-page"  with some more text after it. </h4>';

    echo $message;

    function AppendCampaignToString($string) {
        $regex = '/(href="https?:\/\/)(\w*.?my-domain\.com[^"]*)("[^>]*?>/i';
        return preg_replace_callback($regex, '_appendCampaignToString', $string);
    }

    function _AppendCampaignToString($match) {
        $url = $match[2];
        if (strpos($url, '?') === false) {
            $url .= '?';
        }
        else {
            $url .= '&';            
        }
        $url .= "MyID=666888";
        return $match[1].$url  ;
    }

    echo "<hr>" .  AppendCampaignToString($message) . "<hr />" ;
?>

It works for every kind of URL, subdomain and file path EXCEPT the very last URL, no matter what type of URL the last one is. So

echo "<hr>" . AppendCampaignToString($message) . "<hr />" ;

gives:

Hello there AGAIN . visit
href="http://www.my-domain.com/another-link/?MyID=666888"

or href="http://www.my-domain.com/subdir/sub-sub-dir/?MyID=666888"
or href="https://www.my-domain.com?uid=hello&MyID=666888"
or href="http://my-domain.com?MyID=666888"
or href="https://my-domain.com?MyID=666888"
or href="http://my-domain.com/?MyID=666888"
or href="https://my-domain.com/?MyID=666888"
or href="http://subdomain.my-domain.com/?MyID=666888"
or href="https://subdomain.my-domain.com?MyID=666888"
or href="http://subdomain.my-domain.com/more-page?MyID=666888"
or "https://subdomain.my-domain.com/" with no href at the beginning
or href="http://subdomain.my-domain.com/one-more-page/sub-page?MyID=666888" with some more text after it.
or href="http://last-one.my-domain.com/one-more-page/sub-page" with some more text after it.

Your last domain has a - in it, which \w does not match, so you need to put the - in a character class together with \w. This works:

(href="https?:\/\/)([\w-]*.?my-domain\.com[^"]*)("[^>]*?>)

https://regex101.com/r/etxiQI/2/

Also note that the regex in your question was missing a closing ).

Additionally, if my-domain is the main domain name, the . preceding it should be escaped as well, since an unescaped . matches any character. E.g.:

(href="https?:\/\/)([\w-]*\.?my-domain\.com[^"]*)("[^>]*?>)
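For reference, here is a minimal sketch of how the escaped pattern above might be wired into a callback. The function name appendCampaign and its $param argument are made up for illustration; the MyID value comes from the question. One thing differs from the question's callback: preg_replace_callback replaces the whole match with the callback's return value, so this sketch also returns the third group to keep the closing quote and the rest of the tag.

```php
<?php
// Sketch only: appendCampaign() and $param are illustrative names.
// Written to stay compatible with PHP 5.6 (no scalar type hints).
function appendCampaign($html, $param = 'MyID=666888')
{
    // Group 1: href=" plus the scheme.
    // Group 2: host (optional subdomain label, hyphens allowed) and path.
    // Group 3: closing quote and everything up to the next ">".
    $regex = '/(href="https?:\/\/)([\w-]*\.?my-domain\.com[^"]*)("[^>]*?>)/i';

    return preg_replace_callback($regex, function ($m) use ($param) {
        // Use "&" when the URL already carries a query string, "?" otherwise.
        $separator = (strpos($m[2], '?') === false) ? '?' : '&';
        // Return all three groups so nothing from the match is dropped.
        return $m[1] . $m[2] . $separator . $param . $m[3];
    }, $html);
}

$html = '<a href="http://last-one.my-domain.com/page">x</a> '
      . '<a href="https://www.my-domain.com?uid=hello">y</a>';
echo appendCampaign($html), "\n";
```

With this input, the hyphenated last-one subdomain gets ?MyID=666888 and the URL that already has ?uid=hello gets &MyID=666888. The pattern still allows only a single subdomain label, so a host such as a.b.my-domain.com would not match.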

Although @user3783243 was faster than me, I am posting a pseudo-working script, because I spent some minutes on debugging this:

<?php
    $message = '<h4>Hello there AGAIN . visit  <br />         
    href="http://www.my-domain.com/another-link/" ' ;
    $message .= ' <br /> or href="http://sub-domain.my-domain.com/subdir/sub-sub-dir/" ';
    $message .= ' <br /> or href="https://www.my-domain.com?uid=hello" ';
    $message .= ' <br /> or href="http://my-domain.com" ';
    $message .= ' <br /> or href="https://my-domain.com" ';
    $message .= ' <br /> or href="http://my-domain.com/" ';
    $message .= ' <br /> or href="https://my-domain.com/" ';
    $message .= ' <br /> or href="http://subdomain.my-domain.com/" ';
    $message .= ' <br /> or href="https://subdomain.my-domain.com" ';
    $message .= ' <br /> or href="http://subdomain.my-domain.com/more-page" ';
    $message .= ' <br /> or "https://subdomain.my-domain.com/"  with no href at the beginning';
    $message .= ' <br /> or href="http://subdomain.my-domain.com/one-more-page/sub-page"  with some more text after it.  ';
    $message .= ' <br /> or href="http://last-one.my-domain.com/one-more-page/sub-page"  with some more text after it. </h4>';

    echo $message;

    function AppendCampaignToString($string) {
        $regex = '/(href="https?:\/\/)([a-z0-9-]*.?my-domain\.com[^"]*)"[^>]*?>/i';
        return preg_replace_callback($regex, '_appendCampaignToString', $string, -1);
    }

    function _AppendCampaignToString($match) {
        $url = $match[2];

        echo "MATCHED $url \n";
        if (strpos($url, '?') === false) {
            $url .= '?';
        }
        else {
            $url .= '&';
        }
        $url .= "MyID=666888";
        return $match[1].$url  ;
    }

    echo "<hr>" .  AppendCampaignToString($message) . "<hr />" ;
?>
  • I took out the last open parenthesis from the regex (also mentioned by @user3783243)
  • added a debug message in the callback, to see what's actually being matched
  • replaced \w in the subdomain match with the explicit class [a-z0-9-], so letters, digits and - all match (\w on its own covers letters, digits and _ but not -)
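To check which subdomain labels each character class accepts, a small test along these lines may help. The helper labelMatches and the sample labels are hypothetical and only serve to compare \w, [\w-] and [a-z0-9-].

```php
<?php
// Hypothetical helper: true when every character of $label belongs to $class.
function labelMatches($class, $label)
{
    return preg_match('/^' . $class . '*$/i', $label) === 1;
}

$classes = array('\w', '[\w-]', '[a-z0-9-]');
$labels  = array('subdomain', 'last-one', 'sub_2');

// Print one line per label/class pair: label, class, yes/no.
foreach ($labels as $label) {
    foreach ($classes as $class) {
        $verdict = labelMatches($class, $label) ? 'yes' : 'no';
        echo str_pad($label, 10), str_pad($class, 11), $verdict, "\n";
    }
}
```

Here \w rejects last-one because of the hyphen, [\w-] accepts all three labels, and [a-z0-9-] rejects sub_2 because of the underscore.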

