
Regular expression to find specific URLs within a string in PHP

I am looking to find specific URLs within a large string of text. The URLs are in this format:

https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login

*The segment between /shop/ and /login is random.

Currently I am able to extract ALL URLs using the following:

preg_match_all('!https?://\S+!', $string, $matches); 

I then need to loop over the matches and pull out all URLs that include a specific string using:

$arr = $matches[0];

foreach ($arr as $haystack) {
    // Keep only the URLs that contain "shop"
    if (strpos($haystack, "shop") !== false) {
        echo $haystack;
    }
}

I am trying to make the code more efficient and can't seem to nail down a regular expression that can find all URLs matching:

https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login

If I could, it would remove the need for the second string lookup.

Any help would be much appreciated.

Thanks

The point is that you need to ask yourself what is particular about the strings you need to match: whether the URL contains a subpath of interest, whether that subpath is the second one or the second from the end, whether it consists of both letters and digits, and so on.

Once you know what to match, you can start on a regex.

It seems that you need to match URLs with a /shop/ subpath. All you need to do, then, is include that subpattern in the pattern. Since it is a literal sequence of characters, there is nothing difficult about it:

'~https?://\S+/shop/\S+~'
              ^^^^^^

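
As a minimal sketch of how this pattern could replace the two-step approach from the question, the snippet below calls preg_match_all once and gets back only the /shop/ URLs. The sample text in $string, including the /about/team URL, is made up for illustration.

```php
<?php
// Hypothetical sample input: only the first URL has a /shop/ subpath.
$string = 'Login at https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login '
        . 'or read https://name.myurl.com/#/about/team first.';

// One pass: match only URLs that contain the literal /shop/ subpath.
preg_match_all('~https?://\S+/shop/\S+~', $string, $matches);

// $matches[0] holds the full matches, so no second lookup is needed.
foreach ($matches[0] as $url) {
    echo $url, PHP_EOL;
}
```

Because \S+ consumes every non-whitespace character, punctuation that directly follows a URL in the text (a trailing full stop, for instance) ends up inside the match, just as it does with the original pattern.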

If all you want to do is verify that /shop/ is part of the URL, use:

https?:\/\/\S*\/shop\/\S*

It's basically your regex, with the addition of requiring /shop/ after the protocol part (http:// or https://) and allowing any non-whitespace characters before and after it.
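
In PHP the pattern still needs delimiters. The escaped slashes suggest / was intended, so the sketch below wraps it that way. It also shows a stricter variant for the exact shape in the question. That variant assumes the random segment consists only of lowercase letters and digits and that the host is always name.myurl.com. Both are guesses based on the single sample URL, and the /cart and /about URLs are invented test data.

```php
<?php
// Hypothetical sample input with three URLs.
$string = 'https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/login '
        . 'https://name.myurl.com/#/shop/rmpa8cmnfg3eerpus3ap9jwekz6k77pnj2pg50ua/cart '
        . 'https://name.myurl.com/#/about';

// The pattern above, wrapped in / delimiters (hence the escaped slashes).
preg_match_all('/https?:\/\/\S*\/shop\/\S*/', $string, $loose);

// Stricter variant (assumption: the random segment is lowercase letters and
// digits, and the host is fixed). With ~ as delimiter, / and # need no escaping.
preg_match_all('~https?://name\.myurl\.com/#/shop/[a-z0-9]+/login~', $string, $strict);

echo count($loose[0]), PHP_EOL;  // number of URLs containing /shop/
echo count($strict[0]), PHP_EOL; // number of URLs matching the full shape
```

The loose pattern accepts both /shop/ URLs, while the stricter one keeps only the one ending in /login.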

Regards

