
PHP "preg_match" mystery: using the same regex on different variables makes them become the same. Why?

I have tried to grab API IDs for Yelp, TripAdvisor, Foursquare and Twitter based on their URLs. I did it with regular expressions and the "preg_match" function. However, I encountered a very strange phenomenon.

$yelp_option = 'https://www.yelp.com/biz/belved%C3%A8re-delft-2';
$foursquare_option = 'https://www.tripadvisor.nl/Restaurant_Review-g188626-d2375308-Reviews-Belgian_Beer_Cafe_Belvedere-Delft_South_Holland_Province.html';
$tripadvisor_option = 'https://foursquare.com/v/belv%C3%A9d%C3%A8re/4ad8dc0ff964a520601521e3';
$twitter_option = 'https://twitter.com/duitdaging';

if(strpos($yelp_option, 'biz/')){
    preg_match("/(?<=biz\/).*/", $yelp_option, $output_array);
    $yelp_option = $output_array[0];
}

if(strpos($foursquare_option, '/')){
    preg_match("/[^/]+$/", $foursquare_option, $output_array);
    $foursquare_option = $output_array[0];
}

if(strpos($tripadvisor_option, '-d')){
    preg_match("/(?<=-d)(.*?)(?=-)/", $tripadvisor_option, $output_array);
    $tripadvisor_option = $output_array[0];
}

if(strpos($twitter_option, '/')){
    preg_match("/[^/]+$/", $twitter_option, $output_array);
    $twitter_option = $output_array[0];
}

The output was very unexpected:

$yelp_option = 'belved%C3%A8re-delft-2';
$foursquare_option = '2375308';
$tripadvisor_option = '4ad8dc0ff964a520601521e3';
$twitter_option = '2375308';

I tried for 1½ hours to move stuff around and comment stuff out... nothing seemed logical. Why does $twitter_option become the same as $foursquare_option? Is it because the regex pattern is the same? I tried adding a . to the Twitter pattern so it looks like this: [^/]+.$. Now it's different, but the regex should produce the same result, right? Still, it became the same as $foursquare_option...

I tried flipping the order so the Twitter preg_match executes before the Foursquare one, but the result was not what I expected. I thought both variables would now be duitdaging, but instead both were empty strings...

When I comment out the entire if(...){foursquare regex...} block, the Twitter one works fine and produces duitdaging. But when I have both of them, it just won't work.

I solved it easily by changing the Twitter block to this:

if(strpos($yelp_option, 'biz/')){
    preg_match("/(?<=biz\/).*/", $yelp_option, $output_array);
    $yelp_option = $output_array[0];
}

if(strpos($foursquare_option, '/')){
    preg_match("/[^/]+$/", $foursquare_option, $output_array);
    $foursquare_option = $output_array[0];
}

if(strpos($tripadvisor_option, '-d')){
    preg_match("/(?<=-d)(.*?)(?=-)/", $tripadvisor_option, $output_array);
    $tripadvisor_option = $output_array[0];
}

if(strpos($twitter_option, "/")){
    $pieces = explode("/", $twitter_option);
    $twitter_option = end($pieces);
}

So I no longer use preg_match for both Twitter and Foursquare. This gave the right outcome:

$yelp_option = 'belved%C3%A8re-delft-2';
$foursquare_option = '2375308';
$tripadvisor_option = '4ad8dc0ff964a520601521e3';
$twitter_option = 'duitdaging';

I'm more confused than I have ever been, so I just MUST ask this question. Does this seem logical to anyone?

  • Happy Friday x_x

Issue Explanation

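Since you are using / as the delimiter to enclose your regex pattern, you cannot use a literal / inside the pattern unescaped; it must be escaped with a backslash (\). In "/[^/]+$/", PHP takes the second / as the end of the pattern, so it sees the pattern [^ followed by the invalid modifiers ]+$/. preg_match() then emits an "Unknown modifier ']'" warning and returns false without touching $output_array, so $output_array still holds the match from the previous successful preg_match() call. That is why the variables appear to "become the same". Alternatively, PHP (like some other languages) lets you enclose the pattern in a different delimiter. Keep reading for more details.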
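To see why one variable seems to inherit another's value, here is a minimal sketch (assuming PHP 7 or later, and reusing two of the question's URLs) that runs the unescaped pattern right after a successful match. The `@` operator is only there to keep the demo's output quiet; in real code you would want to see the warning.

```php
<?php
// Sketch: why a failed preg_match() seems to "copy" an earlier result.

// 1. A valid pattern fills $output_array with its match.
preg_match('/(?<=biz\/).*/', 'https://www.yelp.com/biz/belved%C3%A8re-delft-2', $output_array);

// 2. The unescaped "/" inside [^/] closes the pattern early: PHP sees the
//    pattern "[^" plus the modifiers "]+$/", rejects the unknown modifier "]"
//    (normally with a warning, silenced here by @) and returns false.
//    $output_array is left exactly as step 1 filled it.
$result = @preg_match('/[^/]+$/', 'https://twitter.com/duitdaging', $output_array);

var_dump($result);            // bool(false)
echo $output_array[0], "\n";  // belved%C3%A8re-delft-2 (the stale Yelp match)

// Checking the return value exposes the problem instead of hiding it.
if ($result !== 1) {
    echo "No usable match for the Twitter URL\n";
}
```

The same mechanism applies to every call that reuses `$output_array` after a failed `preg_match()`: the array simply keeps whatever the last successful call put there.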

Stack Overflow Sources

See this answer for more details (quoted below).

What context/language? Some languages use / as the pattern delimiter, so yes, you need to escape it, depending on which language/context. You escape it by putting a backward slash in front of it: \/ For some languages (like PHP) you can use other characters as the delimiter and therefore you don't need to escape it. But AFAIK in all languages, the only special significance the / has is it may be the designated pattern delimiter.

PHP Documentation Sources

See the documentation here. The documentation states the following (note that the last quoted section is of utmost importance/relevance to this question):

When using the PCRE functions, it is required that the pattern is enclosed by delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character.

Often used delimiters are forward slashes (/), hash signs (#) and tildes (~). The following are all examples of valid delimited patterns.

 /foo bar/ #^[^0-9]$# +php+ %[a-zA-Z0-9_-]% 

It is also possible to use bracket style delimiters where the opening and closing brackets are the starting and ending delimiter, respectively. (), {}, [] and <> are all valid bracket style delimiter pairs.

 (this [is] a (pattern)) {this [is] a (pattern)} [this [is] a (pattern)] <this [is] a (pattern)> 

Bracket style delimiters do not need to be escaped when they are used as meta characters within the pattern, but as with other delimiters they must be escaped when they are used as literal characters.
If the delimiter needs to be matched inside the pattern it must be escaped using a backslash. If the delimiter appears often inside the pattern, it is a good idea to choose another delimiter in order to increase readability.

 /http:\/\// #http://# 
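One way to check the quoted rules is to run the documentation's two example patterns, plus a bracket-style one, against sample strings. The subject `http://example.com`, the `{^\d{2}$}` pattern and the subject `42` are illustrative assumptions, not part of the documentation.

```php
<?php
// The two equivalent patterns from the documentation, plus a bracket-style one.
$subject = 'http://example.com';

// "/" delimiters: every literal "/" inside the pattern is escaped.
$escaped = preg_match('/http:\/\//', $subject);

// "#" delimiters: "/" is an ordinary character, so no escaping is needed.
$hashDelimited = preg_match('#http://#', $subject);

// "{}" delimiters: the inner {2} quantifier is used as a metacharacter,
// so it does not need escaping.
$bracketDelimited = preg_match('{^\d{2}$}', '42');

var_dump($escaped, $hashDelimited, $bracketDelimited); // int(1) three times
```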

Solution

Solution 1

This solution escapes the delimiter inside the regex using a backslash (\). Inside a double-quoted PHP string, "\\/" and "\/" both produce \/, so either spelling works.

if(strpos($yelp_option, 'biz/')){
    preg_match("/(?<=biz\/).*/", $yelp_option, $output_array);
    $yelp_option = $output_array[0];
}

if(strpos($foursquare_option, '/')){
    preg_match("/[^\\/]+$/", $foursquare_option, $output_array);
    $foursquare_option = $output_array[0];
}

if(strpos($tripadvisor_option, '-d')){
    preg_match("/(?<=-d)(.*?)(?=-)/", $tripadvisor_option, $output_array);
    $tripadvisor_option = $output_array[0];
}

if(strpos($twitter_option, '/')){
    preg_match("/[^\\/]+$/", $twitter_option, $output_array);
    $twitter_option = $output_array[0];
}
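Because `preg_match()` leaves the output array untouched when it fails, it may help to wrap the corrected pattern in a small function that checks the return value and uses its own local matches array. `last_segment()` is a hypothetical helper name, and returning `null` on failure is just one possible design.

```php
<?php
// Hypothetical helper: return the last path segment of a URL, or null.
function last_segment(string $url): ?string
{
    // "/" is escaped inside the character class, so the delimiters stay intact.
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/[^\/]+$/', $url, $matches) === 1) {
        return $matches[0];
    }
    return null; // never fall back to a previous call's matches
}

echo last_segment('https://twitter.com/duitdaging'), "\n";
// duitdaging
echo last_segment('https://foursquare.com/v/belv%C3%A9d%C3%A8re/4ad8dc0ff964a520601521e3'), "\n";
// 4ad8dc0ff964a520601521e3
var_dump(last_segment('https://twitter.com/')); // NULL
```

Since `$matches` is local to each call, a broken pattern or a non-matching URL yields `null` instead of silently reusing another URL's ID.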

Solution 2

This solution uses a different delimiter, ~, to enclose the pattern, so the / characters inside it no longer need escaping.

if(strpos($yelp_option, 'biz/')){
    preg_match("~(?<=biz/).*~", $yelp_option, $output_array);
    $yelp_option = $output_array[0];
}

if(strpos($foursquare_option, '/')){
    preg_match("~[^/]+$~", $foursquare_option, $output_array);
    $foursquare_option = $output_array[0];
}

if(strpos($tripadvisor_option, '-d')){
    preg_match("~(?<=-d)(.*?)(?=-)~", $tripadvisor_option, $output_array);
    $tripadvisor_option = $output_array[0];
}

if(strpos($twitter_option, '/')){
    preg_match("~[^/]+$~", $twitter_option, $output_array);
    $twitter_option = $output_array[0];
}
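When part of a pattern comes from a literal string, PHP's `preg_quote()` can do the escaping: its optional second argument is the delimiter, which it escapes along with the regex metacharacters. The `biz/` prefix is taken from the Yelp example; building the pattern this way is an alternative sketch, not what either solution above does.

```php
<?php
// Build the Yelp pattern from a literal prefix; preg_quote() escapes the
// "/" delimiter (and any regex metacharacters) for us.
$prefix  = preg_quote('biz/', '/');       // "biz\/"
$pattern = '/(?<=' . $prefix . ').*/';    // "/(?<=biz\/).*/"

$yelp_option = 'https://www.yelp.com/biz/belved%C3%A8re-delft-2';
if (preg_match($pattern, $yelp_option, $output_array) === 1) {
    $yelp_option = $output_array[0];
}
echo $yelp_option, "\n"; // belved%C3%A8re-delft-2
```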
