Regex capture group always as first

I have this PHP regular expression:

https?://(?:[a-z0-9]+\.)?livestream\.com/(?:(accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?)|[^\s/]+/video\?clipId=([^\s&]+)|([^\s/]+))

I would like to match the following URLs, with these results:

http://original.livestream.com/bethanychurchnh = bethanychurchnh

http://original.livestream.com/bethanychurchnh/video?clipId=flv_b54a694b-043c-4886-9f35-03c8008c23 = flv_b54a694b-043c-4886-9f35-03c8008c23

http://livestream.com/accounts/142499/events/3959775 = accounts/142499/events/3959775

http://livestream.com/accounts/142499/events/3959775/videos/83958146 = accounts/142499/events/3959775/videos/83958146

It works fine, but for some of the URLs the result ends up in the 2nd or 3rd capture group, depending on which alternative matched. I would like the captured string to always be in the first capture group. Is this possible?

You can use a branch reset in your regex:

https?:\/\/(?:[a-z0-9]+\.)?livestream\.com\/(?|(accounts\/[0-9]+\/events\/[0-9]+(?:\/videos\/[0-9]+)?)|[^\s\/]+\/video\?clipId=([^\s&]+)|([^\s\/]+))
                                             ^^

See the description of branch reset at regular-expressions.info:

Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don't use any alternation or capturing groups inside the branch reset group, then its special function doesn't come into play. It then acts as a non-capturing group.
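
As a minimal sketch of the branch reset in PHP, the snippet below runs the pattern over the four URLs from the question and collects group 1 each time. The names $urls and $results are chosen only for this illustration, and ~ is used as the delimiter so that / needs no escaping.

```php
<?php
// Branch reset: every alternative inside (?| ... ) numbers its capture as group 1.
$pattern = '~https?://(?:[a-z0-9]+\.)?livestream\.com/(?|(accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?)|[^\s/]+/video\?clipId=([^\s&]+)|([^\s/]+))~';

// Sample URLs taken from the question.
$urls = [
    'http://original.livestream.com/bethanychurchnh',
    'http://original.livestream.com/bethanychurchnh/video?clipId=flv_b54a694b-043c-4886-9f35-03c8008c23',
    'http://livestream.com/accounts/142499/events/3959775',
    'http://livestream.com/accounts/142499/events/3959775/videos/83958146',
];

$results = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m)) {
        // Whichever alternative matched, the captured string is in $m[1].
        $results[] = $m[1];
    }
}

print_r($results);
```

Without the branch reset, i.e. with (?: in place of (?|, the same loop would have to check $m[1], $m[2] and $m[3] to find the non-empty group.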

Another possibility is to allow duplicate named captures with (?J), so that every alternative can use the same group name:

$pattern = '~(?J)https?://(?:[a-z0-9]+\.)?livestream\.com/
(?:
    (?<id>accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?)
  |
    [^\s/]+/video\?clipId=(?<id>[^\s&]+)
  |
    (?<id>[^\s/]+)
)~x';

if (preg_match($pattern, $text, $m))
    echo $m['id'];

Or, since what you are looking for is always at the end of the pattern, you don't need a capture group at all: the \K feature discards everything matched to its left from the whole match result, so the wanted string is the whole match:

$pattern = '~https?://(?:[a-z0-9]+\.)?livestream\.com/ \K
(?:
    accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?
  |
    [^\s/]+(?:/video\?clipId=\K[^\s&]+)?
)~x';

if (preg_match($pattern, $text, $m))
    echo $m[0];
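
A minimal sketch of how the \K variant might be wrapped for reuse. The function name extractLivestreamId is hypothetical, introduced only for this illustration; it returns the whole match, or null when the input contains no matching URL.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the part of a
// livestream.com URL that the \K pattern keeps, or null if nothing matches.
function extractLivestreamId(string $text): ?string
{
    // With the x flag, whitespace in the pattern is ignored.
    // Each \K drops everything matched so far from the reported match.
    $pattern = '~https?://(?:[a-z0-9]+\.)?livestream\.com/ \K
    (?:
        accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?
      |
        [^\s/]+(?:/video\?clipId=\K[^\s&]+)?
    )~x';

    return preg_match($pattern, $text, $m) ? $m[0] : null;
}

// Usage with URLs from the question.
$channel = extractLivestreamId('http://original.livestream.com/bethanychurchnh');
$clip    = extractLivestreamId('http://original.livestream.com/bethanychurchnh/video?clipId=flv_b54a694b-043c-4886-9f35-03c8008c23');
$event   = extractLivestreamId('http://livestream.com/accounts/142499/events/3959775');

echo $channel, "\n", $clip, "\n", $event, "\n";
```

Note that the second \K only takes effect when the optional /video?clipId= part is present; otherwise the match starts right after the first \K.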
