
Get all matches in a regular expression

I have this URL:

uploads/offers/picture/_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/_wyMDAiO30=_/518edc82d94b0-201341717250_descuen_a06d000000fkvwpiak_1_1.jpg 

I need to get all of the /_(.*)_/ parts, but my preg_match_all expression seems to be badly formed:

preg_match_all('#/_([^_/]+)_/#', $url, $params);

It returns:

Array
(
    [0] => Array
        (
            [0] => /_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/
        )
    [1] => Array
        (
            [0] => YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS
        )
)

But I need:

Array
(
    [0] => Array
        (
            [0] => /_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/
            [1] => /_wyMDAiO30=_/
        )
    [1] => Array
        (
            [0] => YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS
            [1] => wyMDAiO30=
        )
)

Is there a problem with the two matches sharing part of the string?

The final / in the regex ends up consuming the slash, so it cannot also serve as the leading / of the next match. One simple way to get around this is to use a lookahead:

preg_match_all('#/_([^_/]+)_(?=/)#', $url, $params);
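A minimal self-contained check of the lookahead version, assuming the URL from the question is stored in $url. Because (?=/) only peeks at the slash, that slash is still available as the start of the next /_ match.

```php
<?php
// URL from the question
$url = 'uploads/offers/picture/_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/_wyMDAiO30=_/518edc82d94b0-201341717250_descuen_a06d000000fkvwpiak_1_1.jpg';

// (?=/) asserts that a slash follows without consuming it,
// so the same slash can begin the next "/_" match.
preg_match_all('#/_([^_/]+)_(?=/)#', $url, $params);

// Captured parts without the surrounding "/_" and "_"
print_r($params[1]);
```

Note that the full matches in $params[0] are now /_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_ and /_wyMDAiO30=_, without the trailing slash the question's expected output shows, because the lookahead does not include it in the match.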

The / in between can't be matched twice. However, you could use lookahead/lookbehind assertions:

preg_match_all('#(?<=/_)[^_/]+(?=_/)#', $url, $params);
var_dump($params);

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(50) "YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS"
    [1]=>
    string(10) "wyMDAiO30="
  }
}

One problem with your current solution, as Explosion Pill's answer says, is that it consumes the / at the end of each match; using a positive lookahead solves that problem.

Another possible issue is that the [^_/] part may end up breaking the regex if the parts you want to capture contain underscores.

To solve both issues at once:

~/_(.+?)_(?=/)~

This seems to me to be closer to what you are after: "whenever you see the sequence /_, start capturing all input until you come across the sequence _/". Lone underscores inside the captured part will not break this.
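A small comparison of the two patterns on a hypothetical URL (not from the question) in which one wrapped segment contains an underscore. The character-class version skips that segment, while the lazy .+? version captures it.

```php
<?php
// Hypothetical URL in which one wrapped segment contains an underscore
$url = 'uploads/_abc_def_/_xyz_/file.jpg';

// Character-class version: stops at the inner underscore and never reaches "_/"
preg_match_all('#/_([^_/]+)_(?=/)#', $url, $strict);

// Lazy version: keeps going until the first "_" that is followed by "/"
preg_match_all('~/_(.+?)_(?=/)~', $url, $lazy);

print_r($strict[1]); // only "xyz"
print_r($lazy[1]);   // "abc_def" and "xyz"
```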

Your expression consumes both _ characters and the trailing /, so the next part no longer starts with /_ and the wyMDAiO30= part is skipped.
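To see why the second part is skipped, this sketch runs the question's original pattern with PREG_OFFSET_CAPTURE and inspects where the first match ends.

```php
<?php
$url = 'uploads/offers/picture/_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/_wyMDAiO30=_/518edc82d94b0-201341717250_descuen_a06d000000fkvwpiak_1_1.jpg';

// The pattern from the question, with byte offsets for each match
preg_match_all('#/_([^_/]+)_/#', $url, $m, PREG_OFFSET_CAPTURE);

[$text, $offset] = $m[0][0];
$end = $offset + strlen($text); // first byte after the first match

// The last byte of the first match is the "/" that begins "/_wyMDAiO30=_/",
// so the search resumes at "_wyMDAiO30=..." where no "/_" is left to match.
echo substr($url, $end - 1, 2), "\n"; // prints "/_"
echo count($m[0]), "\n";              // prints 1
```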

I suggest you use explode("_", $url) (or preg_split(...) if the above is just an example and you need regexes to recognize the splitting characters/substrings).
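On its own, explode("_", $url) returns every underscore-delimited piece, including pieces of the file name, so some filtering is still needed. A minimal sketch, assuming a wanted part is a piece whose previous piece ends with "/" and whose next piece starts with "/" (this filtering rule is an assumption, not part of the answer):

```php
<?php
$url = 'uploads/offers/picture/_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/_wyMDAiO30=_/518edc82d94b0-201341717250_descuen_a06d000000fkvwpiak_1_1.jpg';

$pieces = explode('_', $url);
$params = array();

// Assumed rule: a wrapped part sits between a piece ending in "/"
// and a piece starting with "/". Adjust to the real URL format.
for ($i = 1; $i < count($pieces) - 1; $i++) {
    if (substr($pieces[$i - 1], -1) === '/' && substr($pieces[$i + 1], 0, 1) === '/') {
        $params[] = $pieces[$i];
    }
}

print_r($params);
```

Like the [^_/] regex, this approach cannot return a wrapped part that itself contains an underscore, because explode splits it into several pieces.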

If you really insist on using preg_match_all, check the documentation. There is a way to say "match this, but don't include it in the match" (a lookahead assertion). I think it's something like #_([^_/]+)(?=_)#.
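Running that guessed pattern against the question's URL shows that the lookahead does keep the closing _ out of the match. Without the surrounding slashes, however, the pattern also picks up underscore-delimited pieces of the file name.

```php
<?php
$url = 'uploads/offers/picture/_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/_wyMDAiO30=_/518edc82d94b0-201341717250_descuen_a06d000000fkvwpiak_1_1.jpg';

// "_", then characters that are neither "_" nor "/",
// then a lookahead for "_" that is checked but not included in the match
preg_match_all('#_([^_/]+)(?=_)#', $url, $params);

// Also contains "descuen", "a06d000000fkvwpiak" and "1" from the file name
print_r($params[1]);
```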

The best solution would probably be to split the string on / first and then check each piece for leading and trailing underscores:

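<?php

$url = 'uploads/offers/picture/_YToxOntzOjc6Im9wdGlvbnMiO3M6MTY6Inpvb21Dcm9wLDI4MS_/_wyMDAiO30=_/518edc82d94b0-201341717250_descuen_a06d000000fkvwpiak_1_1.jpg';

$params = array();
$data = explode('/', $url);

foreach ($data as $val) {
    // Keep only pieces wrapped in underscores, e.g. "_wyMDAiO30=_"
    if (strlen($val) > 2 && substr($val, 0, 1) === '_' && substr($val, -1) === '_') {
        // Strip the leading and trailing underscore
        $params[] = substr($val, 1, -1);
    }
}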

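Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.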

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM