
php preg_match_all returning array of arrays

I want to replace some template tags:

$tags = '{name} text {first}';
preg_match_all('~\{(\w+)\}~', $tags, $matches);
var_dump($matches);

The output is:

array(2) { 
          [0]=> array(2) { 
                         [0]=> string(6) "{name}" 
                         [1]=> string(7) "{first}" 
                         } 
          [1]=> array(2) { 
                         [0]=> string(4) "name" 
                         [1]=> string(5) "first" 
                         }
         }

Why are there two arrays inside? How can I get only the second one?

The short answer:

Is there an alternative? Of course there is: lookaround assertions allow you to use zero-width (non-captured) single char matches easily:

preg_match_all('/(?<=\{)\w+(?=})/', $tags, $matches);
var_dump($matches);

Will dump this:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(4) "name"
    [1]=>
    string(5) "first"
  }
}

The pattern:

  • (?<=\{) : positive lookbehind - only match the rest of the pattern if there's a { character in front of it (but don't capture it)
  • \w+ : one or more word characters are matched
  • (?=}) : positive lookahead - only match the preceding pattern if it is followed by a } character (but don't capture the } char)

It's that simple: the pattern uses the {} delimiter chars as conditions for the matches, but doesn't capture them.
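
To check that the two approaches really extract the same values, here is a minimal sketch that runs both patterns against the string from the question. The variable names $withGroup and $withLookaround are made up for this illustration.

```php
<?php
// Same input as in the question.
$tags = '{name} text {first}';

// Capture-group version: the tag names end up in index 1.
preg_match_all('~\{(\w+)\}~', $tags, $withGroup);

// Lookaround version: the braces are asserted but not consumed,
// so the tag names are the full match and end up in index 0.
preg_match_all('/(?<=\{)\w+(?=})/', $tags, $withLookaround);

var_dump($withGroup[1] === $withLookaround[0]); // bool(true)
var_dump(count($withLookaround));               // int(1): no capture groups, so only index 0
```

Which of the two to use is mostly a matter of taste: the capture group keeps the full tag available in index 0, while the lookaround version returns nothing but the names.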

Explaining this $matches array structure a bit:

The reason why $matches looks the way it does is quite simple: when using preg_match(_all), the first entry in the match array will always be the entire string matched by the given regex. That's why I used zero-width lookaround assertions instead of groups. Your expression matches "{name}" in its entirety, and extracts "name" through grouping.
The matches array will hold the full match at index 0, and add groups at every subsequent index. In your case that means that:

  • $matches[0] will contain all substrings matching /\{\w+\}/ as a pattern.
  • $matches[1] will contain all substrings that were captured ( /\{(\w+)\}/ captures (\w+) ).

If you were to have a regex like this: /\{((\w)([^}]+))}/, the matches array would look something like this:

[
    0 => [
        '{name}',//as if you'd written /\{\w[^}]+}/
    ],
    1 => [
        'name',//matches group  (\w)([^}]+), as if you wrote (\w[^}]+)
    ],
    2 => [
        'n',//matches (\w) group
    ],
    3 => [
        'ame',//and this is the ([^}]+) group obviously
    ]
]

Why? Simply because the pattern contains 3 capture groups. Like I said: the first index in the matches array will always be the full match, regardless of capture groups. The groups are then appended to the array in the order they appear in the expression. So if we analyze the expression:

  • \{ : not captured, but part of the pattern, so it will only show up in the $matches[0] values
  • ((\w)([^}]+)) : first capture group; the whole \w[^}]+ match is grouped here, so $matches[1] will contain these values
  • (\w) : second group, a single \w char (i.e. the first character after { ). $matches[2] will therefore contain all first characters after a {
  • ([^}]+) : third group, matches the rest of the string after {\w until a } is encountered; these make up the $matches[3] values
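
The breakdown above can be verified with a short sketch that runs the three-group pattern against the string from the question. With that input each sub-array holds two entries, one per tag.

```php
<?php
$tags = '{name} text {first}';

// Three capture groups: the outer ((\w)([^}]+)), then (\w), then ([^}]+).
preg_match_all('/\{((\w)([^}]+))}/', $tags, $matches);

print_r($matches[0]); // full matches:             {name}, {first}
print_r($matches[1]); // group 1, whole tag name:  name, first
print_r($matches[2]); // group 2, first character: n, f
print_r($matches[3]); // group 3, the remainder:   ame, irst
```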

To better understand, and be able to predict, the way $matches will get populated, I'd strongly recommend you use this site: regex101. Write your expression there, and it'll break it all down for you on the right-hand side, listing the groups. For example:

/\{((\w)([^}]+))}/

Is broken down like this:

/\{((\w)([^}]+))}/
  \{ matches the character { literally
  1st Capturing group ((\w)([^}]+))
    2nd Capturing group (\w)
      \w match any word character [a-zA-Z0-9_]
    3rd Capturing group ([^}]+)
      [^}]+ match a single character not present in the list below
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      } the literal character }
  } matches the character } literally

Looking at the capturing groups, you can now confidently say what $matches will look like, and you can safely say that $matches[2] will be an array of single characters.

Of course, this may leave you wondering why $matches is a 2D array. Well, that again is really quite easy: what you can predict is how many match indexes a $matches array will contain: 1 for the full pattern, then +1 for each capture group. What you can't predict, though, is how many matches you'll find.
So what preg_match_all does is really quite simple: fill $matches[0] with all substrings that match the entire pattern, then extract each group substring from these matches and append that value onto the respective $matches arrays. In other words, the number of arrays that you can find in $matches is a given: it depends on the pattern. The number of keys you can find in the sub-arrays of $matches is an unknown: it depends on the string you're processing. If preg_match_all were to return a 1D array, it would be a lot harder to process the matches. Now you can simply write this:

$total = count($matches);
foreach ($matches[0] as $k => $full) {
    echo $full . ' contains: ' . PHP_EOL;
    for ($i=1;$i<$total;++$i) {
        printf(
            'Group %d: %s' . PHP_EOL,
            $i, $matches[$i][$k]
        );
    }
}

If preg_match_all created a flat array, you'd have to keep track of the number of groups in your pattern. Whenever the pattern changes, you'd also have to make sure to update the rest of the code to reflect the changes made to the pattern, making your code harder to maintain and more error-prone.
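
A small sketch of the "fixed by the pattern, variable by the input" point: the same pattern is applied to three subject strings (the second and third are made up for this example), and the number of sub-arrays stays at 2 while the number of matches changes.

```php
<?php
$pattern  = '~\{(\w+)\}~';
$subjects = ['{name} text {first}', '{a} {b} {c}', 'no tags here'];

$counts = [];
foreach ($subjects as $subject) {
    preg_match_all($pattern, $subject, $matches);
    $counts[] = [
        count($matches),    // fixed by the pattern: full match + 1 capture group
        count($matches[0]), // depends on the subject string
    ];
}

foreach ($counts as [$groups, $found]) {
    printf('%d sub-arrays, %d match(es) each' . PHP_EOL, $groups, $found);
}
```

Even when nothing matches, the default PREG_PATTERN_ORDER still yields one (empty) sub-array per index, so code that reads $matches[1] does not need a special case for that.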

That's because your regex could have multiple match groups: if you have more (..) you would have more entries in your array. The first one [0] is always the whole match.

If you want a different order for the array, you could use PREG_SET_ORDER as the 4th argument to preg_match_all. Doing this would result in the following:

array(2) { 
          [0]=> array(2) { 
                         [0]=> string(6) "{name}" 
                         [1]=> string(4) "name" 
                         } 
          [1]=> array(2) { 
                         [0]=> string(7) "{first}" 
                         [1]=> string(5) "first" 
                         }
         }

This ordering can be easier to work with if you loop over the result in a foreach loop.
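
As a minimal sketch of such a loop, using the pattern and string from the question: with PREG_SET_ORDER each element of $matches is one complete match, holding the full match at index 0 and the capture group at index 1.

```php
<?php
$tags = '{name} text {first}';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('~\{(\w+)\}~', $tags, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // $match[0] is the full tag, $match[1] is the captured tag name.
    printf('%s => %s' . PHP_EOL, $match[0], $match[1]);
}
```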

If you are only interested in the first capture group, you should stay with the default PREG_PATTERN_ORDER and just use $matches[1].
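
Since the stated goal in the question is to replace the template tags, here is a hedged sketch of one way to do that in a single pass with preg_replace_callback rather than preg_match_all. This is a different function from the one discussed above, and the $values array and its contents are invented for the example.

```php
<?php
$tags = '{name} text {first}';

// Hypothetical replacement values, invented for this example.
$values = ['name' => 'Doe', 'first' => 'Jane'];

$result = preg_replace_callback(
    '~\{(\w+)\}~',
    function (array $m) use ($values) {
        // $m[0] is the full tag, $m[1] the tag name.
        // Tags without a known value are left untouched.
        return $values[$m[1]] ?? $m[0];
    },
    $tags
);

echo $result, PHP_EOL; // Doe text Jane
```

The callback receives the same kind of array described above (full match first, then the groups), but for one match at a time.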
