Regex: capturing a group multiple times

Question

I'm using a regex to capture the dimensions of ads.

The source content is an HTML file, and I'm trying to capture content that looks like:

size[200x400,300x1200] (could be 1-4 different sizes)

I'm trying to get an array with the different sizes in it.

My capture code looks like this:

$size_declaration = array();
$sizes = array();
$declaration_pattern = "/size\[(\d{2,4}x\d{2,4}|\d{2,4}x\d{2,4},){1,4}\]/";
$sizes_pattern = "/\d{2,4}x\d{2,4}/";

$result = preg_match($declaration_pattern, $html, $size_declaration);
if( $result ) {
    $result = preg_match_all($sizes_pattern, $size_declaration[0], $sizes);
    var_dump($sizes);
}

The code above produces usable results:

$sizes = array(
  [0] => array (
    [0] => '200x400',
    [1] => '300x1200'
  )
)

but it takes quite a bit of code. I thought it would be possible to collect the results with a single regex, but I couldn't find a pattern that works. Is there a way to clean this up a bit?

Answer 1

It's not very practical to turn it into a single expression; it would be better to keep them separate. The first expression finds the boundaries and does a rudimentary check on the inner contents; the second expression breaks it down into individual pieces:

if (preg_match_all('/size\[([\dx,]+)\]/', $html, $matches)) {
    foreach ($matches[0] as $size_declaration) {
        if (preg_match_all('/\d+x\d+/', $size_declaration, $sizes)) {
            print_r($sizes[0]);
        }
    }
}
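
The same two-step idea can be wrapped in a small helper. This is only a sketch and not part of the original answer: the function name `extract_ad_sizes` and the sample `$html` are made up for illustration, and it runs the second expression on the inner capture group (`$declarations[1]`) rather than on the full match.

```php
<?php
// Hypothetical helper: returns one list of sizes per size[...] declaration.
function extract_ad_sizes(string $html): array
{
    $result = [];
    // Step 1: find each size[...] declaration; group 1 holds the inner text.
    if (preg_match_all('/size\[([\dx,]+)\]/', $html, $declarations)) {
        foreach ($declarations[1] as $inner) {
            // Step 2: break the inner text into individual WIDTHxHEIGHT pieces.
            if (preg_match_all('/\d+x\d+/', $inner, $sizes)) {
                $result[] = $sizes[0];
            }
        }
    }
    return $result;
}

// Made-up sample input for illustration.
$html = '<div>size[200x400,300x1200]</div><div>size[523x800]</div>';
print_r(extract_ad_sizes($html));
```

Keeping one sub-array per declaration preserves which sizes belong together when the page contains several `size[...]` blocks.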

Answer 2

This one is a little simpler:

$html = "size[200x400,300x600,300x100]";
if (($result = preg_match_all("/(\d{2,4}x\d{2,4}){1,4}/", $html, $matches)) > 0)
    var_dump($matches);
// 
// $matches => 
//     array(
//          (int) 0 => array(
//              (int) 0 => '200x400',
//              (int) 1 => '300x600',
//              (int) 2 => '300x100'
//          ),
//          (int) 1 => array(
//              (int) 0 => '200x400',
//              (int) 1 => '300x600',
//              (int) 2 => '300x100'
//          )
//     )
//
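
One caveat worth checking before relying on this pattern: it is not tied to the `size[...]` wrapper, so it should pick up any `WIDTHxHEIGHT` token in the input. The sketch below uses a made-up HTML fragment (an assumption, not from the question) to show the effect.

```php
<?php
// Made-up input: one unrelated dimension plus one size[...] declaration.
$html = '<img alt="640x480 banner"> size[200x400,300x600]';

// Same pattern as in the answer above.
preg_match_all("/(\d{2,4}x\d{2,4}){1,4}/", $html, $matches);

// $matches[0] lists every full match, including the one outside size[...].
print_r($matches[0]);
```

If the HTML can contain other dimensions, restricting the search to the text inside `size[...]` first avoids these extra hits.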

Answer 3

The only way to do it with a single pattern is to repeat the size sub-pattern once for each of the four possible sizes:

$subject = <<<LOD
size[523x800]
size[200x400,300x1200]
size[201x300,352x1200,123x456]
size[142x396,1444x32,143x89,231x456]
LOD;

$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?]`';

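preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
// Drop element 0 (the full match) from each set, keeping only the captured sizes.
foreach ($matches as &$match) { array_shift($match); }
unset($match); // break the reference left over from the foreach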

print_r($matches);

The pattern can also be shortened by using references to capture groups (subroutine calls):

$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,((?1)))?(?:,((?1)))?(?:,((?1)))?]`';

or with the Oniguruma syntax:

$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,(\g<1>))?(?:,(\g<1>))?(?:,(\g<1>))?]`';
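
To check that the three spellings behave the same, a quick comparison like the following can help. It is a sketch only: the `$patterns` and `$results` arrays and their keys are names introduced here for illustration, and the subject is the sample from above.

```php
<?php
$subject = <<<LOD
size[523x800]
size[200x400,300x1200]
size[201x300,352x1200,123x456]
size[142x396,1444x32,143x89,231x456]
LOD;

// The three variants shown above, keyed by illustrative names.
$patterns = [
    'repeated'   => '`size\[(\d{2,4}x\d{2,4})(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?]`',
    'subroutine' => '`size\[(\d{2,4}x\d{2,4})(?:,((?1)))?(?:,((?1)))?(?:,((?1)))?]`',
    'oniguruma'  => '`size\[(\d{2,4}x\d{2,4})(?:,(\g<1>))?(?:,(\g<1>))?(?:,(\g<1>))?]`',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    // Drop element 0 (the full match) so only the captured sizes remain.
    $results[$name] = array_map(function ($m) {
        return array_slice($m, 1);
    }, $matches);
}

// Both comparisons should report true if the variants are equivalent.
var_dump($results['repeated'] === $results['subroutine']);
var_dump($results['repeated'] === $results['oniguruma']);
print_r($results['repeated']);
```

The shorter variants reuse the first group's sub-pattern instead of repeating it, so a change to the size format only has to be made in one place.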
