
Regex Multiple Capture of Group

I'm using a regex to capture the dimensions of ads.

The source content is an HTML file, and I'm trying to capture content that looks like:

size[200x400,300x1200] (could be 1-4 different sizes)

I'm trying to get an array with the different sizes in it.

My capture code looks like this:

$size_declaration = array();
$sizes = array();
$declaration_pattern = "/size\[(\d{2,4}x\d{2,4}|\d{2,4}x\d{2,4},){1,4}\]/";
$sizes_pattern = "/\d{2,4}x\d{2,4}/";

$result = preg_match($declaration_pattern, $html, $size_declaration);
if( $result ) {
    $result = preg_match_all($sizes_pattern, $size_declaration[0], $sizes);
    var_dump($sizes);
}

The code above produces usable results:

$sizes = array(
  [0] => array (
    [0] => '200x400',
    [1] => '300x1200'
  )
)

but it takes quite a bit of code. I thought it should be possible to collect the results with a single regex, but I couldn't find one that works. Is there a way to clean this up a bit?
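Why a single preg_match can't return every size directly: when a capture group is repeated with a quantifier such as {1,4}, PCRE (the engine behind PHP's preg_* functions) keeps only the text from the group's last iteration. A minimal check, using the question's declaration pattern on a hypothetical $html sample:

```php
<?php
// The question's declaration pattern: group 1 is repeated up to 4 times.
$declaration_pattern = '/size\[(\d{2,4}x\d{2,4}|\d{2,4}x\d{2,4},){1,4}\]/';
$html = '<div>size[200x400,300x1200]</div>'; // hypothetical sample input

preg_match($declaration_pattern, $html, $m);

// $m[0] is the whole declaration, but $m[1] holds only the
// last iteration of the repeated group, not every size.
echo $m[0], "\n"; // size[200x400,300x1200]
echo $m[1], "\n"; // 300x1200
```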

It's not very practical to turn this into a single expression; it's better to keep the two separate. The first expression finds the boundaries and does a rudimentary check on the inner contents, and the second breaks the contents down into individual pieces:

if (preg_match_all('/size\[([\dx,]+)\]/', $html, $matches)) {
    foreach ($matches[0] as $size_declaration) {
        if (preg_match_all('/\d+x\d+/', $size_declaration, $sizes)) {
            print_r($sizes[0]);
        }
    }
}
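The two-step approach can be wrapped in a reusable function. A minimal sketch: extract_size_declarations is a hypothetical name, and unlike the loop above it works on the bracket contents (group 1) rather than the whole declaration, returning one array of sizes per declaration found:

```php
<?php
/**
 * Hypothetical helper: step 1 finds each size[...] declaration,
 * step 2 splits its contents into individual "WxH" strings.
 * Returns one array of sizes per declaration found.
 */
function extract_size_declarations(string $html): array
{
    $result = [];
    // Step 1: locate declarations; group 1 is the text between the brackets.
    if (preg_match_all('/size\[([\dx,]+)\]/', $html, $declarations)) {
        foreach ($declarations[1] as $inner) {
            // Step 2: break the inner text into individual sizes.
            preg_match_all('/\d+x\d+/', $inner, $sizes);
            $result[] = $sizes[0];
        }
    }
    return $result;
}

$html = '<p>size[200x400,300x1200]</p><p>size[728x90]</p>';
print_r(extract_size_declarations($html));
```

Keeping one array per declaration preserves which sizes belong together, which a single flat list of matches would lose.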

This one is a little simpler, though it matches any WxH token anywhere in the input, not only those inside size[...]:

$html = "size[200x400,300x600,300x100]";
if (($result = preg_match_all("/(\d{2,4}x\d{2,4}){1,4}/", $html, $matches)) > 0)
    var_dump($matches);
// 
// $matches => 
//     array(
//          (int) 0 => array(
//              (int) 0 => '200x400',
//              (int) 1 => '300x600',
//              (int) 2 => '300x100'
//          ),
//          (int) 1 => array(
//              (int) 0 => '200x400',
//              (int) 1 => '300x600',
//              (int) 2 => '300x100'
//          )
//     )
// 
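If only the sizes inside a size[...] declaration should be collected with a single preg_match_all call, one option (a different technique, not taken from any of the answers here) is PCRE's \G anchor, which lets a match continue exactly where the previous match ended. A sketch, assuming the same size[...] format; note that it flattens all declarations into one list and does not enforce the limit of 4:

```php
<?php
// Group 1 is one size. A match either starts right after "size[" or, via
// \G, continues exactly where the previous match ended and consumes a comma.
// (?!^) stops \G from matching at the very start of the subject.
// The lookahead requires the rest of the list to be ",WxH" items up to "]".
$pattern = '/(?:size\[|\G(?!^),)(\d{2,4}x\d{2,4})(?=(?:,\d{2,4}x\d{2,4})*\])/';

$html = 'size[200x400,300x600,300x100] but not 999x999';
preg_match_all($pattern, $html, $matches);

print_r($matches[1]); // lists 200x400, 300x600 and 300x100 only
```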

The only way to get each size into its own group is to repeat the group once for each of the 4 possible sizes, because a quantified capture group keeps only its last iteration:

$subject = <<<LOD
size[523x800]
size[200x400,300x1200]
size[201x300,352x1200,123x456]
size[142x396,1444x32,143x89,231x456]
LOD;

$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?]`';

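preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
// Drop the full match (index 0) so each set holds only the sizes.
foreach ($matches as &$match) { array_shift($match); }
unset($match); // break the reference left over from the by-reference loop

print_r($matches);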
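A self-contained variant of the code above, assuming PHP 7.4+ for the arrow function. It replaces the by-reference loop with array_map and array_slice, and adds a line with five sizes to show that the fixed number of groups caps the pattern at four:

```php
<?php
// Same pattern as above: one optional group per possible size (max 4).
$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?]`';

$subject = "size[523x800]\n"
         . "size[200x400,300x1200]\n"
         . "size[11x22,33x44,55x66,77x88,99x10]\n"; // 5 sizes: no match

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

// Drop the full match (index 0) from every set. Trailing optional groups that
// did not participate are left out by PHP, so each set has 1 to 4 entries.
$sizes = array_map(fn(array $set) => array_slice($set, 1), $matches);

print_r($sizes);
```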

The pattern can also be shortened by using subroutine calls to the first capture group; (?1) reuses the group's pattern, not the text it matched:

$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,((?1)))?(?:,((?1)))?(?:,((?1)))?]`';

or with the Oniguruma syntax:

$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,(\g<1>))?(?:,(\g<1>))?(?:,(\g<1>))?]`';
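The difference between a subroutine call and a backreference matters here: a backreference such as \1 would only accept a second size identical to the first. A minimal comparison, using shortened two-size versions of the patterns above:

```php
<?php
// (?1) and \g<1> re-run group 1's *pattern*; \1 must repeat its matched *text*.
$subroutine    = '`size\[(\d{2,4}x\d{2,4})(?:,((?1)))?]`';
$oniguruma     = '`size\[(\d{2,4}x\d{2,4})(?:,(\g<1>))?]`';
$backreference = '`size\[(\d{2,4}x\d{2,4})(?:,(\1))?]`';

$input = 'size[200x400,300x1200]';

var_dump(preg_match($subroutine, $input));    // int(1)
var_dump(preg_match($oniguruma, $input));     // int(1)
var_dump(preg_match($backreference, $input)); // int(0): 300x1200 != 200x400
```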

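Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.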

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM