PHP Regexp: ignore everything before a defined substring

Question

I'm trying to parse a web page. Basically, it gets stored in a string that looks like this:

"[HTML CODE ...]world:[HTML CODE ...]my_number[REST OF HTML_CODE ...]"

Of course "world:" and "my_number" are part of the HTML code, but I would like to ignore everything before the first occurrence of "world:". What I need is the first number that appears after the first occurrence of "world:", keeping in mind that a bunch of HTML code will sit between the two. I could take a substring of the HTML, but I would like to do this with a single regex if possible.

This is the regular expression I tried:

'/(?<=world:)\D+?[0-9]+/'

But this returns all the HTML between "world:" and my number, not just the number.

Thanks!

Answer 1

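I think you were close. Instead of a lookbehind, match "world:" normally and put the digits in a named capture group, then read that group rather than the whole match. I was able to use this on the string you provided: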

$subject = "[HTML CODE ...]world:[HTML CODE ...]3334[REST OF HTML_CODE ...]";
$pattern = "/world:\D+?(?<my_number>[0-9]+)/";
$matches = array();

// $matches is filled by reference; a call-time "&" is a fatal error since PHP 5.4.
$result = preg_match_all($pattern, $subject, $matches);

print_r($matches);

Results in:

Array
(
    [0] => Array
        (
            [0] => world:[HTML CODE ...]3334
        )

    [my_number] => Array
        (
            [0] => 3334
        )

    [1] => Array
        (
            [0] => 3334
        )

)
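
If you only want the number itself as the whole match, one possible variation (not part of the original answer) is PCRE's \K escape, which drops everything matched so far from the reported match. A minimal sketch, assuming the same sample string as above and that no digits occur in the HTML between "world:" and the number you want; the variable $number is just an illustrative name:

```php
<?php
// Sample input modeled on the string from the question.
$subject = '[HTML CODE ...]world:[HTML CODE ...]3334[REST OF HTML_CODE ...]';

// "world:" anchors the match, \D* skips the non-digit HTML after it,
// and \K discards what has been matched so far, so $matches[0]
// contains only the digits.
$pattern = '/world:\D*\K[0-9]+/';

$number = null;
if (preg_match($pattern, $subject, $matches) === 1) {
    $number = $matches[0];
}

echo $number, PHP_EOL; // prints 3334
```

preg_match() stops at the first match, so this returns the number after the earliest "world:" that is followed by digits. If the markup in between can itself contain digits (for example a tag such as h1), the pattern would pick those up first and would need to be tightened.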

Answer 1: accepted, 2011-11-07 05:22:15
