
PHP preg_match_all Returning No Results

I am using Simple HTML DOM Parser to scrape a script tag from a webpage, and then attempting to parse certain data out of that tag with preg_match_all(). However, when I print the matches from preg_match_all(), no results are returned.

Below is the code I'm using:

<head>
    <?php
        require_once "toolkit/http.php";
        require_once "toolkit/web_browser.php";
        require_once "toolkit/simple_html_dom.php";
    ?>
</head>
<body>
    <?php

        $prod_url = 'http://www.domain.com/subpage.html';
        $html = file_get_html($prod_url);
        $script = $html->find('script', 17);
        //echo $script;
        preg_match_all('(?<=\d":)\w++', $script, $matches);
        print_r($matches);

    ?>
</body>

I can see that the Simple HTML DOM code is working correctly, as I get the output I expect when echoing the $script variable. The output is:

<script type="text/javascript">
var PRODUCT_JSON = {
    "Def":{
        "default":202705111,
        "Listing:[{
            "label":"Includes",
            "options":[
                {label:"All", id: "884"},
                {label:"None", id: "485"},
            ]
        }],
        "Lookup":{
            "1":202705111,
            "0":202493236
        }
        }
};
</script>

So the issue appears to be with the regex I'm passing to preg_match_all(). The goal of the regex is to return the two numbers, 202705111 and 202493236, under "Lookup" near the end of the script tag. It may have to do with escaping the double quote or the parentheses, though I've also tried preg_match_all('\\(?<=\\d\\":\\)\\w++', $script, $matches); with the same result. Any ideas on what I'm doing wrong?

If you forget the delimiters, you can use T-Regx, which adds them automatically.

$matches = pattern('(?<=\d":)\w++')->match($script)->all();
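For comparison, here is a minimal sketch of the same fix using only the built-in preg functions. It wraps the original pattern in `/` delimiters and leaves it otherwise unchanged. Without delimiters, PHP most likely treats the leading `(` as a bracket-style delimiter and reads everything after the matching `)` as modifiers, so preg_match_all() emits a warning and returns false. The `$script` string below is a hand-copied stand-in for the scraped tag, not the real page output.

```php
<?php
// Stand-in for the scraped <script> contents (assumption: trimmed copy of
// the "Lookup" block shown in the question).
$script = '"Lookup":{ "1":202705111, "0":202493236 }';

// PCRE patterns in PHP must be wrapped in delimiters; "/" is used here.
// The pattern itself is the one from the question.
$count = preg_match_all('/(?<=\d":)\w++/', $script, $matches);

if ($count === false) {
    // preg_match_all() returns false on failure; preg_last_error() gives the code.
    echo 'Regex failed, error code: ' . preg_last_error() . PHP_EOL;
} else {
    // For this sample input, $matches[0] holds 202705111 and 202493236.
    print_r($matches[0]);
}
```

Checking the return value for false, rather than only printing $matches, makes this kind of pattern error visible even when warnings are hidden.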
