
PHP cURL and preg_match: extracting text from XHTML

I'm trying to extract the price from the HTML page linked below using PHP cURL and preg_match. I expect this code to output 4,550, but instead I get:

Notice: Undefined offset: 1 in C:\wamp\www\test.php on line 22

I think the pattern is correct, because it works if I put the HTML itself in a variable and escape the double quotes. Also, echo $result; displays the HTML fetched from the Foxtons website properly, so I can't figure out why the whole thing doesn't work. I need to get this working, and I would also appreciate an explanation of why the notice is generated and why my current script fails.

$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_exec($ch);
curl_close($ch);
$result2 = str_replace('"', '\"', $result);

$tagname1 = ");</script> ";
$tagname2 = "</noscript> per month</a>";

$pattern = "/$tagname1(.*?)$tagname2/";
preg_match($pattern, $result, $matches);
$prices = $matches[1];
print_r($prices);
?>

I rewrote the script a bit to account for more than one <noscript> on the page. You need preg_match_all, which finds all the matches rather than stopping at the first one.



$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
$result = curl_exec($ch);                    // a single request is enough
curl_close($ch);

// Lazy (.*?) keeps each match inside one <noscript> pair.
preg_match_all("/<noscript>(.*?)<\/noscript>/", $result, $matches);
print_r($matches);

Output:



Array
(
    [0] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

    [1] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

)

I tried this on my box and it worked. Let me know if it works for you.
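As for why the notice appears: when preg_match finds no match (or the pattern fails to compile), $matches is left empty, so reading $matches[1] raises "Undefined offset: 1". A likely contributor here is that the tag strings contain "/" and ")" and are placed unescaped inside a "/"-delimited pattern. The sketch below shows the difference between preg_match and preg_match_all, a guarded read, and preg_quote for literal fragments. The $html string is a hypothetical stand-in for the fetched page, not the real Foxtons markup.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$html = '<a href="#"><noscript>1,050</noscript> per week</a>'
      . '<a href="#"><noscript>4,550</noscript> per month</a>';

// "#" as the delimiter avoids having to escape the "/" in closing tags.
$pattern = '#<noscript>(.*?)</noscript>#s';

// preg_match stops at the first match and returns 1, 0, or false.
$first = preg_match($pattern, $html, $m) === 1 ? $m[1] : null;

// preg_match_all collects every match; $all[1] holds the captured groups.
preg_match_all($pattern, $html, $all);
$prices = $all[1];

// A pattern that matches nothing leaves the matches array empty, so the
// return value is checked before index 1 is read.
$missing = preg_match('#<nosuchtag>(.*?)</nosuchtag>#', $html, $m2) === 1
    ? $m2[1]
    : null;

// Literal fragments containing "/" or ")" are escaped with preg_quote;
// its second argument is the delimiter used by the pattern.
$open  = preg_quote('<noscript>', '/');
$close = preg_quote('</noscript> per month</a>', '/');
// [^<]* keeps the capture from running across other tags.
$monthly = preg_match("/{$open}([^<]*){$close}/", $html, $mm) === 1
    ? $mm[1]
    : null;

print_r($prices);
```

Checking the return value of preg_match before touching $matches[1] is what removes the notice; the choice of delimiter and preg_quote only make the pattern itself safer to build from arbitrary strings.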

Don't use regex to parse HTML; use an HTML DOM parser instead, such as PHP Simple HTML DOM Parser.

include("simple_html_dom.php");

$html = file_get_html("http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717");

foreach ($html->find('noscript') as $noscript)
{
    echo $noscript->innertext . "<br>";
}

This prints:

£1,600
6,934
£1,500
6,500
£1,350
5,850
£950
4,117
£925
4,009
£850
3,684
£795
3,445
£795
3,445
£775
3,359
£750
3,250
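If downloading a third-party library is not an option, the same idea can be sketched with PHP's bundled DOM extension. This swaps DOMDocument in for Simple HTML DOM Parser; it is not what the answer above used. The sample markup is hypothetical and only assumes that each price sits inside a <noscript> element.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<a href="#"><noscript>1,050</noscript> per week</a>'
      . '<a href="#"><noscript>4,550</noscript> per month</a>'
      . '</body></html>';

// Real-world pages are rarely valid; collect parser warnings quietly.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text of every <noscript> element in document order.
$prices = [];
foreach ($doc->getElementsByTagName('noscript') as $node) {
    $prices[] = trim($node->textContent);
}

print_r($prices);
```

For a live page, the string returned by curl_exec could be passed to loadHTML in place of the sample markup.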


 