PHP cURL and preg_match: extract text from XHTML

Question

I'm trying to extract the price from the HTML page linked below using PHP cURL and preg_match. I expect this code to output 4,550, but instead I get:

Notice: Undefined offset: 1 in C:\wamp\www\test.php on line 22

I think the pattern is correct, because it works if I put the HTML itself in a variable and escape the double quotes. Also, echo $result; displays the HTML fetched from the Foxtons website correctly, so I can't figure out why the whole thing doesn't work. I need to make this work, and I would also appreciate an explanation of why the notice is generated and why my current script fails.

$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);

curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch,CURLOPT_RETURNTRANSFER, 1); 
$result = curl_exec($ch);
curl_exec($ch);
curl_close($ch);
$result2 = str_replace('"', '\"', $result);

$tagname1= ");</script>
    ";
 $tagname2= "</noscript> 
    per month</a>";

$pattern = "/$tagname1(.*?)$tagname2/";
preg_match($pattern, $result, $matches);
$prices = $matches[1];

print_r($prices);

?>

Answer 1

I rewrote the script a bit to account for more than one <noscript> on the page. You need preg_match_all, which finds all the matches instead of stopping at the first one.



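$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);         // leave response headers out of the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body as a string instead of printing it
$result = curl_exec($ch);                    // fetch the page once
curl_close($ch);

// Non-greedy (.*?) keeps each match inside a single <noscript> element.
preg_match_all("/<noscript>(.*?)<\/noscript>/", $result, $matches);
print_r($matches);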

Outputs



Array
(
    [0] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

    [1] => Array
        (
            [0] => £1,050
            [1] => 4,550
        )

)

I tried this on my box and it worked. Let me know if it works for you.
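
On the notice itself: "Undefined offset: 1" means $matches[1] was read although preg_match found no match. One likely cause in the question's pattern is that the interpolated fragments contain an unescaped ")" and a "/" (in </script>), and the "/" ends a "/"-delimited pattern early; the literal line breaks may also differ from those in the fetched page. A minimal sketch of a safer approach, assuming the price sits inside a <noscript> element followed by "per month" (the sample $html string and the $before/$after fragments are hypothetical, not taken from the live page):

```php
<?php
// Hypothetical sample standing in for the fetched page; the real markup may differ.
$html = "<script>show('price');</script>\r\n    <noscript>4,550</noscript> \r\n    per month</a>";

// Literal fragments expected on either side of the price (assumed for illustration).
$before = '<noscript>';
$after  = '</noscript>';

// preg_quote() escapes regex metacharacters and, via its second argument, the "/" delimiter.
// \s* tolerates any mix of spaces and line endings between the tag and "per month".
$pattern = '/' . preg_quote($before, '/') . '(.*?)' . preg_quote($after, '/') . '\s*per month/s';

// preg_match() returns 1 on a match, 0 on no match and false on error,
// so $matches is only indexed after the result has been checked.
if (preg_match($pattern, $html, $matches) === 1) {
    $price = $matches[1];
} else {
    $price = null; // no match: $matches[1] does not exist in this branch
}

echo $price === null ? "price not found\n" : $price . "\n";
```

Checking the return value before reading $matches[1] turns a failed match into an explicit "not found" case instead of a notice.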

Answer 2

Don't use regex to parse HTML; use an HTML DOM parser instead, such as PHP Simple HTML DOM Parser:

include("simple_html_dom.php");

$html = file_get_html("http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717");

// Print the contents of every <noscript> element on the page.
foreach ($html->find('noscript') as $noscript) {
    echo $noscript->innertext . "<br>";
}

This echoes the inner text of each <noscript> element, each followed by a line break.
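
If adding a third-party library is not an option, the same idea can be sketched with DOMDocument from PHP's bundled DOM extension. This swaps in DOMDocument for Simple HTML DOM Parser, and the sample markup is a hypothetical stand-in for the real page, whose structure is an assumption here:

```php
<?php
// Hypothetical sample markup; the structure of the real page is an assumption.
$html = '<html><body>'
      . '<div><noscript>1,050</noscript> per week</div>'
      . '<div><noscript>4,550</noscript> per month</div>'
      . '</body></html>';

$doc = new DOMDocument();

// Real-world pages are rarely valid, so collect parser warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Gather the text content of every <noscript> element in document order.
$prices = [];
foreach ($doc->getElementsByTagName('noscript') as $node) {
    $prices[] = trim($node->textContent);
}

print_r($prices);
```

For a live page, the string returned by curl_exec would take the place of $html.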

solution1
0 ACCEPTED 2010-05-15 00:11:33

solution2
0 2011-08-09 17:03:19
