I'm trying to extract the price from the HTML page linked below using PHP cURL and preg_match. I expect this code to output 4,550, but for some reason I get:
Notice: Undefined offset: 1 in C:\wamp\www\test.php on line 22
I think the pattern is correct, because it works if I put the HTML itself in a variable and escape the double quotes. Also, echo $result; displays the HTML fetched from the Foxtons website properly, so I can't figure out why the whole thing doesn't work. I need to make this work, and I would also appreciate an explanation of why the notice is generated and why my current script fails.
$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_exec($ch);
curl_close($ch);
$result2 = str_replace('"', '\"', $result);
$tagname1 = ");</script> ";
$tagname2 = "</noscript> per month</a>";
$pattern = "/$tagname1(.*?)$tagname2/";
preg_match($pattern, $result, $matches);
$prices = $matches[1];
print_r($prices);
?>
I rewrote the script a bit to account for more than one <noscript> on the page. You need preg_match_all, which finds every match instead of stopping at the first one.
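$url = "http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
// Return the response body as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Call curl_exec() only once; a second call repeats the request and discards its result
$result = curl_exec($ch);
curl_close($ch);
// Non-greedy (.*?) so each <noscript> block is captured separately
preg_match_all("/<noscript>(.*?)<\/noscript>/", $result, $matches);
print_r($matches);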
Outputs
Array
(
[0] => Array
(
[0] => £1,050
[1] => 4,550
)
[1] => Array
(
[0] => £1,050
[1] => 4,550
)
)
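A note on reading that array, as a minimal sketch rather than the exact page: with the default flags, preg_match_all puts the whole matches (tags included) in $matches[0] and the text captured by the first group in $matches[1]. If the output is viewed in a browser the tags are hidden, which may be why both sub-arrays look identical above. The HTML fragment below is a hypothetical stand-in for the real markup.

```php
<?php
// Hypothetical fragment standing in for the fetched page; the real markup may differ.
$html = "<a><noscript>1,050</noscript> per week</a>\n"
      . "<a><noscript>4,550</noscript> per month</a>";

preg_match_all('/<noscript>(.*?)<\/noscript>/', $html, $matches);

$fullMatches = $matches[0]; // whole matches, <noscript> tags included
$prices      = $matches[1]; // only the text captured by the first group

print_r($prices);
```

So for the prices alone, $matches[1] is the part to loop over.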
I tried this on my box and it worked. Let me know if it works for you.
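On the notice itself: when preg_match finds nothing (or fails), $matches has no index 1, so reading $matches[1] raises "Undefined offset: 1". The markers in the question also contain / and ), which are special inside a /.../ pattern, so one likely culprit is that they were never escaped. Here is a minimal sketch of guarding against both, assuming markers similar to the ones in the question; the sample HTML and the <noscript> in the opening marker are my assumptions, not the real page.

```php
<?php
// Hypothetical fragment standing in for the fetched page; the real markup may differ.
$html = '<script>write(1);</script> <noscript>4,550</noscript> per month</a>';

// Assumed markers around the price (the <noscript> in $start is a guess).
$start = ');</script> <noscript>';
$end   = '</noscript> per month</a>';

// preg_quote() escapes regex metacharacters and, via its second argument, the "/" delimiter.
$pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $html, $matches) === 1) {
    $price = $matches[1];
} else {
    $price = null; // nothing was captured, so $matches[1] would be undefined
}

echo $price === null ? "No price found\n" : $price . "\n";
```

Checking the return value first means a changed page layout gives you a clean "no price found" path instead of a notice.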
Don't use regex to parse HTML; use an HTML DOM parser instead, such as PHP Simple HTML DOM Parser:
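// Requires simple_html_dom.php (PHP Simple HTML DOM Parser) to be available for include
include("simple_html_dom.php");
// Fetch the page and parse it into a DOM tree
$html = file_get_html("http://www.foxtons.co.uk/search?bedrooms_from=0&property_id=727717");
// Print the contents of every <noscript> element
foreach($html->find('noscript') as $noscript)
{
echo $noscript->innertext."<br>";
}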
This echoes:
£1,600
6,934
£1,500
6,500
£1,350
5,850
£950
4,117
£925
4,009
£850
3,684
£795
3,445
£795
3,445
£775
3,359
£750
3,250
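If you would rather not add a third-party file, the same idea can be sketched with PHP's bundled DOM extension. This swaps in DOMDocument for Simple HTML DOM, so treat it as an alternative under assumptions, not the method above: it needs ext-dom enabled, and the HTML fragment is a hypothetical stand-in for the fetched page.

```php
<?php
// Hypothetical markup standing in for the fetched page; the real markup may differ.
$html = '<div><noscript>1,050</noscript> per week</div>'
      . '<div><noscript>4,550</noscript> per month</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

// Collect the text of every <noscript> element.
$prices = [];
foreach ($doc->getElementsByTagName('noscript') as $node) {
    $prices[] = trim($node->textContent);
}

print_r($prices);
```

With a live page you would pass the string returned by cURL to loadHTML() instead of the sample fragment.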