
CURL DOMXPath different values

I have this cURL function that downloads the HTML of a website.

function curl($url){
    // Browser-like request headers
    $headers    = [];
    $headers[]  = "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
    $headers[]  = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    $headers[]  = "Accept-Language: en-us,en;q=0.5";
    $headers[]  = "Accept-Encoding: gzip,deflate";
    $headers[]  = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $headers[]  = "Keep-Alive: 115";
    $headers[]  = "Connection: keep-alive";
    $headers[]  = "Cache-Control: max-age=0";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    // Empty string: let libcurl decode any compressed response it supports
    curl_setopt($curl, CURLOPT_ENCODING, "");
    // Return the body as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($curl); // false on failure
    curl_close($curl);
    return $data;
}

To retrieve the data I use:

$html = curl($USE_URL);
$doc = new DOMDocument();
$doc->loadHTML($html);
$data = new DOMXPath($doc);

$date_list = $data->query('............');
$name_list = $data->query('............');

echo $date_list->length;
echo $name_list->length;

If I run this code on localhost it works fine (both lengths are 52). But if I run exactly the same code on my Altervista website, the date_list length is zero (lengths 0 and 52).

The date_list values that I extract are strings like "08-09-2018 12:47".

Is there something wrong with the cURL $headers, maybe?
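One way to narrow this down is to check whether both environments receive and parse the same document. The sketch below is not from the original post: summarize_html() is a hypothetical helper, and the query names passed to it are placeholders. It reports the size and hash of the downloaded HTML, the number of libxml parse warnings, the libxml version, and the node count per XPath query, so the output on localhost and on the host can be compared side by side.

```php
<?php
// Hypothetical helper (not part of the original code): summarise a downloaded
// page so that the results from two environments can be compared.
function summarize_html(string $html, array $queries): array
{
    // Collect HTML parse warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $counts = [];
    foreach ($queries as $name => $query) {
        $nodes = $xpath->query($query);
        // query() returns false for a malformed expression
        $counts[$name] = ($nodes === false) ? null : $nodes->length;
    }

    return [
        'bytes'          => strlen($html),         // differs if the server sends another page
        'md5'            => md5($html),            // quick equality check between environments
        'parse_errors'   => count($errors),        // libxml warnings raised while parsing
        'libxml_version' => LIBXML_DOTTED_VERSION, // the parser itself may differ per host
        'counts'         => $counts,
    ];
}
```

Calling it as print_r(summarize_html(curl($USE_URL), ['dates' => '...', 'names' => '...'])) in both places, with the real queries filled in, would show whether the bytes differ (pointing at the request or the server) or only the parse result differs (pointing at the PHP or libxml version).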

Weirdly, I solved this by changing the query. I had to work around the problem: my new query extracts a larger set of data (e.g. "abcd deddeh dede 12:30 dhhh"), and I then extract the real data manually by manipulating the string (using split methods).

$date_list= $data->query('.....HERE......');
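As an alternative to splitting the string by hand, the date can be pulled out of the larger text with a regular expression. This is only a sketch under assumptions: extract_datetime() is a hypothetical name, the pattern assumes the "08-09-2018 12:47" format quoted above, and the tolerance for a non-breaking space between date and time is a guess at one kind of difference between the two environments, not something the post confirms.

```php
<?php
// Hypothetical helper: find the first "dd-mm-yyyy hh:mm" value inside a longer string.
function extract_datetime(string $text): ?string
{
    // Accept ordinary whitespace or a non-breaking space (U+00A0) between date and time
    $pattern = '/(\d{2}-\d{2}-\d{4})[\s\x{00A0}]+(\d{2}:\d{2})/u';
    if (preg_match($pattern, $text, $m) === 1) {
        // Normalise the separator to a single plain space
        return $m[1] . ' ' . $m[2];
    }
    return null; // no date found (or the input was not valid UTF-8)
}
```

It would be applied to the text of each node returned by the broader query, for example extract_datetime($node->textContent).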

I think that when cURL downloads the page, it does not leave it completely unchanged.
