XPath not returning results on Google App Engine for PHP

I'm having an issue using XPath on Google App Engine for PHP.

I have the following helper function:

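function getDataXpath($url_str, $xpath_exp_str)
{
    $doc = new DOMDocument();

    // Collect HTML parse warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    // Note: the return value is not checked here, so a failed fetch leaves
    // $doc empty and the query below simply matches nothing.
    $doc->loadHTMLFile($url_str);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($doc);

    // query() returns a DOMNodeList, or false (never null) when the
    // expression is malformed, so the result can be returned directly.
    $elements = $xpath->query($xpath_exp_str);

    return $elements;
}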

And then I simply run it like this to get the nodes:

getDataXpath($url_str, $xpath_exp_str);

So on my local PHP install (v 5.5.19), when I run the following:

$url_str = 'http://www.alexa.com/topsites/category;0/Top/Shopping';
$xpath_exp_str = "//ul/li[@class='site-listing']/div/p/a";
$xpath_data = getDataXpath($url_str, $xpath_exp_str);
print_r($xpath_data);

I get the following result:

DOMNodeList Object ( [length] => 25 ); 

and this is correct.

However, when I run the same code on Google App Engine for PHP (v 5.5.26), I get the following:

DOMNodeList Object ( [length] => 0 ); 

Has anyone had this issue, and how did you fix it?

It appears that Amazon might be blocking programmatic access to the Alexa Top Sites pages. I'm subscribed to their new API, but it doesn't let you filter results by category (e.g. top e-commerce sites) the way the website does, which is why I'm resorting to XPath.

I tried the same script on some other URLs and I didn't have any issues.
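
For anyone who wants to confirm whether a request is being refused rather than the XPath failing, here is a minimal sketch. It fetches the page with an explicit User-Agent, reports the HTTP status line, and only then runs the query on the returned HTML. The names fetchHtmlWithStatus and countXpathMatches and the ExampleBot User-Agent string are hypothetical, not part of the original code, and the sketch assumes the runtime's HTTP stream wrapper populates $http_response_header.

```php
<?php
// Hypothetical helper: fetch a page and expose the HTTP status line, so a
// refused request is visible before any parsing happens.
function fetchHtmlWithStatus($url_str, $user_agent_str = 'Mozilla/5.0 (compatible; ExampleBot/1.0)')
{
    $context = stream_context_create(array(
        'http' => array(
            'method'        => 'GET',
            'header'        => "User-Agent: " . $user_agent_str . "\r\n",
            'ignore_errors' => true, // still return the body on 4xx/5xx
            'timeout'       => 10,
        ),
    ));

    $html_str = @file_get_contents($url_str, false, $context);

    // Assumption: the HTTP wrapper fills $http_response_header in this scope.
    // With redirects, index 0 is the status line of the first response.
    $status_str = isset($http_response_header[0]) ? $http_response_header[0] : 'no response';

    return array('status' => $status_str, 'html' => $html_str);
}

// Hypothetical helper: count the nodes an XPath expression matches in an
// HTML string. Returns 0 for empty input, unparsable HTML, or a bad expression.
function countXpathMatches($html_str, $xpath_exp_str)
{
    if (!is_string($html_str) || $html_str === '') {
        return 0;
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html_str);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    if (!$loaded) {
        return 0;
    }

    $xpath = new DOMXPath($doc);
    $elements = $xpath->query($xpath_exp_str);

    return ($elements === false) ? 0 : $elements->length;
}

// Example usage (performs a network request, so it is left commented out):
// $result = fetchHtmlWithStatus($url_str);
// echo $result['status'], "\n";
// echo countXpathMatches($result['html'], $xpath_exp_str), "\n";
```

If the status line shows an error code or the body is an error page, the zero-length result comes from the fetch and not from the XPath expression.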

Anyway, it works when I run it locally (in browser and command-line), so I'll just have to skip Google App Engine for now. It's a broken workflow, especially since this was part of a much bigger automation effort, but it's out of my hands at this point.
