This is the current code that I have for scraping.
$item is the HTML of the div within the loop.
$doc = DOMDocument::loadHTML($item);
$xpath = new DOMXPath($doc);
$link = "//a[@class='s-item__link']";
$entries = $xpath->query($link);
foreach ($entries as $entry) {
// do work here
}
I am changing the first two lines to be...
$doc = new DOMDocument();
$xpath = $doc->load($item);
With that, I am getting the following error...
Fatal error: Uncaught Error: Call to a member function query() on bool in
The error comes from $entries = $xpath->query($link);
and I cannot figure out what to change this line to.
Any help would be appreciated.
UPDATE: same error
$doc = new DOMDocument();
$xpath = $doc->loadHTML($item);
$link = "//a[@class='s-item__link']";
$entries = $xpath->query($link);
foreach ($entries as $entry) {
// do work here
}
Look at the return value from DOMDocument::load()
...
Returns true on success or false on failure. If called statically, returns a DOMDocument or false on failure.
Emphasis mine. Notice that, with your change, you're no longer calling it statically.
So, with code like $xpath = $doc->load($item); of course $xpath will be a bool (true or false), and your error makes total sense: Fatal error: Uncaught Error: Call to a member function query() on bool.
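The point about return values can be sketched as follows. This is only a minimal sketch, not the asker's actual scraper: the sample markup and the helper name extractLinks are made up for illustration, and libxml_use_internal_errors() is used here simply to keep parser warnings quiet.

```php
<?php
// Hypothetical helper: returns the href of every matching link in an HTML fragment.
function extractLinks(string $item): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Called on an instance, loadHTML() returns a bool, not a document.
    $loaded = $doc->loadHTML($item);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return [];
    }

    // The XPath object is built from the document, not from loadHTML()'s return value.
    $xpath = new DOMXPath($doc);
    $entries = $xpath->query("//a[@class='s-item__link']");

    $hrefs = [];
    foreach ($entries as $entry) {
        $hrefs[] = $entry->getAttribute('href');
    }
    return $hrefs;
}

// Made-up sample markup standing in for $item.
$item = '<div><a class="s-item__link" href="https://example.com/item/1">Item 1</a>'
      . '<a class="other" href="https://example.com/other">Other</a></div>';
$links = extractLinks($item);
print_r($links);
```

The key design choice is that the bool from loadHTML() is only used as a success check, while $doc itself is what gets passed to DOMXPath.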
I just scooped out the XPath stuff I'm using right now for my own PHP scraper. This should work...
$dom = new DOMDocument;
@$dom->loadHTML(mb_convert_encoding($htmltext, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXPath($dom);
Explanation:
new DOMDocument: creates a new instance of the DOMDocument class.
@$dom->loadHTML: the @ operator suppresses warnings. This class is very verbose about malformed markup, and you don't want to see those warnings all the time.
mb_convert_encoding($htmltext, 'HTML-ENTITIES', 'UTF-8'): loadHTML() does not assume UTF-8 input by default, so non-ASCII characters are converted to HTML entities first to keep them from being garbled. mb_convert_encoding() also copes well with very large strings. Note that the 'HTML-ENTITIES' target is deprecated as of PHP 8.2.
new DOMXPath($dom): creates a new instance of the DOMXPath class for that document.
->load expects a filename as its first parameter, as shown in the documentation.
In your first code block, you use loadHTML.
Use ->loadHTML instead of ->load on an empty DOMDocument:
$doc = new DOMDocument();
$doc->loadHTML($item); // returns a bool, so don't assign it to $xpath
$xpath = new DOMXPath($doc);
public load(string $filename, int $options = 0): DOMDocument|bool
public loadHTML(string $source, int $options = 0): DOMDocument|bool
public loadHTMLFile(string $filename, int $options = 0): DOMDocument|bool
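To illustrate the difference between the three loaders listed above, here is a minimal sketch. The sample markup and the temporary file are made up for illustration, and the @ on the last call is only there to hide the expected warning.

```php
<?php
// Made-up sample markup.
$html = '<html><body><a class="s-item__link" href="/a">A</a></body></html>';

// loadHTML() parses markup passed as a string.
$fromString = new DOMDocument();
$okString = $fromString->loadHTML($html);

// loadHTMLFile() parses markup stored in a file; a temp file stands in for a real page.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, $html);
$fromFile = new DOMDocument();
$okFile = $fromFile->loadHTMLFile($path);
unlink($path);

// load() also takes a filename (and expects XML rather than HTML).
// Passing the markup itself makes it look for a file with that name,
// so the call is expected to fail and return false.
$wrong = new DOMDocument();
$okWrong = @$wrong->load($html);

var_dump($okString, $okFile, $okWrong);
```

In all three cases the instance call returns a bool, so the document object itself is what should be handed to DOMXPath afterwards.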