I am learning to work with DOMXPath in PHP. I was using regex, but I was discouraged from using it for HTML capture here on Stack Overflow. I confess that it is not so simple for me, and the DOM has its limits (when there are spaces in tag names, and also in error handling). If someone can help me with the PHP command to get a preview of the captured elements and check that everything is right, I would appreciate it. Suggestions for improving the code are also welcome. The code below was based on a question on Stack Overflow itself.
<?php
$doc = new DOMDocument;
libxml_use_internal_errors(true);
// Deleting whitespace (if any)
$doc->preserveWhiteSpace = false;
@$doc->loadHTML(file_get_contents ('http://www.imdb.com/search/title?certificates=us:pg_13&genres=comedy&groups=top_250'));
$xpath = new DOMXPath($doc);
// Starting from the root element
$grupos = $xpath->query(".//*[@class='lister-item mode-advanced']");
// Creating an array and then looping with the elements to be captured (image, title, and link)
$resultados = array();
foreach($grupos as $grupo) {
    $i = $xpath->query(".//*[@class='loadlate']//@src", $grupo);
    $t = $xpath->query(".//*[@class='lister-item-header']//a/text()", $grupo);
    $l = $xpath->query(".//*[@class='lister-item-header']//a/@href", $grupo);
    $resultados[] = $resultado;
}
// What command should I use to have a preview of the results and check if everything is ok?
print_r($resultados);
OK, so here is your code with two corrections. First, I'm adding a subarray to $resultados with the captured elements, and second, I'm using a foreach loop instead of print_r/var_dump to show them.
BTW, doesn't IMDb offer an API?
<?php
ini_set('display_errors', 1);
error_reporting(-1);
$doc = new DOMDocument;
libxml_use_internal_errors(true);
// Deleting whitespace (if any)
$doc->preserveWhiteSpace = false;
$doc->loadHTML(file_get_contents ('http://www.imdb.com/search/title?certificates=us:pg_13&genres=comedy&groups=top_250'));
//$doc->loadHTML($HTML);
$xpath = new DOMXPath($doc);
// Starting from the root element
$grupos = $xpath->query(".//*[@class='lister-item mode-advanced']");
// Creating an array and then looping with the elements to be captured (image, title, and link)
$resultados = array();
foreach($grupos as $grupo) {
    $i = $xpath->query(".//*[@class='loadlate']//@src", $grupo);
    $t = $xpath->query(".//*[@class='lister-item-header']//a/text()", $grupo);
    $l = $xpath->query(".//*[@class='lister-item-header']//a/@href", $grupo);
    $resultados[] = ['i' => $i[0], 't' => $t[0], 'l' => $l[0]];
}
// What command should I use to have a preview of the results and check if everything is ok?
//var_dump($resultados);
foreach($resultados as $r){
    echo "\n-----------\n";
    echo $r['i']->value."\n";
    echo $r['t']->textContent."\n";
    echo $r['l']->value."\n";
}
You can play with it here: https://3v4l.org/hal0G
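If a group is missing one of the elements, `$i[0]` (or `$t[0]`, `$l[0]`) is null and the echo lines above will raise a warning. One possible variation, sketched here as an alternative rather than as part of the code above, is to let XPath return plain strings through `DOMXPath::evaluate()` and the XPath `string()` function, which yields an empty string when nothing matches. The result is then an array of strings that `print_r` can preview directly. The inline HTML, the example.com image URL, and the title paths are made-up stand-ins for the fetched page; only the class names come from the code above.

```php
<?php
// Hypothetical sample markup standing in for the fetched page. Only the
// class names are taken from the original query; the rest is invented.
$html = <<<'HTML'
<html><body>
<div class="lister-item mode-advanced">
  <img class="loadlate" src="https://example.com/a.jpg">
  <h3 class="lister-item-header"><a href="/title/tt0000001/">First Movie</a></h3>
</div>
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header"><a href="/title/tt0000002/">Second Movie</a></h3>
</div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$resultados = [];
foreach ($xpath->query("//*[@class='lister-item mode-advanced']") as $grupo) {
    // string(...) converts the first matching node to text and returns ''
    // when there is no match, so no null node is dereferenced later.
    $resultados[] = [
        'i' => $xpath->evaluate("string(.//*[@class='loadlate']/@src)", $grupo),
        't' => trim($xpath->evaluate("string(.//*[@class='lister-item-header']//a)", $grupo)),
        'l' => $xpath->evaluate("string(.//*[@class='lister-item-header']//a/@href)", $grupo),
    ];
}

// Plain strings, so print_r gives a readable preview.
print_r($resultados);
```

In this sketch the second group has no image, so its 'i' entry comes back as an empty string instead of triggering a warning.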