
PHP DOM Find Elements Containing A Class

I'm trying to create an RSS feed of League of Legends news, since they don't have one. To do that, I'm parsing the HTML and trying to find all elements whose class attribute contains a certain class.

Here is what I have, but it's not finding anything.

<?php
    $page = file_get_contents("http://na.leagueoflegends.com/en/news/");
    $dom = new DomDocument();
    $dom->load($page);
    $finder = new DomXPath($dom);
    $classname="node-article";
    $nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
    echo "<pre>" . print_r($nodes, true) . "</pre>";
?>

Edit: Working code...

<?php
$page = file_get_contents("http://na.leagueoflegends.com/en/news/");
$dom = new DomDocument();
@$dom->loadHTML($page);
$finder = new DomXPath($dom);
$classname = "node-article";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

$articles = array();
foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    $articles[] = array(
        'title' => htmlentities($h4->firstChild->nodeValue),
        'content' => htmlentities($h4->nextSibling->nodeValue),
        'link' => 'http://na.leagueoflegends.com/en/news' . $h4->firstChild->getAttribute('href')
    );
}

echo "<pre>" . print_r($articles, true) . "</pre>";
?>
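
For the stated goal of an RSS feed, an array shaped like $articles could be serialized roughly as follows. This is only a sketch: the sample article, the channel title and the channel description are placeholder values, not data from the real page, and the text is kept raw (no htmlentities) because DOMDocument escapes text nodes on output.

```php
<?php
// Hypothetical data in the same shape as the $articles array above,
// but with raw (unescaped) text: DOMDocument escapes text nodes itself.
$articles = array(
    array(
        'title'   => 'Patch notes & fixes',
        'content' => 'Summary text',
        'link'    => 'http://example.com/news/1',
    ),
);

$rss = new DOMDocument('1.0', 'UTF-8');
$rss->formatOutput = true;

$root = $rss->appendChild($rss->createElement('rss'));
$root->setAttribute('version', '2.0');
$channel = $root->appendChild($rss->createElement('channel'));

// Channel metadata (placeholder values).
$meta = array(
    'title'       => 'League of Legends news',
    'link'        => 'http://na.leagueoflegends.com/en/news/',
    'description' => 'Unofficial feed',
);
foreach ($meta as $name => $value) {
    $el = $channel->appendChild($rss->createElement($name));
    $el->appendChild($rss->createTextNode($value));
}

// One <item> per article; RSS tag name => key in the article array.
$map = array('title' => 'title', 'link' => 'link', 'description' => 'content');
foreach ($articles as $article) {
    $item = $channel->appendChild($rss->createElement('item'));
    foreach ($map as $tag => $key) {
        $el = $item->appendChild($rss->createElement($tag));
        $el->appendChild($rss->createTextNode($article[$key]));
    }
}

$xml = $rss->saveXML();
echo $xml;
```

Building the feed with createTextNode rather than string concatenation means characters such as & are escaped once, by the serializer, which avoids double-escaping.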

You need loadHTML(), which parses a string containing HTML source, instead of load(), which expects the path to an XML document. file_get_contents() has already read the entire page into a string, so the HTML source is in $page and can be passed straight to loadHTML().

Try this:

$page = file_get_contents("http://na.leagueoflegends.com/en/news/");
$dom = new DomDocument();
$dom->loadHTML($page);
$finder = new DomXPath($dom);
$classname = "node-article";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
echo "<pre>" . print_r($nodes, true) . "</pre>";

// get title and content of article
$arr = array();

foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    $arr[] = array(
        'title' => $h4->nodeValue,
        'content' => $h4->nextSibling->nodeValue,
    );
}

var_dump($arr); // your title & body content
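
To check the class-matching query without hitting the network, the same steps can be run against an inline string. The markup below is made up for illustration and is not the real page structure. libxml_use_internal_errors() is used here as an alternative to the @ operator for silencing parser warnings, and the null check on the h4 is an added safeguard.

```php
<?php
// Hypothetical sample markup standing in for the downloaded news page.
$html = <<<HTML
<html><body>
<div class="node node-article promo"><h4><a href="/en/news/one">First</a></h4><p>Summary one</p></div>
<div class="node-article-teaser"><h4><a href="/en/news/two">Second</a></h4></div>
<div class="node-article"><h4><a href="/en/news/three">Third</a></h4><p>Summary three</p></div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);               // parses a string, not a file path
libxml_clear_errors();
libxml_use_internal_errors($previous);

$finder = new DOMXPath($dom);
$classname = 'node-article';
// Padding the class list with spaces matches whole class names only,
// so "node-article-teaser" is not selected.
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

$titles = array();
foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    if ($h4 !== null) {              // skip matches that have no <h4>
        $titles[] = trim($h4->textContent);
    }
}

print_r($titles);
```

Only the first and third div are matched, since the second one merely has a class name that starts with the same text.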
