
Can I get the result of a DOMXPath query on nested classes into an array of key => value pairs?

I'm scraping some data from a webpage for clients, and that works fine: the data comes back as separate rows, which I get by exploding on \n, and I then map specific rows to array keys that fill form fields. For each value I need, it looks like this:

$lines = explode("\n", $html);
$data['vraagprijs']         = preg_replace("/[^0-9]/", "", $lines[5]);

However, the data I need may be on line 10 today but could just as well be on line 11 tomorrow, so I'd like to get the values into a named (associative) array instead. A sample of the HTML at the URL:

<div class="item_list">             
<span class="item first status">
    <span class="itemName">Status</span>                        
    <span class="itemValue">Sold</span>
</span>
<span class="item price">
    <span class="itemName">Vraagprijs</span>
    <span class="itemValue">389.000</span>
</span>
<span class="item condition">
    <span class="itemName">Aanvaarding</span>
    <span class="itemValue">In overleg</span>
</span>
...
</div>

This is my function model:

$tagName3   = 'div';
$attrName3  = 'class';
$attrValue3 = 'item_list';
$html       = getShortTags($tagName3, $attrName3, $attrValue3, $url); 

function getShortTags($tagName, $attrName, $attrValue, $url = "", $exclAttrValue = 'itemTitle') {

    $dom = $this->getDom($url);

    $html                 = '';
    $domxpath             = new \DOMXPath($dom);
    $newDom               = new \DOMDocument;
    $newDom->formatOutput = true;

    $filtered = $domxpath->query(" //" . $tagName . "[@" . $attrName . "='" . $attrValue . "']/descendant::text()[not(parent::span/@" . $attrName . "='" . $exclAttrValue . "')] ");
    $i        = 0;
    while ($myItem   = $filtered->item($i++)) {
        $node   = $newDom->importNode($myItem, true);
        $newDom->appendChild($node); 
    }
    $html = $newDom->saveHTML();
    return $html;
}

What I'm getting:

Status\nSold\nVraagprijs\n389.000\nIn overleg\n....

Desired output, or anything like it:

$html = array("Status" => "Sold", "Vraagprijs" => "389.000", "Aanvaarding" => "In overleg", ...)

Is there a way to "loop" through the itemList and get each itemName and itemValue into an associative array?

If you're happy with what the getShortTags() method does (or if it's used elsewhere and so is difficult to tweak), you can post-process its return value.

The code below uses explode() to split the output into lines, array_map() with trim() to strip surrounding whitespace, and array_filter() to drop blank lines. That leaves the data as alternating name/value lines, so array_chunk() can group them into pairs, and a foreach() over the pairs uses the first element as the key and the second as the value. Note that array_filter() without a callback also drops a line that is exactly "0".

$html = getShortTags($tagName3, $attrName3, $attrValue3, $url);
$lines = array_filter(array_map("trim", explode(PHP_EOL, $html)));
$pairs = array_chunk($lines, 2);
$output = [];
foreach ( $pairs as $pair ) {
    $output[$pair[0]] = $pair[1];
}
print_r($output);

With the sample data, this gives:

Array
(
    [Status] => Sold
    [Vraagprijs] => 389.000
    [Aanvaarding] => In overleg
)
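As a variation on the loop above, the pairs can also be turned into an associative array in one call with array_column(), using index 0 of each pair as the key and index 1 as the value. This is only a minimal sketch: the $html string is a hard-coded stand-in for the return value of getShortTags(), not real output, and the line splitting uses preg_split() with \R (any newline sequence) and a 'strlen' filter instead of explode(PHP_EOL, ...) and the default filter.

```php
<?php
// Stand-in for the text returned by getShortTags() (assumed sample, not real output)
$html = "Status\n  Sold\n\nVraagprijs\n389.000\nAanvaarding\nIn overleg\n";

// Split on any newline sequence, trim each line, and drop empty lines only
$lines = array_filter(array_map('trim', preg_split('/\R/', $html)), 'strlen');

// Group into [name, value] pairs, then use column 0 as the key and column 1 as the value
$pairs  = array_chunk($lines, 2);
$output = array_column($pairs, 1, 0);

print_r($output);
```

A trailing name without a value is skipped by array_column(), because that last chunk has no index 1.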

To do this directly against the document, without relying on the order of the lines, query the base element and loop over its child <span> elements. For each one, look up the descendants with the itemName and itemValue classes and use their text as the key and the value. (If several items have no name, they all end up under the empty-string key and only the last one survives.)

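// $domxpath is the DOMXPath instance created in getShortTags()
$output = [];
// Select each item <span> that is a direct child of the item_list <div>
$filtered = $domxpath->query("//div[@class='item_list']/span");
foreach ( $filtered as $myItem )  {
    // Evaluate relative to the current item; string() returns '' when nothing matches
    $name  = $domxpath->evaluate("string(descendant::span[@class='itemName'])", $myItem);
    $value = $domxpath->evaluate("string(descendant::span[@class='itemValue'])", $myItem);
    $output[$name] = $value;
}
print_r($output);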
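
For reference, here is a self-contained sketch of the same XPath approach that can be run on its own. It assumes the sample markup from the question (hard-coded in $sample rather than fetched from the URL) and wraps the loop in a helper named extractItems(), which is a hypothetical name and not part of the original code. It also matches class names as whitespace-separated tokens via contains(concat(...)), so an element carrying extra classes would still match; that is an added safeguard, not something the sample strictly needs.

```php
<?php
// Hypothetical helper: returns itemName => itemValue pairs from an item_list block
function extractItems(string $html): array
{
    // Collect libxml warnings internally instead of emitting them for imperfect HTML
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($dom);
    $output = [];

    // Build an XPath test that matches one class token among several
    $hasClass = function (string $class): string {
        return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
    };

    foreach ($xpath->query("//div[" . $hasClass('item_list') . "]/span") as $item) {
        $name  = trim($xpath->evaluate("string(.//span[" . $hasClass('itemName') . "])", $item));
        $value = trim($xpath->evaluate("string(.//span[" . $hasClass('itemValue') . "])", $item));
        if ($name !== '') {   // skip items that have no name
            $output[$name] = $value;
        }
    }
    return $output;
}

// Sample markup taken from the question
$sample = '<div class="item_list">
<span class="item first status">
    <span class="itemName">Status</span>
    <span class="itemValue">Sold</span>
</span>
<span class="item price">
    <span class="itemName">Vraagprijs</span>
    <span class="itemValue">389.000</span>
</span>
<span class="item condition">
    <span class="itemName">Aanvaarding</span>
    <span class="itemValue">In overleg</span>
</span>
</div>';

$result = extractItems($sample);
print_r($result);
```

Skipping unnamed items is a design choice made for this sketch; dropping the if would reproduce the behaviour of the shorter loop above.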


 