
PHP - DOMelement select does not return option values

I've been trying to parse a website using DOMElement objects. Everything works properly, except for this one issue, which doesn't make sense to me.

There is a select box, and I need all of its possible option values:

<select name="super_attribute[141]" id="attribute141" class="required-entry super-attribute-select">
    <option value="">Choose size</option>
    <option value="36" price="0">36</option>
    <option value="38" price="0">38</option>
    <option value="41" price="0">40</option>
    <option value="43" price="0">42</option>
    <option value="45" price="0">44</option>
    <option value="47" price="0">46</option>
    <option value="49" price="0">48</option>
</select>

I want to retrieve an array containing the values (either the innerHTML or the 'value' attribute). I use this code:

foreach ($dom->getElementsByTagName('option') as $option_tag) {
    $sizes_list[] = $option_tag->getAttribute('value');
}

However, only one 'option' tag is ever returned, with an empty value. So I tried a different approach:

$item_options = $dom->getElementById('attribute141');
print(sizeof($item_options->childNodes)); // Prints "1"
foreach ($item_options->childNodes as $child) {
    $sizes_list[] = $child->getAttribute('value');
}
$cloth_item->setSizes($sizes_list);

And again it seems to find only this single empty value... Why can't I access the rest of the options?

When you parse an HTML page from a URL, you must never rely on the browser's page inspector, because the inspector shows the source after DOM/JS processing. Refer to the browser's "View page source" command instead or, better, do this in PHP:

$html = file_get_contents( 'http://www.example.com/your/url.html' );
file_put_contents( '/Path/Local/Download/Page.html', $html );

Then open the downloaded file in a text editor to see the real HTML you are working with.

In your specific case, you can retrieve only one <option> because there is only one <option> in the loaded page:

<div class="input-box">
    <select name="super_attribute[141]" id="attribute141" class="required-entry super-attribute-select">
        <option>בחר אפשרות...</option>
    </select>
</div>
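To confirm this from PHP rather than by eye, you can count the <option> elements in the raw markup. The following is a minimal sketch: the inline `$html` string is a stand-in for the downloaded page (with placeholder option text), not the real site's markup.

```php
<?php
// Assumption: in practice $html would come from file_get_contents() on the real URL.
$html = <<<'HTML'
<div class="input-box">
    <select name="super_attribute[141]" id="attribute141" class="required-entry super-attribute-select">
        <option>Choose an option...</option>
    </select>
</div>
HTML;

$dom = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Count the <option> elements that exist before any JavaScript runs.
$optionCount = $dom->getElementsByTagName('option')->length;
echo $optionCount, PHP_EOL;
```

If this count matches what your scraper sees but not what the browser shows, the missing options are being added client-side.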

The other options are loaded by JavaScript. Their values are stored as JSON inside a script on the same page. There is no clean way to retrieve them. You could use PhantomJS but, as other Stack Overflow questions show, that is not easy to do from PHP.

A dirty way is this: looking at the HTML source, you can see that your data is in this format:

<script type="text/javascript">
    var spConfig = new Product.Config({ (...) });
</script>

So, you can retrieve all <script> nodes and search for the new Product.Config value.

With pure DOM:

$nodes = $dom->getElementsByTagName('script');  // Result: 70 nodes

Using DOMXPath:

$xpath = new DOMXPath( $dom );
$nodes = $xpath->query('//script[@type="text/javascript"]');  // Result: 58 nodes
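As a possible refinement, the XPath expression can filter on the script's text so that only the node containing the configuration is returned. This is a sketch under assumptions: the sample markup below is made up to mimic the page's structure.

```php
<?php
// Hypothetical sample page with two scripts, only one of which holds the config.
$html = <<<'HTML'
<html><head>
<script type="text/javascript">var other = 1;</script>
<script type="text/javascript">
    var spConfig = new Product.Config({"attributes":{}});
</script>
</head><body></body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// contains(., "...") tests the text content of each <script> element.
$nodes = $xpath->query('//script[contains(., "new Product.Config")]');

$matchCount = $nodes->length;
echo $matchCount, PHP_EOL;
```

This keeps the later loop short, at the cost of hard-coding the search string in two places if you still run the regular expression afterwards.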

Then loop through the nodes, search each one for a regular expression pattern, and decode the captured JSON:

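$data = null;  // stays null if no script contains the configuration
foreach( $nodes as $node )
{
    // Capture the argument passed to new Product.Config( ... );
    if( preg_match( '~new Product\.Config\((.+?)\);~', $node->nodeValue, $matches ) )
    {
        $data = json_decode( $matches[1] );
        break;
    }
}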
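One caveat worth checking: by default, `.` in a PCRE pattern does not match line breaks, so the pattern above only works if the JSON sits on a single line. If the page you are parsing spreads it over several lines, the `s` modifier may help. The sketch below uses made-up script content to show the difference.

```php
<?php
// Made-up script content where the JSON is spread over several lines.
$script = <<<'JS'
var spConfig = new Product.Config({
    "attributes": {"141": {"code": "size"}}
});
JS;

$pattern = '~new Product\.Config\((.+?)\);~';

// Without the "s" modifier, "." stops at line breaks, so nothing matches.
$withoutS = preg_match($pattern, $script, $m1);

// With "s", "." also matches newlines and the whole JSON is captured.
$withS = preg_match($pattern . 's', $script, $m2);

// Decode to associative arrays only if the pattern matched.
$data = $withS ? json_decode($m2[1], true) : null;

var_dump($withoutS, $withS);
```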

At this point, $data holds the decoded JSON:

stdClass Object
(
    [attributes] => stdClass Object
        (
            [141] => stdClass Object
                (
                    [id] => 141
                    [code] => size
                    [label] => מידה
                    [options] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [id] => 36
                                    [label] => 36
                                    [price] => 0
                                    [oldPrice] => 0
                                    [products] => Array
                                        (
                                            [0] => 93548
                                        )
                                )
                            (...)
                        )
                )
        )
)

So, to access the id of the first <option>, you can use this:

echo $data->attributes->{141}->options[0]->id; // Output: 36
#                       ↑ note the curly braces, needed because 141 is not a valid property name

And so on:

echo $data->attributes->{141}->options[1]->id;    // Output: 38
echo $data->attributes->{141}->options[1]->label; // Output: 38
echo $data->attributes->{141}->options[1]->price; // Output: 0
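To get back to the original goal, an array of sizes, you can collect one field from every decoded option. The sketch below is self-contained: the `$json` string is a shortened, hypothetical stand-in for the text captured by the regular expression, and it is decoded to associative arrays (second argument `true`) so the numeric key needs no brace syntax.

```php
<?php
// Hypothetical, shortened stand-in for the JSON captured from the page.
$json = '{"attributes":{"141":{"id":"141","code":"size","options":['
      . '{"id":"36","label":"36","price":"0"},'
      . '{"id":"38","label":"38","price":"0"},'
      . '{"id":"41","label":"40","price":"0"}]}}}';

// true => decode to nested arrays instead of stdClass objects.
$data = json_decode($json, true);

$options = $data['attributes']['141']['options'];

// Assumption: each option's id corresponds to the <option> "value" attribute
// and its label to the visible text.
$sizeValues = array_column($options, 'id');
$sizeLabels = array_column($options, 'label');

print_r($sizeValues);
print_r($sizeLabels);
```

Either array could then be passed to something like the `setSizes()` call from the question, depending on whether you want the option values or the displayed labels.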
