
HTML DOM Document parsing

I am new to DOMDocument. I have this HTML:

    <tr class="calendar_row" data-eventid="39657">
        <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 9</div></td>
        <td class="alt1 smallfont" align="center">3:34am</td>
        <td class="alt1 smallfont" align="center">USD</td>
    </tr>

    <tr class="calendar_row" data-eventid="39658">
        <td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 10</div></td>
        <td class="alt1 smallfont" align="center">5:14am</td>
        <td class="alt1 smallfont" align="center">EUR</td>
    </tr>

First, I am trying to get the contents of the tr elements using this code:

    $ret = array();
    libxml_use_internal_errors(true); 
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    //$doc->saveHTMLFile('textbox.php');

    $text = $doc->getElementsByTagName('tr');
    foreach ($text as $tag){
        $ret[] = $doc->saveHtml($tag); 
        echo $doc->saveHtml($tag); 
    }

I don't know why the value being echoed is the whole document and not the contents of the tr elements.

Second, I would also like to get the values between the td tags, such as 5:14am, EUR, etc., but I have no idea how to do that.

Pardon the noob question.

Best regards

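    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $tables = $doc->getElementsByTagName('table');
    $table = $tables->item(0); // takes the first table in the DOM

    // The <td> cells sit inside <tr> elements, so they are descendants of
    // the table rather than direct children; search the subtree for them.
    foreach ($table->getElementsByTagName('td') as $td) {
        echo $td->nodeValue, "\n";
    }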

Passing an element to saveHTML() generates the element's outerHTML, not its innerHTML, so you get the tag itself, its attributes and all of its content. Note that passing a node to saveHTML() requires PHP >= 5.3.6.
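If only the inner markup of a row is wanted, one possible approach is to serialize each child node separately and join the results. This is a minimal sketch: `innerHtml()` is a hypothetical helper name, not a DOMDocument method, and the sample row is wrapped in a `<table>` as an assumption.

```php
<?php
// Hypothetical helper (not part of the DOM extension): joins the
// serialized child nodes of $node, i.e. its "inner HTML".
function innerHtml(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes the child including its own tag
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Sample markup, assumed for illustration only
$doc->loadHTML('<table><tr class="calendar_row"><td>3:34am</td><td>USD</td></tr></table>');

$row = $doc->getElementsByTagName('tr')->item(0);
echo innerHtml($row), "\n"; // the <td> cells, without the enclosing <tr>
```

The serializer may add line breaks between elements, so the result is best treated as markup rather than compared byte for byte.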

The values between the td tags can be obtained with $td->firstChild->nodeValue, or just $td->textContent, where $td is the <td> in question.
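As a rough illustration of the above, the following sketch collects the cell text of every row into an array keyed by the row's data-eventid attribute. The `<table>` wrapper and the `$rows` and `$cells` names are assumptions added for the example. Note that for the first cell, `firstChild->nodeValue` gives only the leading text node ("Sun"), while `textContent` also includes the text of the nested `<div>`.

```php
<?php
libxml_use_internal_errors(true);

// Row copied from the question; the <table> wrapper is an assumption.
$html = '<table>
<tr class="calendar_row" data-eventid="39658">
<td class="alt1 eventDate smallfont" align="center">Sun<div class="eventday_multiple">Dec 10</div></td>
<td class="alt1 smallfont" align="center">5:14am</td>
<td class="alt1 smallfont" align="center">EUR</td>
</tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$rows = array();
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = array();
    foreach ($tr->getElementsByTagName('td') as $td) {
        // textContent concatenates all descendant text of the cell
        $cells[] = trim($td->textContent);
    }
    $rows[$tr->getAttribute('data-eventid')] = $cells;
}

print_r($rows);
```

Keying by data-eventid is just one convenient choice; appending to a plain list with `$rows[] = $cells;` works equally well if the IDs are not needed.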


 