Scraping a table with DOMXPath

Question

I have a table I'm trying to scrape that looks like this:

<table id="thisTable">
    <tr>
        <td class="value1"></td>
        <td class="value2"></td>
        <td class="value3"></td>
        <td class="value4"></td>
    </tr>
    <tr>
        <td class="value5"></td>
        <td class="value6"></td>
    </tr>
</table>

And my DOMXPath code looks like this so far:

$htmlDoc = new DomDocument();
@$htmlDoc->loadhtml($html);
$xpath = new DOMXPath($htmlDoc);

$nodelist = $xpath->query('//*[@id="thisTable"]');

foreach ($nodelist as $n){
    echo $n->nodeValue."\n";
}

This works and I get the values of the table, but how do I restrict the query to cells with a particular class? Ultimately, my goal is to build a new table that holds the contents of the value2, value4 and value5 td cells in a single row.

Answer 1

$htmlDoc = new DomDocument();
$htmlDoc->loadHTML($html);
$xpath = new DOMXPath($htmlDoc);

$nodelist = $xpath->query('//td');

foreach ($nodelist as $n){
    echo $n->getAttribute("class")."\n";
}

Note: use the getAttribute() method to read the value of the class attribute.
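As a rough sketch of how getAttribute() could be used to reach the question's goal, the loop below keeps only the cells whose class is in a wanted list. The sample cell texts and the names $wanted and $picked are assumptions made for illustration, not part of the original answer.

```php
<?php
// Sample markup modelled on the question; the cell texts are made up.
$html = '<table id="thisTable">
  <tr><td class="value1">1111</td><td class="value2">222</td><td class="value3">333</td><td class="value4">444</td></tr>
  <tr><td class="value5">555</td><td class="value6">666</td></tr>
</table>';

$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($html);
$xpath = new DOMXPath($htmlDoc);

// Classes to keep, taken from the goal stated in the question.
$wanted = ['value2', 'value4', 'value5'];
$picked = [];

// Visit every td inside the table and filter on its class attribute in PHP.
foreach ($xpath->query('//table[@id="thisTable"]//td') as $td) {
    $class = $td->getAttribute('class');
    if (in_array($class, $wanted, true)) {
        $picked[$class] = trim($td->nodeValue);
    }
}

print_r($picked); // value2 => 222, value4 => 444, value5 => 555
```

Filtering in PHP keeps the XPath expression simple; the trade-off is that every td is visited even when only a few are needed.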

Answer 2

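Extend your XPath query so that it selects the td cells inside the table by class. The expression has to be in a double-quoted PHP string for $class to be interpolated:

$class = "value1";
$nodelist = $xpath->query("//*[@id='thisTable']//td[@class='$class']");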
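One caveat worth sketching: an exact @class comparison only matches when the attribute holds that single class name and nothing else. If a cell carried several classes, a token match along the following lines might be closer to what is wanted. The helper classPredicate() and the sample markup are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate matching one class token.
// Assumes $class contains no quote characters, since it is interpolated as-is.
function classPredicate(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Made-up markup with multi-class cells and a look-alike class (value22).
$html = '<table id="thisTable"><tr>
  <td class="value1 odd">a</td><td class="value2 even">b</td><td class="value22">c</td>
</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$query = "//*[@id='thisTable']//td[" . classPredicate('value2') . "]";

$matches = [];
foreach ($xpath->query($query) as $td) {
    $matches[] = trim($td->nodeValue);
}

print_r($matches); // only "b": value22 is not matched
```

Padding the attribute with spaces on both sides is what prevents value2 from matching value22.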

Answer 3

I'm not sure I understand correctly, but if you want the text contents of value2, value4 and value5 in a single row, you can use this XPath:

(//td[@class='value2'] | //td[@class='value4'] | //td[@class='value5'])/text()

For example, given:

<table id="thisTable"> 
  <tr> 
    <td class="value1">  1111</td>
    <td class="value2"> 222 </td>
    <td class="value3">333 </td> 
    <td class="value4"> 444</td>
  </tr>  
  <tr> 
    <td class="value5">  555</td>
    <td class="value6"> 666</td>
  </tr> 
</table>

The output will then be: 222 444 555
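A minimal sketch of how that expression could be run from PHP to build the single-row table the question asks for. The variable names and the trim/htmlspecialchars handling are assumptions added here; the text nodes themselves still carry the whitespace from the source markup.

```php
<?php
$html = '<table id="thisTable">
  <tr>
    <td class="value1">  1111</td>
    <td class="value2"> 222 </td>
    <td class="value3">333 </td>
    <td class="value4"> 444</td>
  </tr>
  <tr>
    <td class="value5">  555</td>
    <td class="value6"> 666</td>
  </tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Union of the three cells, then their text nodes (returned in document order).
$expr = "(//td[@class='value2'] | //td[@class='value4'] | //td[@class='value5'])/text()";

$cells = '';
foreach ($xpath->query($expr) as $text) {
    // Trim the surrounding whitespace and escape the text before re-emitting it as HTML.
    $cells .= '<td>' . htmlspecialchars(trim($text->nodeValue)) . '</td>';
}

$newTable = "<table><tr>$cells</tr></table>";
echo $newTable, "\n";
// <table><tr><td>222</td><td>444</td><td>555</td></tr></table>
```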

3 answers

Answer 1
1, accepted, 2013-03-03 14:55:25

Answer 2
0, 2013-03-03 14:50:49

Answer 3
0, 2013-04-17 03:49:23
