Trouble scraping table with DOMXPath
I have a table I'm trying to scrape that looks like this:
<table id="thisTable">
<tr>
<td class="value1"></td>
<td class="value2"></td>
<td class="value3"></td>
<td class="value4"></td>
</tr>
<tr>
<td class="value5"></td>
<td class="value6"></td>
</tr>
</table>
and my DOMXPath code looks like this (so far):
$htmlDoc = new DOMDocument();
@$htmlDoc->loadHTML($html);
$xpath = new DOMXPath($htmlDoc);
$nodelist = $xpath->query('//*[@id="thisTable"]');
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
This works, and I get the values of the table, but how do I select a node by its class? Ultimately, my goal is to build a new table, in a single row, from the content of the td cells with classes value2, value4 and value5.
$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($html);
$xpath = new DOMXPath($htmlDoc);
$nodelist = $xpath->query('//td');
foreach ($nodelist as $n) {
    echo $n->getAttribute("class") . "\n";
}
Note: use the getAttribute() method to read the value of each cell's class attribute.
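As a rough sketch of how this approach could be taken one step further toward the goal in the question, the loop can compare each cell's class against a list of wanted classes and collect the matching text. The sample HTML, the $wanted list and the $row array are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample input, modelled on the table in the question.
$html = '<table id="thisTable">
<tr><td class="value1">1111</td><td class="value2">222</td>
<td class="value3">333</td><td class="value4">444</td></tr>
<tr><td class="value5">555</td><td class="value6">666</td></tr>
</table>';

$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($html);
$xpath = new DOMXPath($htmlDoc);

// Classes to keep (an assumption taken from the question's stated goal).
$wanted = ['value2', 'value4', 'value5'];

$row = [];
foreach ($xpath->query('//table[@id="thisTable"]//td') as $td) {
    // getAttribute() returns an empty string when the attribute is missing.
    if (in_array($td->getAttribute('class'), $wanted, true)) {
        $row[] = trim($td->textContent);
    }
}

echo implode(' ', $row), "\n";
```

Filtering in PHP like this keeps the XPath expression simple, at the cost of visiting every td in the table.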
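Expand your XPath query:
$class = "value1";
// Double quotes so that $class is interpolated; the class test applies to the td cells inside the table, not to the table itself.
$nodelist = $xpath->query("//*[@id='thisTable']//td[@class='$class']");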
I'm not sure I understand correctly, but if you want the text contents of value2, value4 and value5 in a single row, you can use this XPath:
(//td[@class='value2'] | //td[@class='value4'] | //td[@class='value5'])/text()
For example, given this table:
<table id="thisTable">
<tr>
<td class="value1"> 1111</td>
<td class="value2"> 222 </td>
<td class="value3">333 </td>
<td class="value4"> 444</td>
</tr>
<tr>
<td class="value5"> 555</td>
<td class="value6"> 666</td>
</tr>
</table>
The query then returns the text nodes 222, 444 and 555 (each with its surrounding whitespace, which you may want to trim).
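Putting the pieces together, a minimal sketch of building the single-row table might look like the following. The variable names ($doc, $cells, $newTable) and the use of trim() and htmlspecialchars() are assumptions added for illustration; the XPath expression is the one given above.

```php
<?php
// Sample input copied from the example table above.
$html = <<<HTML
<table id="thisTable">
<tr>
<td class="value1"> 1111</td>
<td class="value2"> 222 </td>
<td class="value3">333 </td>
<td class="value4"> 444</td>
</tr>
<tr>
<td class="value5"> 555</td>
<td class="value6"> 666</td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Union of the three cells, then their text nodes, in document order.
$query = "(//td[@class='value2'] | //td[@class='value4'] | //td[@class='value5'])/text()";

$cells = [];
foreach ($xpath->query($query) as $textNode) {
    // Trim the padding around each value and escape it for output as HTML.
    $cells[] = htmlspecialchars(trim($textNode->nodeValue));
}

// Assemble the new one-row table as a plain string.
$newTable = '<table><tr><td>' . implode('</td><td>', $cells) . '</td></tr></table>';
echo $newTable, "\n";
```

Note that @class='value2' only matches cells whose class attribute is exactly that string; cells carrying several classes would need a different test, for example one based on contains().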