Scraping an HTML table with Ruby, Nokogiri, and XPath

Question

I am trying to get the "cell4" value from an HTML table using Ruby, XPath, and Nokogiri. The HTML looks like this:

<html>
<body>

<h1>Heading</h1>

<p>paragraph.</p>

<h4>Two rows and three columns:</h4>
<table border="0">
<tr>
  <td>cell1</td>
  <td>cell2</td>
</tr>
<tr>
  <td>cell3</td>
  <td>cell4</td>
</tr>

</table>

</body>
</html>

I have the following simple code, but it returns []. The problem must be simple enough, but I could not find anything on Google that made it click.

#!/usr/bin/ruby -w

require 'rubygems'
require 'nokogiri'

page1 = Nokogiri::HTML('test_simple.html')

a = page1.xpath("//html/body/table/tr[2]/td[2]")
p a

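The XPath works as expected with REXML, so the expression itself is correct, but it does not work with Nokogiri. Since this will be used on much larger HTML files, REXML is not an option. The problem does not seem to be limited to the table: the contents of other tags cannot be scraped either.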

Answer 1

IMHO, the CSS API in Nokogiri is much simpler to use (XPath always gives me a headache):

page.css('td') # should return a NodeSet (array-like) of 4 table cell nodes
page.css('td')[3] # returns the 4th 'td' node; counting starts at 0

Answer 2

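Thanks to taro's comment, I was able to solve this with a bit of effort.

This is the correct code. The original call passed the file name itself to Nokogiri::HTML, so the string 'test_simple.html' was parsed as the markup; opening the file first makes Nokogiri parse its contents: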

#!/usr/bin/ruby -w
require 'rubygems'
require 'nokogiri'
page1 = Nokogiri::HTML(open('test_simple.html'))
a = page1.xpath("/html/body/table/tr[2]/td[2]").text
p a

2 answers

Answer 1 (score 7), 2011-11-07 14:50:44

Answer 2 (score 4, accepted), 2011-11-10 08:11:08
