Ruby Nokogiri HTML table scraping using XPath

Question

I am trying to get the "cell4" value from an HTML table like the following, using Ruby, XPath, and Nokogiri:

<html>
<body>

<h1>Heading</h1>

<p>paragraph.</p>

<h4>Two rows and three columns:</h4>
<table border="0">
<tr>
  <td>cell1</td>
  <td>cell2</td>
</tr>
<tr>
  <td>cell3</td>
  <td>cell4</td>
</tr>

</table>

</body>
</html>

I have the following simple code, but it returns []. This question must be simple enough, but I couldn't find anything on Google that hits the spot.

#!/usr/bin/ruby -w

require 'rubygems'
require 'nokogiri'

page1 = Nokogiri::HTML('test_simple.html')

a = page1.xpath("//html/body/table/tr[2]/td[2]")
p a

The XPath works as intended with REXML, so I believe it is correct, but it does not work with Nokogiri. Since this is going to be used on larger HTML files, REXML cannot be used. The problem does not seem to be limited to tables: the contents of other tags cannot be scraped either.

Answer 1

IMHO it is a lot easier to work with the CSS API in Nokogiri (XPath always gives me headaches):

page.css('td') # should return a NodeSet (array-like) of the 4 table cell nodes
page.css('td')[3] # returns the 4th 'td' node; counting starts at 0

Answer 2

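Thanks to taro's comment, I was able to solve the issue with a little effort. The original code passed the file name string itself to Nokogiri::HTML, so Nokogiri parsed that string as the markup instead of reading the file; passing the opened file fixes it.

Here is the corrected code: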

#!/usr/bin/ruby -w
require 'rubygems'
require 'nokogiri'
page1 = Nokogiri::HTML(open('test_simple.html'))
a = page1.xpath("/html/body/table/tr[2]/td[2]").text
p a


Answer 1: 7, 2011-11-07 14:50:44

Answer 2: 4, ACCEPTED, 2011-11-10 08:11:08
