
Limit search scope of XPath in Nokogiri

I would like to find specific tags within a Node that belongs to a NodeSet, but when I use XPath on that Node it returns results from the whole NodeSet.

I'm trying to get something like:

{ "head1" => "Volume 1", "head2" => "Volume 2" }

from this HTML:

<h2 class="header">
  <a class="header" >head1</a>
</h2>
<table class="volume_description_header" cellspacing="0">
  <tbody>
    <tr>
      <td class="left">Volume 1</td>
    </tr>
  </tbody>
</table>
<h2 class="header">
  <a class="header" >head2</a>
</h2>
<table class="volume_description_header" cellspacing="0">
  <tbody>
    <tr>
      <td class="left">Volume 2</td>
    </tr>
  </tbody>
</table>

So far I've tried:

require 'nokogiri'
a = File.open("code-above.html") { |f| Nokogiri::HTML(f) }
h = a.xpath('//h2[@class="header"]')
puts h.map { |e| e.next.next }[0].xpath('//td[@class="left"]')

But with this I get:

<td class="left ">Volume 1</td>
<td class="left ">Volume 2</td>

I'm expecting only the first one.

I've tried doing the XPath inside the block, but this gives me the same result twice.

I checked and

puts h.map { |e| e.next.next }[0]

evaluates to the first Node, so I don't understand why XPath searches the whole NodeSet, or even the whole Nokogiri::Document, which I think is what it actually does.

Can somebody please explain to me the principles of searching and navigating within a selected Node/NodeSet rather than the whole Document? Maybe navigating down a known path would be better in this case, but I don't know how to do that either.

Your second XPath expression, //td[@class="left"], starts with //. This tells XPath to start matching at the root of the entire document, no matter which node you call xpath on. What you want is to start from the current node. To do that, start your expression with a dot, .//:

d.xpath('.//td[@class="left"]')
