Parsing an XML file with Nokogiri to determine the path (Ruby)

Question

My code is supposed to "guess" the path(s) that lead to the relevant text nodes in my XML file. "Relevant" here means text nodes nested within the recurring product/person/something tag, but not text nodes that appear outside of it.

This code:

    @doc, items = Nokogiri.XML(@file), []

    path = []
    # traverse yields children before their parent, so deeper names are pushed first
    @doc.traverse do |node|
      next unless node.element?
      # an element is part of the path only if it has at least one child element
      is_path_element = node.children.any?(&:element?)
      path.push(node.name) if is_path_element && !path.include?(node.name)
    end
    final_path = "/" + path.reverse.join("/")

works for simple XML files, for example:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Some XML file title</title>
    <description>Some XML file description</description>
    <item>
      <title>Some product title</title>
      <brand>Some product brand</brand>
    </item>
    <item>
      <title>Some product title</title>
      <brand>Some product brand</brand>
    </item>
  </channel>
</rss>

puts final_path # => "/rss/channel/item"

But how should I approach this when the structure gets more complicated? For example, with this file:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Some XML file title</title>
    <description>Some XML file description</description>
    <item>
      <titles>
        <title>Some product title</title>
      </titles>
      <brands>
        <brand>Some product brand</brand>
      </brands>
    </item>
    <item>
      <titles>
        <title>Some product title</title>
      </titles>
      <brands>
        <brand>Some product brand</brand>
      </brands>
    </item>
  </channel>
</rss>

Answer 1

If you are looking for a list of the deepest "parent" paths in the XML, there is more than one way to look at it.

Although I think your own code could be adjusted to produce the same output, I was convinced the same thing could be achieved with XPath. My motivation is also to brush up my XML skills (I have not used Nokogiri yet, but I will need to professionally soon). So here is how to get all parent paths that have just one level of child elements beneath them, using XPath:

xml.xpath('//*[child::* and not(child::*/*)]').each { |node| puts node.path }

The output of this for your second example file is:

/rss/channel/item[1]/titles
/rss/channel/item[1]/brands
/rss/channel/item[2]/titles
/rss/channel/item[2]/brands

If you take this list, strip the indexes with gsub, and then make the array unique, the result looks a lot like the output of your loop:

paths = xml.xpath('//*[child::* and not(child::*/*)]').map { |node| node.path }
# uniq (not uniq!) is used because uniq! returns nil when there are no duplicates
paths = paths.map { |path| path.gsub(/\[[0-9]+\]/, '') }.uniq
=> ["/rss/channel/item/titles", "/rss/channel/item/brands"]

Or in one line:

paths = xml.xpath('//*[* and not(*/*)]').map { |node| node.path.gsub(/\[[0-9]+\]/,'') }.uniq
=> ["/rss/channel/item/titles", "/rss/channel/item/brands"]
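
If what you are after is the recurring container itself (/rss/channel/item) rather than its innermost parents, one possible follow-up is to take the longest shared prefix of the normalized paths. This is only a sketch under that assumption: common_parent_path is a hypothetical helper written for this answer, not part of Nokogiri, and it works on plain strings.

```ruby
# Normalized paths, as returned by the one-liner for the second example file.
paths = ["/rss/channel/item/titles", "/rss/channel/item/brands"]

# Hypothetical helper: returns the longest leading run of path segments
# shared by every path, or nil for an empty list.
def common_parent_path(paths)
  return nil if paths.empty?

  segment_lists = paths.map { |path| path.split("/").reject(&:empty?) }
  shortest = segment_lists.map(&:length).min

  shared = []
  shortest.times do |i|
    segment = segment_lists.first[i]
    # Stop at the first position where the paths disagree.
    break unless segment_lists.all? { |segments| segments[i] == segment }
    shared << segment
  end

  "/" + shared.join("/")
end

puts common_parent_path(paths) # prints /rss/channel/item
```

For the first example file the normalized list is just ["/rss/channel/item"], so the helper returns that path unchanged. Whether the shared prefix is really the element you want depends on the file, so treat it as a heuristic.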

Answer 2

There is a library, Jini, for building XPath expressions:

require 'jini' # assumption: the library is installed as the jini gem

xpath = Jini.new
        .add_path('parent')
        .add_path('child')
        .add_all('toys')
        .add_attr('name', 'plane')
        .to_s
puts xpath # -> /parent/child//toys[@name="plane"]

solution1
3 ACCEPTED 2013-03-28 21:42:38

solution2
0 2022-09-15 17:58:35
