
Ruby + Nokogiri + Xpath navigate Node_Set

<Item id="item0">
  <Links>
    <FirstLink id="link1" target="one"/>
    <SecondLink id="link2" target="two"/>
  </Links>
  <Data>
    <String>content</String>
  </Data>
</Item>
<Item id="item1">
  <Links>
    <FirstLink id="link1" target="two"/>
    <SecondLink id="link2" target="two"/>
  </Links>
  <Data>
    <String>content</String>
  </Data>
</Item>

I have created a Nokogiri NodeSet with this structure, i.e. a list of items with links and data children. How can I filter out any items that don't match a certain value in the target attribute of <FirstLink>?

What I actually want in the end is to extract the <Data><String> content of every <Item> whose <FirstLink> has a certain value in its target attribute.

I've tried several approaches already, but I'm at a loss as to how to identify an element by an attribute of its grandchild and then extract the content of that grandchild's parent's sibling.

I'm not entirely sure what your goal is, but here is my best guess at how to proceed in this case:

require 'nokogiri'

# An XML document needs a single root element. The <Items> wrapper is an
# assumption added here so that both <Item> elements are parsed.
doc = Nokogiri::XML <<-xml
<Items>
<Item id="item0">
  <Links>
    <FirstLink id="link1" target="one"/>
    <SecondLink id="link2" target="two"/>
  </Links>
  <Data>
    <String>content1</String>
  </Data>
</Item>
<Item id="item1">
  <Links>
    <FirstLink id="link1" target="two"/>
    <SecondLink id="link2" target="two"/>
  </Links>
  <Data>
    <String>content2</String>
  </Data>
</Item>
</Items>
xml

The #xpath method with the expression "//Item" selects all the Item nodes. Those nodes are then passed to #select, which keeps only the Item nodes whose FirstLink (under Links) has a target attribute matching "one". Only FirstLink is inspected; the target of SecondLink is ignored.

node.at("./Links/FirstLink")['target'] returns the value of the target attribute of that Item's FirstLink: "one" for the first Item and "two" for the second. The leading ./ makes the path relative to node; a path starting with // would search from the document root and always find the first FirstLink in the document, whichever Item it was called on. The trailing [/\Aone\z/] in node.at("./Links/FirstLink")['target'][/\Aone\z/] is a call to String#[], which returns the matched text, or nil when there is no match.

Because String#[] accepts a regular expression, this approach also gives you the flexibility of pattern matching.

nodeset = doc.xpath("//Item").select do |node|
  node.at("./Links/FirstLink")['target'][/\Aone\z/]
end

Now nodeset contains only the required Item nodes. I then use #map to collect the content of the String node from each Item: #at with the relative expression ./Data/String selects the String node, and #text gives its content.

nodeset.map { |n| n.at('./Data/String').text } # => ["content1"]

We can build up an XPath expression to do this. Assuming we are starting from the whole XML document, rather than the node-set you already have, something like

//Item

will select all <Item> elements (I'm guessing you already have something like that to get this node-set).

Next, to select only those <Item> elements which have <Links><FirstLink> where FirstLink has a target attribute value of one :

//Item[Links/FirstLink[@target='one']]

and finally to select the Data/String children of those nodes:

//Item[Links/FirstLink[@target='one']]/Data/String

So with Nokogiri you could use something like this (where doc is your parsed document):

doc.xpath("//Item[Links/FirstLink[@target='one']]/Data/String")

or if you want to use the node-set you already have you can use a relative expression:

nodeset.xpath("self::Item[Links/FirstLink[@target='one']]/Data/String")
