Nokogiri: node_set.rb:239: [BUG] Segmentation fault

Question

I am currently crawling some RSS feeds and collecting items into a NodeSet. This works great, but Nokogiri crashes on some items. At first I thought something was wrong with my OS X development environment, so I installed a Debian server and got the exact same error. I also tried downgrading Ruby from 1.9.3 to 1.9.2.

Any suggestions?

Some of the code:

doc.xpath("//item").remove
nodeset = doc.xpath("//item")
..
api_doc.xpath("//item").each do |node|
  node = check_score(node)
  unless node.nil?
    nodeset << node
  end
end

def check_score(node)
  if node.xpath('website:attr[@name="imdbscore"]/@value').text.to_i > 6
    return node
  end
end

# sorting and finally add nodeset to doc.

Crash log here..

Answer 1

I think it's bad practice to remove all the //item nodes, then try to find them. Right there I can see trouble brewing.

This deletes all <item> nodes from the document:

doc.xpath("//item").remove

This tries to find all <item> nodes, which will return an empty NodeSet:

nodeset = doc.xpath("//item")

You don't show where api_doc comes from, but if it's a Node that came from doc, especially from before you removed the nodes, its state is suspect because it might hold dangling references to removed <item> nodes. As written, this loops over all <item> nodes, which might no longer exist, so you could get back an empty NodeSet or, worse, a damaged one:

api_doc.xpath("//item").each do |node|
  node = check_score(node)
  unless node.nil?
    nodeset << node
  end
end

I'd check the versions of Nokogiri and LibXML2 you have installed and make sure they're current. If not, update them. I'd also rethink the logic of removing all the <item> nodes before you look for them.

Perhaps we could help you better if you explained what you're trying to do, and shared a small example of the XML.

solution1
1 ACCEPTED 2012-12-04 15:25:23
