I am currently crawling some RSS feeds and collecting items into a NodeSet. This works well for most feeds, but Nokogiri crashes on some items. At first I thought something was wrong with my OS X development environment, so I installed a Debian server and got exactly the same error. I also tried downgrading Ruby from 1.9.3 to 1.9.2.
Any suggestions?
Some of the code:
doc.xpath("//item").remove
nodeset = doc.xpath("//item")
..
api_doc.xpath("//item").each do |node|
  node = check_score(node)
  unless node.nil?
    nodeset << node
  end
end

def check_score(node)
  if node.xpath('website:attr[@name="imdbscore"]/@value').text.to_i > 6
    return node
  end
end
# sorting and finally add nodeset to doc.
I think it's bad practice to remove all the //item nodes, then try to find them. Right there I can see trouble brewing.
This deletes all <item> nodes from the document:
doc.xpath("//item").remove
This tries to find all <item> nodes, which will return an empty NodeSet:
nodeset = doc.xpath("//item")
You don't show where api_doc comes from, but if it's a Node that came from doc, especially from before you removed the nodes, its state is suspect, because you might be holding dangling references to removed <item> nodes. As written, this loops over all <item> nodes under api_doc, which might no longer exist, so the search could return an empty NodeSet or, worse, one whose nodes are damaged:
api_doc.xpath("//item").each do |node|
  node = check_score(node)
  unless node.nil?
    nodeset << node
  end
end
I'd check the versions of your Nokogiri and libxml2 and make sure they're current. If not, update them. I'd also rethink the logic of removing all the <item> nodes before you look for them.
Perhaps we could help you better if you explained what you're trying to do, and shared a small example of the XML.