XPath search using LibXML + Ruby

Question

I am trying to search for a specific node in an XML file using XPath. The search worked fine under REXML, but REXML was too slow for large XML documents, so I moved over to LibXML.

My simple example processes a Yum repomd.xml file; an example can be found here: http://mirror.san.fastserv.com/pub/linux/centos/6/os/x86_64/repodata/repomd.xml

My test script is as follows:

require 'rubygems'
require 'libxml'

p = LibXML::XML::Parser.file( "/tmp/dr.xml")
repomd = p.parse

filelist = repomd.find_first("/repomd/data[@type='filelists']/location@href")
puts "Length: " + filelist.length.to_s
filelist.each do |f|
   puts f.attributes['href']
end

I get this error:

Error: Invalid expression.
/usr/lib/ruby/gems/1.8/gems/libxml-ruby-2.7.0/lib/libxml/document.rb:123:in `find': Error: Invalid expression. (LibXML::XML::Error)
from /usr/lib/ruby/gems/1.8/gems/libxml-ruby-2.7.0/lib/libxml/document.rb:123:in `find'
from /usr/lib/ruby/gems/1.8/gems/libxml-ruby-2.7.0/lib/libxml/document.rb:130:in `find_first'
from /tmp/scripty.rb:6

I have also tried simpler examples like the one below, but still no dice.

p = LibXML::XML::Parser.file( "/tmp/dr.xml")
repomd = p.parse
filelist = repomd.root.find(".//location")
puts "Length: " + filelist.length.to_s

In the above case I get the output:

Length: 0

Your guidance would be greatly appreciated. I have searched for what I am doing wrong, and I just can't figure it out...

Here is some code that fetches the file and processes it; it still doesn't work...

require 'rubygems'
require 'open-uri'
require 'libxml'

raw_xml = open('http://mirror.san.fastserv.com/pub/linux/centos/6/os/x86_64/repodata/repomd.xml').read
p = LibXML::XML::Parser.string(raw_xml)
repomd = p.parse
filelist = repomd.find_first("//data[@type='filelists']/location[@href]")
puts "First: " + filelist

Answer 1

In the end I reverted to REXML and used stream processing. It was much faster, and its XPath syntax was much easier to work with.
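
For readers who want to see what stream processing with REXML can look like, here is a minimal sketch. It is not the code I actually ran: the `LocationListener` class, the `STREAM_SAMPLE` constant, and the inline XML are hypothetical stand-ins that only mimic the shape of a repomd.xml file. The listener records the `href` of every `location` element found inside a `data` element of the requested type.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical stand-in for repomd.xml; the values are made up for illustration.
STREAM_SAMPLE = <<~XML
  <repomd xmlns="http://linux.duke.edu/metadata/repo">
    <data type="filelists">
      <location href="repodata/filelists.xml.gz"/>
    </data>
    <data type="primary">
      <location href="repodata/primary.xml.gz"/>
    </data>
  </repomd>
XML

# Collects location hrefs that sit inside <data type="..."> of the wanted type.
class LocationListener
  include REXML::StreamListener
  attr_reader :hrefs

  def initialize(wanted_type)
    @wanted_type = wanted_type
    @in_wanted = false
    @hrefs = []
  end

  # Called for every opening tag; attrs is a Hash of attribute name => value.
  def tag_start(name, attrs)
    case name
    when 'data'
      @in_wanted = (attrs['type'] == @wanted_type)
    when 'location'
      @hrefs << attrs['href'] if @in_wanted
    end
  end

  # Leaving a <data> element ends the region of interest.
  def tag_end(name)
    @in_wanted = false if name == 'data'
  end
end

listener = LocationListener.new('filelists')
REXML::Document.parse_stream(STREAM_SAMPLE, listener)
puts listener.hrefs.first
```

Because the stream parser never builds a document tree, memory use stays flat even for large files, which is the likely reason this approach feels faster than tree-based REXML.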

Answer 2

Looking at your code, it seems you want to collect only those location elements that have an href attribute. If that's the case, the expression below should work:

"//data[@type='filelists']/location[@href]"

2 answers

Answer 1: 1 (accepted), 2013-08-22 22:33:35
Answer 2: 0, 2013-08-20 10:09:32
