(Rails) Can't get Mechanize to read web XML files correctly

Question

I have to read XML files that are accessible over HTTP with authentication, which is why I use Mechanize.

My problem is that I can't get Mechanize to recognize these XML files so that I can use .find or .search on them.

Here is what I tried first, in my view (HTML file):

<% agent = Mechanize.new %>

<% page = agent.get("http://dl.dropbox.com/u/344349/xml.xml") %>

<%= page %>

This returns #<Mechanize::File:0x007f9dd602de30>. It's a ::File and not a ::Page, so I can't use .find or .search on it; doing so fails with undefined method find for #<Mechanize::File:0x007f9dd624cbd0>.

The Mechanize doc says: "This is the default (and base) class for the Pluggable Parsers. If Mechanize cannot find an appropriate class to use for the content type, this class will be used. For example, if you download a JPG, Mechanize will not know how to parse it, so this class will be instantiated."

So I created a class as described here: http://rdoc.info/github/tenderlove/mechanize/master/Mechanize/PluggableParser

My class

class XMLParser < Mechanize::File
  attr_reader :xml

  def initialize(uri=nil, response=nil, body=nil, code=nil)
    super(uri, response, body, code)
    @xml = xml.parse(body)
  end
end

and the updated code in my view (HTML file):

<% agent = Mechanize.new %>

<% agent.pluggable_parser['text/xml'] = XMLParser %>

<% agent.user_agent_alias = 'Windows Mozilla' %>

<% page = agent.get("http://dl.dropbox.com/u/344349/xml.xml") %>

<%= page %>

or even:

<% agent = Mechanize.new %>

<% agent.pluggable_parser.xml = XMLParser %>

<% page1 = agent.get('http://dl.dropbox.com/u/344349/xml.xml') # => CSVParser %>

<%= page1 %>

It still returns #<Mechanize::File:0x007f9dd5253b48>.

I even tested the exact code from the docs (CSVParser, http://rdoc.info/github/tenderlove/mechanize/master/Mechanize/PluggableParser) and tried loading a CSV file, which is still seen as a ::File.

What am I doing wrong?

Answer 1

Okay, so I've just resolved this problem for myself. The solution is in two parts:

First, the content type you are matching is incorrect. If you run this line after your get, it will tell you the content type of the document you fetched:

page.response['content-type'] # => 'application/xml', not 'text/xml'

When I use Mechanize to get your page ('http://dl.dropbox.com/u/344349/xml.xml'), I see 'application/xml' as the content type.

Second, you're not using PluggableParser correctly. Using XMLParser as you have it here will generate NoMethodError: undefined method 'parse' for nil:NilClass. Change the class definition to use Nokogiri::XML instead:
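To illustrate why a parser registered under 'text/xml' is never chosen for a response served as 'application/xml', here is a toy registry that looks parsers up by exact content type and falls back to a default. It is only a simplified model: ToyParserRegistry and parser_for are hypothetical names, and this is not Mechanize's actual lookup code.

```ruby
# Simplified stand-in for a content-type keyed parser registry.
# Hypothetical class; NOT Mechanize's implementation.
class ToyParserRegistry
  def initialize(default)
    @default = default
    @parsers = {}
  end

  # Register a parser under an exact content type string.
  def []=(content_type, parser)
    @parsers[content_type] = parser
  end

  # Return the registered parser, or the default when the key is unknown.
  def parser_for(content_type)
    @parsers.fetch(content_type, @default)
  end
end

registry = ToyParserRegistry.new(:file)
registry['text/xml'] = :xml_parser

registry.parser_for('text/xml')        # registered key, returns :xml_parser
registry.parser_for('application/xml') # unregistered key, falls back to :file
```

Under this model, the key has to match the type the server actually sends, which is why checking page.response['content-type'] first is worthwhile.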

class XmlParser < Mechanize::File
  attr_reader :xml
  def initialize(uri = nil, response = nil, body = nil, code = nil)
    @xml = Nokogiri::XML(body)
    super uri, response, body, code
  end
end
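As an aside on where the NoMethodError from the original class comes from: inside initialize, the bare name xml calls the attr_reader, and @xml has not been assigned yet, so the call is effectively nil.parse. The sketch below reproduces just that pattern with plain Ruby; BrokenParser is a hypothetical class and does not depend on Mechanize.

```ruby
# Hypothetical class that mirrors only the attr_reader pattern from the question.
class BrokenParser
  attr_reader :xml

  def initialize(body)
    # `xml` calls the reader, which returns @xml, still nil at this point,
    # so this line attempts nil.parse(body).
    @xml = xml.parse(body)
  end
end

error = begin
  BrokenParser.new('<catalog/>')
  nil
rescue NoMethodError => e
  e
end

puts error.class # prints NoMethodError
```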

Then, set XmlParser as the parser for the correct content type:

mech.pluggable_parser['application/xml'] = XmlParser

To use the parser, get your page the same way as before, and then read the xml attribute of the page object. It is a Nokogiri::XML::Document instance, and Nokogiri::XML::Document is a subclass of Nokogiri::XML::Node. Fortunately, Mechanize::Page#search is just a wrapper around Nokogiri::XML::Node#search, so you can search pretty much the way you expect. Like this:

page.xml.search 'catalog'

A further refinement would be to delegate XmlParser#search to the Nokogiri search methods:

# This is the same as what Mechanize::Page does
class XmlParser < Mechanize::File
  extend Forwardable
  def_delegators  :@xml, :search, :/, :at
end

This lets you perform your searches directly on the page instance:

page.search 'catalog'
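For readers unfamiliar with Forwardable, here is a minimal stand-alone sketch of the same delegation pattern using only the standard library. FakeDocument and FakePage are hypothetical stand-ins for the Nokogiri document and the XmlParser page, and the search logic is a placeholder rather than real XML querying.

```ruby
require 'forwardable'

# Hypothetical stand-in for a parsed document (a real one would be
# a Nokogiri::XML::Document); it just filters a list of element names.
class FakeDocument
  def initialize(names)
    @names = names
  end

  def search(name)
    @names.select { |n| n == name }
  end

  def at(name)
    search(name).first
  end
end

# Hypothetical stand-in for the page class; forwards search/at to @xml.
class FakePage
  extend Forwardable
  def_delegators :@xml, :search, :at

  attr_reader :xml

  def initialize(names)
    @xml = FakeDocument.new(names)
  end
end

page = FakePage.new(%w[catalog book book])
page.search('book') # same result as page.xml.search('book')
page.at('catalog')  # first match, forwarded to the document
```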

Answer 2

You can change the default parser to use the Page class, like this:

agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::Page
agent.get("http://dl.dropbox.com/u/344349/xml.xml").class # Mechanize::Page

See http://mechanize.rubyforge.org/Mechanize/PluggableParser.html

2 answers

Answer 1 (2 votes, accepted): 2012-02-21 20:21:35

Answer 2 (0 votes): 2012-11-22 03:43:54