
(Rails) Can't get Mechanize to correctly read a web xml file

I have to read XML files that are accessible over HTTP with authentication, which is why I use Mechanize.

My problem is that I can't get Mechanize to recognize these XML files as something I can call .find or .search on.

Here is what I tried first, in my view (HTML file):

<% agent = Mechanize.new %>
<% page = agent.get("http://dl.dropbox.com/u/344349/xml.xml") %>
<%= page %>

This returns #<Mechanize::File:0x007f9dd602de30>. It is a ::File and not a ::Page, so I can't use .find or .search on it; doing so fails with undefined method find for #<Mechanize::File:0x007f9dd624cbd0>.

The Mechanize documentation says: "This is the default (and base) class for the Pluggable Parsers. If Mechanize cannot find an appropriate class to use for the content type, this class will be used. For example, if you download a JPG, Mechanize will not know how to parse it, so this class will be instantiated."

So I created a class as described here: http://rdoc.info/github/tenderlove/mechanize/master/Mechanize/PluggableParser

My class:

class XMLParser < Mechanize::File
  attr_reader :xml
  def initialize(uri=nil, response=nil, body=nil, code=nil)
    super(uri, response, body, code)
    @xml = xml.parse(body)
  end
end

And the updated code in my view (HTML file):

<% agent = Mechanize.new %>
<% agent.pluggable_parser['text/xml'] = XMLParser %>
<% agent.user_agent_alias = 'Windows Mozilla' %>
<% page = agent.get("http://dl.dropbox.com/u/344349/xml.xml") %>
<%= page %>

or even

<% agent = Mechanize.new %>
<% agent.pluggable_parser.xml = XMLParser %>
<% page1 = agent.get('http://dl.dropbox.com/u/344349/xml.xml') # => CSVParser %>
<%= page1 %>

This still returns #<Mechanize::File:0x007f9dd5253b48>.

I even tested the exact CSVParser example from the documentation (http://rdoc.info/github/tenderlove/mechanize/master/Mechanize/PluggableParser) and tried loading a CSV file, which is still seen as a ::File.

What am I doing wrong?

Okay, so I've resolved this problem for myself just now. The solution is in two parts:

First, the content type you are matching is incorrect. If you run this line after your get, it tells you the content type of the document you fetched:

page.response['content-type'] # => 'application/xml', not 'text/xml'

When I use mechanize to get your page ('http://dl.dropbox.com/u/344349/xml.xml'), I see 'application/xml' as the content type.
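
To illustrate why a parser registered under 'text/xml' is never chosen for an 'application/xml' response, here is a minimal sketch of a content-type lookup with a fallback. It is a simplified, hypothetical model and not Mechanize's actual source; ParserRegistry, GenericFile and XmlHandler are names invented for this example, and stripping the "; charset=..." parameter is a choice made in this sketch only.

```ruby
# Simplified, hypothetical model of a content-type => parser registry.
# This is NOT Mechanize's source; it only illustrates lookup with a fallback.
class ParserRegistry
  def initialize(default)
    @default = default
    @parsers = {}
  end

  # Register a parser class for one exact content type.
  def []=(content_type, klass)
    @parsers[content_type] = klass
  end

  # Look up the parser for a Content-Type header value, ignoring any
  # parameters such as "; charset=utf-8". Falls back to the default.
  def lookup(header)
    mime = header.to_s.split(';').first.to_s.strip.downcase
    @parsers.fetch(mime, @default)
  end
end

GenericFile = Class.new # stand-in for Mechanize::File
XmlHandler  = Class.new # stand-in for a custom XML parser class

registry = ParserRegistry.new(GenericFile)
registry['text/xml'] = XmlHandler

# 'application/xml' was never registered, so the default is returned.
puts registry.lookup('application/xml')

# After registering the right type, the custom class is returned.
registry['application/xml'] = XmlHandler
puts registry.lookup('application/xml; charset=utf-8')
```

The point of the sketch is only that the registered key has to correspond to what the server actually sends, so it is worth checking page.response['content-type'] before registering a parser.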

Second, you're not using the pluggable parser correctly. XMLParser as you have it here raises NoMethodError: undefined method 'parse' for nil:NilClass. Change the class definition to use Nokogiri::XML instead:

class XmlParser < Mechanize::File
  attr_reader :xml
  def initialize(uri = nil, response = nil, body = nil, code = nil)
    @xml = Nokogiri::XML(body)
    super uri, response, body, code
  end
end
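
For reference, the NoMethodError from the original class can be reproduced without Mechanize. In the sketch below, BrokenParser is a hypothetical stand-in that keeps only the relevant lines: inside initialize, the bare name xml calls the attr_reader, which returns @xml, and that is still nil at that point.

```ruby
# Hypothetical stand-in that reproduces the bug without Mechanize.
class BrokenParser
  attr_reader :xml

  def initialize(body)
    # `xml` here is a call to the reader method, which returns @xml.
    # @xml has not been assigned yet, so this is nil.parse(body).
    @xml = xml.parse(body)
  end
end

begin
  BrokenParser.new('<catalog/>')
rescue NoMethodError => e
  # The exact wording of the message depends on the Ruby version.
  puts "#{e.class}: #{e.message}"
end
```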

Then, set this as the parser for the correct content type:

agent.pluggable_parser['application/xml'] = XmlParser

To use this, get your page the same way as before, then read the xml attribute of the page object. It is a Nokogiri::XML::Document instance, and Nokogiri::XML::Document is a subclass of Nokogiri::XML::Node. Fortunately, Mechanize::Page#search is just a wrapper around Nokogiri::XML::Node#search, so you can search much the way you would expect. Like this:

page.xml.search 'catalog'

A further refinement is to delegate search and related methods from XmlParser to the underlying Nokogiri document:

require 'forwardable'

# This is the same as what Mechanize::Page does
class XmlParser < Mechanize::File
  extend Forwardable
  # Forward search, / and at to the Nokogiri document held in @xml
  def_delegators :@xml, :search, :/, :at
end

This lets you perform your searches directly on the page instance:

page.search 'catalog'
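
The delegation itself can be tried in isolation. The following is a minimal sketch of Forwardable's def_delegators using hypothetical stand-ins (FakeDocument in place of a Nokogiri::XML::Document, Wrapper in place of XmlParser), so it runs without Mechanize or Nokogiri installed.

```ruby
require 'forwardable'

# Hypothetical stand-in for a parsed document; in the real parser this
# role is played by a Nokogiri::XML::Document.
class FakeDocument
  def initialize(names)
    @names = names
  end

  # Return every stored name equal to the query.
  def search(query)
    @names.select { |name| name == query }
  end

  # Return the first match, or nil if there is none.
  def at(query)
    search(query).first
  end
end

# Hypothetical wrapper playing the role of XmlParser.
class Wrapper
  extend Forwardable

  # Forward these calls to whatever object @xml holds.
  def_delegators :@xml, :search, :at

  def initialize(doc)
    @xml = doc
  end
end

page = Wrapper.new(FakeDocument.new(%w[catalog book book]))
puts page.search('book').length # forwarded to FakeDocument#search
puts page.at('catalog')         # forwarded to FakeDocument#at
```

Because the delegators name the instance variable :@xml rather than a method, they work regardless of whether the wrapper also exposes an xml reader.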

Alternatively, you can make Mechanize::Page the default parser, like this:

agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::Page
agent.get("http://dl.dropbox.com/u/344349/xml.xml").class # Mechanize::Page

See http://mechanize.rubyforge.org/Mechanize/PluggableParser.html
