
How to match an element in any namespace with Nokogiri

I want to find an element like this:

xml1 = '<period>2017-02-10</period>'

or

xml2 = <<XML
<root xmlns:xbrli="http://www.w3.org/1999/xhtml">
  <xbrli:period>2017-02-10</xbrli:period>
</root>
XML

I can select the element by:

  require 'nokogiri'

  # Returns the first period element, whether or not it carries the xbrli prefix.
  def period_from_xml(xml)
    doc = Nokogiri::XML(xml)
    if doc.namespaces.key?('xmlns:xbrli')
      doc.at_css('xbrli|period')
    else
      doc.at_css('period')
    end
  end

  period_from_xml(xml1)
  # => <period>2017-02-10</period>
  period_from_xml(xml2)
  # => <xbrli:period>2017-02-10</period>

I know about Nokogiri::XML::Document#remove_namespaces!, but I don't want to use it because I need the namespaces elsewhere.

Maybe keeping both the original doc and a duplicate with the namespaces removed is a good idea?

Is there an easy and simple way to handle this situation?

I'd use this:

require 'nokogiri'

xml = <<EOT
<root xmlns:xbrli="http://www.w3.org/1999/xhtml">
  <period>2017-02-10</period>
  <xbrli:period>2017-02-11</xbrli:period>
</root>
EOT

doc = Nokogiri::XML(xml)

doc.search('period,xbrli|period').map(&:text) # => ["2017-02-10", "2017-02-11"]

'period,xbrli|period' in CSS means "find either period or xbrli:period".

See also "How to avoid joining all text from Nodes when scraping".

