How to match an element regardless of its namespace with Nokogiri

Question

I want to find an element like this:

xml1 = '<period>2017-02-10</period>'

or

xml2 = <<XML
<root xmlns:xbrli="http://www.w3.org/1999/xhtml">
  <xbrli:period>2017-02-10</xbrli:period>
</root>
XML

I can select the element with:

  def period_from_xml(xml)
    doc = Nokogiri::XML(xml)
    period_element = if doc.namespaces.keys.include?('xmlns:xbrli')
      doc.at_css("xbrli|period")
    else
      doc.at_css("period")
    end
  end

  period_from_xml(xml1)
  # => <period>2017-02-10</period>
  period_from_xml(xml2)
  # => <xbrli:period>2017-02-10</xbrli:period>

I know about Nokogiri::XML::Document#remove_namespaces!, but I don't want to use it because I need the namespaces elsewhere.

Would it be a good idea to keep both the original doc and a duplicate with the namespaces removed?

Is there an easy, simple way to handle this situation?

Answer 1

I'd use this:

require 'nokogiri'

xml = <<EOT
<root xmlns:xbrli="http://www.w3.org/1999/xhtml">
  <period>2017-02-10</period>
  <xbrli:period>2017-02-11</xbrli:period>
</root>
EOT

doc = Nokogiri::XML(xml)

doc.search('period,xbrli|period').map(&:text) # => ["2017-02-10", "2017-02-11"]

'period,xbrli|period' in CSS means "find period or xbrli:period".

See "How to avoid joining all text from Nodes when scraping" also.

1 answers

solution1
0 2017-02-13 22:24:43
