How to parse XML nodes to CSV with Ruby and Nokogiri

I have an XML file:

<?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
  <Offer>
   <Model><![CDATA[11016001]]></Model>
   <Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
   <ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
   ...lots more nodes
   <Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
    <li>Odor-resistant, easy to clean, and quick to dry.</li>
    <li>Ventilation ports for enhanced breathability.</li>
    <li>Lightweight, non-marking soles.</li>
    <li>Water-friendly and buoyant; weighs only ounces.</li>
    <li>Fully molded Croslite&trade; material for lightweight cushioning and comfort.</li>
    <li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
  </Offer>
....lots lots more <Offer> entries
</Offers>

I want to parse each instance of 'Offer' into its own row in a CSV file:

require 'csv'
require 'nokogiri'

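file = File.read('input.xml')
doc = Nokogiri::XML(file)
rows = []

doc.css('Offer').each do |node|
  # split with no argument splits on whitespace, so every word becomes a column
  rows << node.content.split
end

# The block form closes the file once all rows are written
CSV.open('output.csv', 'wb') do |csv|
  rows.each { |row| csv << row }
end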

This runs, but it splits on whitespace rather than on each child element of the Offer node, so every word ends up in its own column in the CSV file.

Is there a way to pick up the content of each child node, and how do I use the node names as headers in the CSV file?

This assumes that each Offer element always has the same child nodes (though they can be empty):

CSV.open('output.csv', 'wb') do |csv|
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end

And to get headers (from the first Offer element):

CSV.open('output.csv', 'wb') do |csv|
  csv << doc.at('Offer').search('*').map(&:name)
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end

search and at are Nokogiri methods that accept either XPath or CSS selector strings. at returns the first matching element (or nil if there is none); search returns a NodeSet, an array-like collection of all matching elements, which is empty if nothing matches. The * here selects every element below the current node, nested ones included; because the children of Offer in this file contain only text or CDATA, that is the same set as its direct children.

Both name and text are also Nokogiri methods (on a node). name returns the element's name; text returns the text or CDATA content of a node.

Try this, and modify it to push into your CSV:

doc.css('Offer').first.elements.each do |n|
  puts "#{n.name}: #{n.content}"
end
