Parsing a big file with Ruby

I need to parse an extremely large XML file (about 50 GB). How can I do it with Ruby? Splitting it into chunks is not possible; I've already tried.

I parsed a 40 GB file using Nokogiri::XML::Reader, which streams the document node by node instead of loading it all into memory.

Structure of my XML file:

<?xml version="1.0" encoding="utf-8"?>
<posts>
   <row Id="4" />
   <row Id="5" />
</posts>

Code:

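require 'nokogiri'

fname = "Posts.xml"
File.open(fname) do |file|
  # The reader advances a cursor through the file; each iteration yields the current node.
  reader = Nokogiri::XML::Reader(file)
  reader.each do |node|
    # Skip whitespace, closing tags, and the <posts> wrapper; keep only <row> elements.
    next unless node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    next unless node.name == "row"
    # do something with node
  end
end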

I think the answer depends on how you plan to use the data. In my case, I simply needed to stream the post nodes.

