
Parsing a Zip file and extracting records from text files

I am really new to Ruby and could use some help with a program. I need to open a zip file that contains multiple text files, each with many rows of data, e.g.:

CDI|3|3|20100515000000|20100515153000|2008|XXXXX4791|0.00|0.00
CDI|3|3|20100515000000|20100515153000|2008|XXXXX5648|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX3276|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX4342|0.00|0.00
MITR|3|3|20100515000000|20100515153000|0000|XXXXX7832|0.00|0.00
HR|3|3|20100515000000|20100515153000|1114|XXXXX0238|0.00|0.00

I first need to extract the zip file, read the text files inside it, and write only the complete rows that start with CDI or CHO to two output files: one for the rows starting with CDI and one for the rows starting with CHO (basically parsing the file). I have to do it in Ruby, and ideally the program would run automatically as new zip files of the same format keep arriving. I would appreciate any advice, direction, or sample code anyone can give.

One option is the rubyzip gem, which provides the Zip::ZipFile class.

require 'zip/zip'   # rubyzip < 1.0; on rubyzip 1.0+ use require 'zip' and Zip::File

# Open the zip file and pass each entry to the block
Zip::ZipFile.foreach(path_to_zip) do |entry|
   # Read the entry's contents as a String and walk through it line by line
   entry.get_input_stream.read.each_line do |line|
      if line.start_with?("CDI") || line.start_with?("CHO")
         # Do something with the matching line
      end
   end
end
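To fill in the "Do something" step, here is a minimal sketch that routes matching rows to two output files. It assumes rubyzip 1.0 or later (`require "zip"`, `Zip::File`); the names `OUTPUTS`, `route_lines`, `extract_records` and the output file names are made up for illustration and are not part of the gem.

```ruby
# Record prefixes to keep, mapped to hypothetical output file names.
OUTPUTS = { "CDI" => "cdi_out.txt", "CHO" => "cho_out.txt" }.freeze

# Write each line of +text+ whose first pipe-delimited field is a key of
# +sinks+ to the matching sink. A sink is any IO-like object with #puts.
def route_lines(text, sinks)
  text.each_line do |line|
    prefix = line.split("|", 2).first
    sink = sinks[prefix]
    sink.puts(line) if sink
  end
end

# Read every file entry in the zip at +zip_path+ and append its CDI and
# CHO rows to the files named in +outputs+.
def extract_records(zip_path, outputs = OUTPUTS)
  require "zip" # rubyzip 1.0+ (gem install rubyzip); loaded only when needed
  sinks = outputs.transform_values { |path| File.open(path, "a") }
  Zip::File.open(zip_path) do |zip|
    zip.each do |entry|
      next unless entry.file? # skip directory entries
      route_lines(entry.get_input_stream.read, sinks)
    end
  end
ensure
  sinks&.each_value(&:close)
end
```

Calling `extract_records("data.zip")` (a placeholder path) would append to both output files. Matching on the whole first field rather than on a bare prefix means a row such as `CDIX|...` is not picked up by accident.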

I'm not sure I entirely follow your question. For starters, if you're looking to unzip files using Ruby, check out this question. Once you've got the files unzipped into a readable format, you can try something along these lines to write to the two separate output files:

cdi_output = File.open("cdiout.txt", "a")  # Open an output file for CDI
cho_output = File.open("choout.txt", "a")  # Open an output file for CHO

File.open("text.txt", "r") do |f|          # Open the input file
  while line = f.gets                      # Read each line in the input
    cdi_output.puts line if /^CDI/ =~ line # Print if line starts with CDI
    cho_output.puts line if /^CHO/ =~ line # Print if line starts with CHO
  end
end

cdi_output.close                           # Close cdi_output file
cho_output.close                           # Close cho_output file
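For the "continuous arrival" part of the question, one simple approach is to poll an incoming directory and handle each zip file the first time it is seen. This is only a sketch: the directory layout, the helper names `new_zip_files` and `watch_for_zips`, and the 5-second interval are all assumptions, and a file that is still being copied when it is picked up would need extra care.

```ruby
require "set"

# Return the .zip paths in +dir+ that are not yet in +seen+, sorted by name,
# and record them in +seen+ so the next call skips them.
def new_zip_files(dir, seen)
  fresh = Dir.glob(File.join(dir, "*.zip")).sort.reject { |path| seen.include?(path) }
  seen.merge(fresh)
  fresh
end

# Poll +dir+ forever, yielding each newly arrived zip path to the block.
def watch_for_zips(dir, interval: 5)
  seen = Set.new
  loop do
    new_zip_files(dir, seen).each { |path| yield path }
    sleep interval
  end
end
```

A call such as `watch_for_zips("incoming") { |path| puts "processing #{path}" }` would run until interrupted; the block is where the unzip-and-filter step would go. Scheduling the script with cron or a similar tool is an alternative to the endless loop.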


 