
Parsing data from CSV with Ruby?

I have this CSV file:

1,a,"first letter","[1,2,3,4]"
2,b,"second letter","[2,5,6,8]"
...

In this file, the first column is an integer, the second column is an alphabetic character, the third column is a string, and the fourth column is an array of integers.

I often read CSV files using:

require 'csv'

array = []
CSV.foreach("path/to/file.csv") do |row|
  array << row
end

But I need parsed data.

Is there a way to parse the fields into the correct data types while loading?

By default CSV returns only text, because the source file is only text and nothing inside a record/line specifies what each column's type is.

If you are in control of the data file's creation, you can use YAML or JSON to serialize the data instead, and the values will come back as strings, numbers, and arrays. If you're willing to forgo the ability to use the file with other languages, you can even serialize and restore Ruby objects. (I'd recommend sticking with a more generic serialization, though.)

If you're stuck with CSV, then you'll need to provide code to convert the fields to the types you want, which isn't hard. Something like this untested code should get you on your way:

require 'csv'

array = []
CSV.foreach("path/to/file.csv") do |row|
  int, alpha, str, ary_of_int = row
  array << [int.to_i, alpha, str, ary_of_int.scan(/\d+/).map(&:to_i)]
end
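As a minimal, self-contained sketch of the same idea, the version below parses an inline copy of the sample rows instead of a file. The `CSV_TEXT` constant and the `convert_row` helper are names made up for this illustration, and the use of `Integer()` rather than `to_i` is a choice made here so that a malformed first column raises instead of silently becoming 0.

```ruby
require 'csv'

# Inline stand-in for the file from the question (hypothetical sample data).
CSV_TEXT = <<~DATA
  1,a,"first letter","[1,2,3,4]"
  2,b,"second letter","[2,5,6,8]"
DATA

# Convert one parsed row (an array of strings) into typed values.
# convert_row is a hypothetical helper name, not part of the CSV library.
def convert_row(row)
  int, alpha, str, ary_of_int = row
  [Integer(int), alpha, str, ary_of_int.scan(/\d+/).map(&:to_i)]
end

rows = CSV.parse(CSV_TEXT).map { |row| convert_row(row) }
p rows
# >> [[1, "a", "first letter", [1, 2, 3, 4]], [2, "b", "second letter", [2, 5, 6, 8]]]
```

Swapping `CSV.parse(CSV_TEXT)` for `CSV.foreach("path/to/file.csv")` should give the same result for a file with this layout.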

JSON makes it easy to move data around and recover it from its serialized state:

require 'json'

ary = [
  [1, 'a', "first letter", [1,2,3,4]],
  [2, 'b', "second letter", [2,5,6,8]]
]
json_ary = JSON[ary]
puts json_ary
# >> [[1,"a","first letter",[1,2,3,4]],[2,"b","second letter",[2,5,6,8]]]

require 'pp'
pp JSON[json_ary]
# >> [[1, "a", "first letter", [1, 2, 3, 4]],
# >>  [2, "b", "second letter", [2, 5, 6, 8]]]

JSON.[] checks whether the argument it receives is a string. If it is, it parses the string as JSON; otherwise (for example, for an array or hash) it converts the object to a JSON string.
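If the dual behavior of `JSON[]` feels too implicit, the two directions can be spelled out with `JSON.generate` and `JSON.parse`. This is only a sketch; the variable names are illustrative.

```ruby
require 'json'

data = [[1, 'a', 'first letter', [1, 2, 3, 4]]]

text = JSON.generate(data) # same direction as JSON[data]
back = JSON.parse(text)    # same direction as JSON[text]

puts text
# >> [[1,"a","first letter",[1,2,3,4]]]
p back == data
# >> true
```

The explicit calls make it obvious at the call site whether data is being serialized or parsed.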

YAML works similarly:

require 'yaml'

ary = [
  [1, 'a', "first letter", [1,2,3,4]],
  [2, 'b', "second letter", [2,5,6,8]]
]
yaml_ary = ary.to_yaml
puts yaml_ary
# >> ---
# >> - - 1
# >>   - a
# >>   - first letter
# >>   - - 1
# >>     - 2
# >>     - 3
# >>     - 4
# >> - - 2
# >>   - b
# >>   - second letter
# >>   - - 2
# >>     - 5
# >>     - 6
# >>     - 8

require 'pp'
pp YAML.load(yaml_ary)
# >> [[1, "a", "first letter", [1, 2, 3, 4]],
# >>  [2, "b", "second letter", [2, 5, 6, 8]]]
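If the YAML file might come from somewhere you don't fully trust, `YAML.safe_load` is worth considering, since by default it only builds basic types such as strings, numbers, arrays, and hashes. A small round-trip sketch, assuming the same sample data as above:

```ruby
require 'yaml'

ary = [
  [1, 'a', 'first letter', [1, 2, 3, 4]],
  [2, 'b', 'second letter', [2, 5, 6, 8]]
]

yaml_text = ary.to_yaml
restored = YAML.safe_load(yaml_text) # only basic types are permitted by default

p restored == ary
# >> true
```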

You could use XML, but it still only knows its content is a text node. You have to write code to interpret the XML and convert the data values to the appropriate types.

There is no built-in support for this in the standard CSV package. An array like "[1,2,3,4]" is just a string to Ruby; in fact every field is a string, even the numbers. You need to do this parsing yourself.
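One place to hook that parsing in is the `converters:` option that Ruby's CSV methods accept, which takes built-in converter names such as `:integer` as well as custom lambdas. The sketch below is one possible arrangement, not a definitive recipe: `array_converter` and its regular expression are assumptions made for this example, and the data is an inline copy of the sample rows.

```ruby
require 'csv'
require 'json'

csv_text = <<~DATA
  1,a,"first letter","[1,2,3,4]"
  2,b,"second letter","[2,5,6,8]"
DATA

# Hypothetical custom converter: fields that look like "[1,2,3]" become arrays;
# everything else is returned unchanged.
array_converter = lambda do |field|
  if field.is_a?(String) && field.match?(/\A\[[\d,\s]*\]\z/)
    JSON.parse(field)
  else
    field
  end
end

# :integer is a built-in converter; the lambda handles the array column.
rows = CSV.parse(csv_text, converters: [:integer, array_converter])
p rows
# >> [[1, "a", "first letter", [1, 2, 3, 4]], [2, "b", "second letter", [2, 5, 6, 8]]]
```

Note that converters run on every field, so `:integer` would also convert a string column whose value happens to look like a number. If that matters, converting column by column is the safer route.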

Not built in, however:

1,a,"first letter","[1,2,3,4]"

To get the integer I would just call .to_i. For the array you could do:

require "json"
JSON.parse("[1,2,3,4]")
# => [1, 2, 3, 4]
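One thing to keep in mind with this approach is how each call behaves on bad input. The sketch below, with illustrative variable names, contrasts the lenient `to_i` with the strict `Integer()` and shows that `JSON.parse` raises on malformed text.

```ruby
require 'json'

# String#to_i never raises; it returns 0 when there are no leading digits.
lenient = 'abc'.to_i

# Integer() is strict and raises ArgumentError on non-numeric input.
strict_error =
  begin
    Integer('abc')
    nil
  rescue ArgumentError => e
    e.class
  end

# JSON.parse raises JSON::ParserError on malformed input.
json_error =
  begin
    JSON.parse('[1,2,')
    nil
  rescue JSON::ParserError => e
    e.class
  end

numbers = JSON.parse('[1,2,3,4]')

p lenient      # >> 0
p strict_error # >> ArgumentError
p json_error   # >> JSON::ParserError
p numbers      # >> [1, 2, 3, 4]
```

Which behavior is preferable depends on whether bad rows should be skipped quietly or flagged.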
