Ruby strptime not working while reading file

I have the following code:

require 'date'

f = File.open(filepath)

f.each_with_index do |line, i|
    a, b = line.split("\t")
    d = DateTime.strptime(a, '%m/%d/%Y %I:%M %p')
    puts "#{a} --- #{b}"
    break unless i < 100
end

And I'm getting the following error:

c_reader.rb:10:in `strptime': invalid date (ArgumentError)
  from c_reader.rb:10:in `block in <main>'
  from c_reader.rb:6:in `each'
  from c_reader.rb:6:in `each_with_index'
  from c_reader.rb:6:in `<main>'

The file content:

1/30/2014 1:00 AM   1251.6  
1/30/2014 2:00 AM   1248  
1/30/2014 3:00 AM   1246.32  
1/30/2014 4:00 AM   1242.96  
1/30/2014 5:00 AM   1282.08  
1/30/2014 6:00 AM   1293.84  
1/30/2014 7:00 AM   1307.04  
1/30/2014 8:00 AM   1337.76  
1/30/2014 9:00 AM   1357.92

If I type this into IRB, it works perfectly:

DateTime.strptime("1/30/2014 2:00 PM", '%m/%d/%Y %I:%M %p')

Can someone please tell me what's going on here?

Your example data didn't match what your code was trying to process, so I adjusted it for this example. It also needed some PM times to show that AM/PM is being honored.

With those tweaks to the data, your code works fine. strptime is returning valid DateTime objects.

require 'date'

[
  "1/30/2014 1:00 AM\t1251.6",
  "1/30/2014 2:00 AM\t1248",
  "1/30/2014 3:00 PM\t1246.32",
  "1/30/2014 4:00 PM\t1242.96",
].each do |line|
  a, b = line.split("\t")
  puts DateTime.strptime(a, '%m/%d/%Y %I:%M %p')
end
# >> 2014-01-30T01:00:00+00:00
# >> 2014-01-30T02:00:00+00:00
# >> 2014-01-30T15:00:00+00:00
# >> 2014-01-30T16:00:00+00:00

Your data file starts with a BOM ("byte order mark"). The first two bytes indicate the "endianness" of the file, that is, the order of the two bytes in each 16-bit code unit. In addition, each character here occupies two bytes. The BOM is the character U+FEFF; this is a UTF-16LE ("little-endian") file because its bytes appear as ff fe, with the low-order byte first. If they appeared as fe ff it would be "big-endian":

0000000: fffe 3100 2f00 3300 3000 2f00 3200 3000  ..1./.3.0./.2.0.
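To check this from Ruby rather than a hex dump, a minimal sketch could compare the first two bytes against the two UTF-16 BOM patterns. The helper name `utf16_bom` is hypothetical, and the sketch does not try to distinguish UTF-32LE, whose BOM also begins with ff fe:

```ruby
# Hypothetical helper: report which UTF-16 BOM, if any, starts a byte string.
# Returns :utf16le, :utf16be, or nil when neither pattern is present.
def utf16_bom(bytes)
  head = bytes.byteslice(0, 2).to_s.b   # first two bytes, as binary
  case head
  when "\xFF\xFE".b then :utf16le       # U+FEFF stored low byte first
  when "\xFE\xFF".b then :utf16be       # U+FEFF stored high byte first
  end
end

# The first bytes from the hex dump above: ff fe 31 00 2f 00 33 00
sample = "\xFF\xFE1\x00/\x003\x00".b
p utf16_bom(sample)
```

For a real file, something like `utf16_bom(File.binread("test.csv", 2))` should give the same answer without reading the whole file.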

Ruby doesn't know what to do with those bytes because it reads the file using its default external encoding, normally UTF-8. To fix that, you need to tell Ruby how to interpret the file. Look at the documentation for IO.new to see how to specify encodings. The incoming data has to be converted from UTF-16LE to UTF-8; in the mode string below, BOM|UTF-16LE is the external encoding (with the BOM checked and stripped) and UTF-8 is the internal encoding. This is one way to do it:

require 'date'

File.open(
  "test.csv",
  "rb:BOM|UTF-16LE:UTF-8"
) do |fi|
  fi.each_with_index do |line, i|
    a, b = line.split("\t")
    d = DateTime.strptime(a, '%m/%d/%Y %I:%M %p')
    puts "#{ 1 + i } #{a} --- #{b}"
    break unless i < 100
  end
end

Running that outputs:

1 1/30/2014 1:00 AM --- 1251.6
2 1/30/2014 2:00 AM --- 1248
3 1/30/2014 3:00 AM --- 1246.32
4 1/30/2014 4:00 AM --- 1242.96
5 1/30/2014 5:00 AM --- 1282.08
6 1/30/2014 6:00 AM --- 1293.84
7 1/30/2014 7:00 AM --- 1307.04
8 1/30/2014 8:00 AM --- 1337.76
9 1/30/2014 9:00 AM --- 1357.92
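If the bytes are already in memory (for example, read with File.binread), the same conversion can be sketched on a string instead of in the File.open mode. This is only an illustration: the helper name `decode_utf16le` is hypothetical, the sample line is simulated rather than read from your file, and `delete_prefix` assumes Ruby 2.5 or later:

```ruby
require 'date'

# Hypothetical helper: decode raw UTF-16LE bytes to UTF-8 and drop a leading BOM.
def decode_utf16le(bytes)
  bytes.dup                       # avoid mutating the caller's string
       .force_encoding("UTF-16LE") # relabel the bytes, no conversion yet
       .encode("UTF-8")            # transcode to UTF-8
       .delete_prefix("\uFEFF")    # strip the BOM character if present
end

# Simulated raw bytes of the first line of such a file: BOM + text in UTF-16LE.
raw = "\uFEFF1/30/2014 1:00 PM\t1251.6".encode("UTF-16LE").b

a, b = decode_utf16le(raw).split("\t")
d = DateTime.strptime(a, '%m/%d/%Y %I:%M %p')
puts "#{a} --- #{b}"
puts d
# >> 1/30/2014 1:00 PM --- 1251.6
# >> 2014-01-30T13:00:00+00:00
```

Letting File.open do the transcoding, as in the answer above, is usually simpler; the in-memory version is mainly useful when the data does not come from a file you open yourself.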
