Speed up Date#parse & Date#strptime in Ruby, more elegant way or best practice?

This question grew out of another performance problem: processing a large text file that contains date-formatted strings.

After loading the data from a CSV file into a Ruby array, the most inefficient part is parsing the 360,000 date-formatted strings into Date objects. It takes more than 50% of the CPU time.

There are some questions on SO about the most efficient way to parse a string into a date, but most of them are out of date, and none of them consider the situation where only 5 distinct dates actually need to be parsed among all 360,000 records.

More generally, in an enterprise application all the dates needed may fall within 5 or 10 years, which is about 2,000 to 4,000 dates. If I need to fetch just 100 records per day from a file or DB, 99% of the CPU time spent parsing dates and creating Date objects is unnecessary.

Here's my attempt

Define a StaticDate class that improves performance by storing the Date objects that have already been parsed.

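require 'date'

# Caches Date objects so that each distinct input is only parsed once.
class StaticDate
  @@all = {}

  # Integer key built from year (p1), month (p2) and day (p3).
  # p4 is the calendar reform start, defaulting to Date::ITALY as Date.new does;
  # it is not part of the cache key.
  def self.instance(p1, p2, p3, p4 = Date::ITALY)
    @@all[p1 * 10000 + p2 * 100 + p3] ||= Date.new(p1, p2, p3, p4)
  end

  def self.parse(date_str)
    @@all[date_str] ||= Date.parse(date_str)
  end

  # The key combines the date string and the format string.
  def self.strptime(date_str, format_str)
    @@all[date_str + format_str] ||= Date.strptime(date_str, format_str)
  end
end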

My questions

I know my code smells, since it duplicates the functionality of an existing class, but in this scenario of 360,000 records it gets a 13x speedup for Date#strptime and a 41x speedup for Date#parse. So I think it is really worth improving and refactoring:

  • Is there a gem or plugin that already implements this in a more elegant way? Any suggestion for improving or refactoring this code is also appreciated.
  • Since all Ruby Date objects are immutable, do you think it is necessary to add these features to the Ruby Date class itself?
  • Is there any other best practice for getting the best performance out of Date object operations in a Rails application? (Skip this question if you think it is too broad.)

I'm sure I'm doing something wrong, and I'm not a native English speaker, so any help improving this class or this question will be greatly appreciated.

Thanks in advance

Benchmark of my attempt

Instead of loading data from file, I create an input array of 360,000 rows like this:

a= [['a', '2014-6-1', '1'],
    ['a', '2014-6-2', '2'],
    ['a', '2014-6-4', '3'],
    ['a', '2014-6-5', '4'],
    ['b', '2014-6-1', '1'],
    ['b', '2014-6-2', '2'],
    ['b', '2014-6-3', '3'],
    ['b', '2014-6-4', '4'],
    ['b', '2014-6-5', '5']]*40000

Benchmark code:

require 'benchmark'

# Append year, month and day as integers to each row (columns 3..5).
b = a.map { |x| x + x[1].split('-').map(&:to_i) }
Benchmark.bm { |bm|
  bm.report('0. Date#New 1 date ') { 360000.times { Date.new(2014, 1, 1) } }
  bm.report('1. Date#New        ') { b.each { |x| Date.new(x[3], x[4], x[5]) } }
  bm.report('2. Date#Strptime   ') { a.each { |x| Date.strptime(x[1], "%Y-%m-%d") } }
  bm.report('3. Date#Parse      ') { a.each { |x| Date.parse(x[1]) } }
  bm.report('4. StaticDate#New  ') { b.each { |x| StaticDate.instance(x[3], x[4], x[5]) } }
  bm.report('5. StaticDate#StrP ') { a.each { |x| StaticDate.strptime(x[1], "%Y-%m-%d") } }
  bm.report('6. StaticDate#Parse') { a.each { |x| StaticDate.parse(x[1]) } }
  bm.report('7. split to date   ') { a.each { |x| Date.new(*x[1].split('-').map(&:to_i)) } }
}

Benchmark result:

                         user     system      total        real
0. Date#New 1 date   0.297000   0.000000   0.297000 (  0.299017)
1. Date#New          0.390000   0.000000   0.390000 (  0.384022)
2. Date#Strptime     2.293000   0.000000   2.293000 (  2.306132)
3. Date#Parse        7.113000   0.000000   7.113000 (  7.101406)
4. StaticDate#New    0.188000   0.000000   0.188000 (  0.188011)
5. StaticDate#StrP   0.546000   0.000000   0.546000 (  0.558032)
6. StaticDate#Parse  0.171000   0.000000   0.171000 (  0.167010)
7. split to date     1.623000   0.000000   1.623000 (  1.641094)
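
For reference, the ratios between these timings can be worked out with a few lines of Ruby. This is only a sketch: the `timings` hash and the `speedup` lambda are illustrative names, and the numbers are the user CPU times copied from the table above.

```ruby
# User CPU seconds copied from the benchmark table above.
timings = {
  'Date.new'            => 0.390,
  'Date.strptime'       => 2.293,
  'Date.parse'          => 7.113,
  'StaticDate.instance' => 0.188,
  'StaticDate.strptime' => 0.546,
  'StaticDate.parse'    => 0.171
}

# Illustrative helper: how many times faster `fast` is than `slow`.
speedup = ->(slow, fast) { (timings[slow] / timings[fast]).round(1) }

puts "Date.new      vs StaticDate.instance: #{speedup.call('Date.new', 'StaticDate.instance')}x"
puts "Date.strptime vs StaticDate.strptime: #{speedup.call('Date.strptime', 'StaticDate.strptime')}x"
puts "Date.strptime vs StaticDate.parse:    #{speedup.call('Date.strptime', 'StaticDate.parse')}x"
puts "Date.parse    vs StaticDate.parse:    #{speedup.call('Date.parse', 'StaticDate.parse')}x"
```

On these figures, the ratio of roughly 13x appears when Date.strptime is compared with StaticDate.parse, and the ratio of roughly 41x when Date.parse is compared with StaticDate.parse.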

According to the Date documentation:

All date objects are immutable; hence cannot modify themselves.

If creating date instances from a string is your bottleneck, you could use a hash to create and store them:

date_store = Hash.new { |h, k| h[k] = Date.strptime(k, '%Y-%m-%d') }

date_store['2014-6-1'] #=> #<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>
date_store['2014-6-2'] #=> #<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>
date_store['2014-6-3'] #=> #<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>

All results are saved in the hash:

date_store
#=> {"2014-6-1"=>#<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>,
#    "2014-6-2"=>#<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>,
#    "2014-6-3"=>#<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>}

Fetching a known key is merely a lookup: no parsing is performed and no new Date instance has to be created.
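
As a rough sketch of how this could be applied to rows shaped like the ones in the question: the `rows` array below is a scaled-down copy of that input, and `parse_count` is a hypothetical counter added only to show how often `Date.strptime` actually runs.

```ruby
require 'date'

# Hypothetical counter, only here to show how many strings get parsed.
parse_count = 0

# Memoizing hash: the block runs only the first time a key is requested.
date_store = Hash.new do |h, k|
  parse_count += 1
  h[k] = Date.strptime(k, '%Y-%m-%d')
end

# Scaled-down version of the question's input (9 rows repeated 1,000 times).
rows = [['a', '2014-6-1', '1'],
        ['a', '2014-6-2', '2'],
        ['a', '2014-6-4', '3'],
        ['a', '2014-6-5', '4'],
        ['b', '2014-6-1', '1'],
        ['b', '2014-6-2', '2'],
        ['b', '2014-6-3', '3'],
        ['b', '2014-6-4', '4'],
        ['b', '2014-6-5', '5']] * 1000

# Look up the Date for column 1 of every row.
dates = rows.map { |row| date_store[row[1]] }

puts "rows processed: #{dates.size}"
puts "strings parsed: #{parse_count}"
# Rows with the same date string share one Date object.
puts "same object:    #{dates[0].equal?(dates[9])}"
```

Because Date objects are immutable, sharing a single instance between rows should be safe as long as callers do not rely on object identity.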
