This question grew out of another performance issue: processing a large text file that contains date-formatted strings.
After loading the data from a CSV file into a Ruby array, the most inefficient part is parsing those 360,000 date-formatted strings into date objects. It takes more than 50% of the CPU time.
There are some questions on SO about the most efficient way of parsing a string into a date. But most of them are out of date, and none of them consider the situation where only 5 distinct dates actually need to be parsed among all 360,000 records.
More generally, in an enterprise application all the dates needed may fall within 5 or 10 years, which is about 2,000 to 4,000 dates. If I fetch only 100 records per day from a file or DB, 99% of the CPU time spent parsing dates and creating date objects is unnecessary.
I defined a StaticDate class to improve performance by storing the date objects that have already been parsed:
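require 'date'

# Caches Date objects so that each distinct date is built or parsed only once.
class StaticDate
  @@all = {}

  # Defaults mirror Date.new (Date::ITALY is its default calendar reform date).
  # Note that the cache key ignores p4.
  def self.instance(p1 = -4712, p2 = 1, p3 = 1, p4 = Date::ITALY)
    @@all[p1 * 10000 + p2 * 100 + p3] ||= Date.new(p1, p2, p3, p4)
  end

  def self.parse(date_str)
    @@all[date_str] ||= Date.parse(date_str)
  end

  def self.strptime(date_str, format_str)
    @@all[date_str + format_str] ||= Date.strptime(date_str, format_str)
  end
end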
I know my code has the bad smell of duplicating an existing class's functionality, but in this scenario of 360,000 records it runs about 13x faster than Date.strptime and 41x faster than Date.parse (comparing StaticDate.parse against each in the benchmark below). So I think it is really worth improving and refactoring.
I am surely doing something wrong, and English is not my first language, so any help improving this class or this question will be greatly appreciated.
Thanks in advance
Instead of loading data from a file, I create an input array of 360,000 rows like this:
a = [['a', '2014-6-1', '1'],
     ['a', '2014-6-2', '2'],
     ['a', '2014-6-4', '3'],
     ['a', '2014-6-5', '4'],
     ['b', '2014-6-1', '1'],
     ['b', '2014-6-2', '2'],
     ['b', '2014-6-3', '3'],
     ['b', '2014-6-4', '4'],
     ['b', '2014-6-5', '5']] * 40000
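As a quick check of how repetitive this input is, the sketch below counts the distinct date strings. The names rows and distinct_dates are only illustrative stand-ins for the array above.

```ruby
# Hypothetical stand-in for the input array: the same nine rows, repeated.
rows = [['a', '2014-6-1', '1'],
        ['a', '2014-6-2', '2'],
        ['a', '2014-6-4', '3'],
        ['a', '2014-6-5', '4'],
        ['b', '2014-6-1', '1'],
        ['b', '2014-6-2', '2'],
        ['b', '2014-6-3', '3'],
        ['b', '2014-6-4', '4'],
        ['b', '2014-6-5', '5']] * 40_000

# Column 1 holds the date string; collect the distinct values it takes.
distinct_dates = rows.map { |row| row[1] }.uniq

puts rows.size            # total number of rows
puts distinct_dates.size  # distinct date strings that actually need parsing
```

For this input it reports 360,000 rows but only 5 distinct date strings.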
Benchmark code:
require 'benchmark'

# Append integer year, month and day columns (indices 3..5) to every row.
b = a.map { |x| x + x[1].split('-').map(&:to_i) }

Benchmark.bm do |x|
  x.report('0. Date#New 1 date ') { 360000.times { Date.new(2014, 1, 1) } }
  x.report('1. Date#New        ') { b.each { |row| Date.new(row[3], row[4], row[5]) } }
  x.report('2. Date#Strptime   ') { a.each { |row| Date.strptime(row[1], '%Y-%m-%d') } }
  x.report('3. Date#Parse      ') { a.each { |row| Date.parse(row[1]) } }
  x.report('4. StaticDate#New  ') { b.each { |row| StaticDate.instance(row[3], row[4], row[5]) } }
  x.report('5. StaticDate#StrP ') { a.each { |row| StaticDate.strptime(row[1], '%Y-%m-%d') } }
  x.report('6. StaticDate#Parse') { a.each { |row| StaticDate.parse(row[1]) } }
  x.report('7. split to date   ') { a.each { |row| Date.new(*row[1].split('-').map(&:to_i)) } }
end
Benchmark result:
                         user     system      total        real
0. Date#New 1 date    0.297000   0.000000   0.297000 (  0.299017)
1. Date#New           0.390000   0.000000   0.390000 (  0.384022)
2. Date#Strptime      2.293000   0.000000   2.293000 (  2.306132)
3. Date#Parse         7.113000   0.000000   7.113000 (  7.101406)
4. StaticDate#New     0.188000   0.000000   0.188000 (  0.188011)
5. StaticDate#StrP    0.546000   0.000000   0.546000 (  0.558032)
6. StaticDate#Parse   0.171000   0.000000   0.171000 (  0.167010)
7. split to date      1.623000   0.000000   1.623000 (  1.641094)
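To make the speedups easier to read off, here is a small sketch that derives the ratios from the totals in the table. The totals hash and the speedup helper are illustrative names, and the figures are simply copied from the results above.

```ruby
# Totals in seconds, copied from the benchmark results above.
totals = {
  'Date#Strptime'    => 2.293,
  'Date#Parse'       => 7.113,
  'StaticDate#StrP'  => 0.546,
  'StaticDate#Parse' => 0.171
}

# Hypothetical helper: how many times faster the cached call is.
def speedup(slow, fast)
  (slow / fast).round(1)
end

strptime_vs_cached_strptime = speedup(totals['Date#Strptime'], totals['StaticDate#StrP'])
strptime_vs_cached_parse    = speedup(totals['Date#Strptime'], totals['StaticDate#Parse'])
parse_vs_cached_parse       = speedup(totals['Date#Parse'], totals['StaticDate#Parse'])

puts "Date#Strptime vs StaticDate#StrP:  #{strptime_vs_cached_strptime}x"
puts "Date#Strptime vs StaticDate#Parse: #{strptime_vs_cached_parse}x"
puts "Date#Parse    vs StaticDate#Parse: #{parse_vs_cached_parse}x"
```

The ratios come out at roughly 4.2x, 13.4x and 41.6x respectively, so the 13x figure corresponds to comparing Date#Strptime with StaticDate#Parse rather than with StaticDate#StrP.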
According to the Date documentation:
All date objects are immutable; hence cannot modify themselves.
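Because Date objects are immutable, a single instance can safely be shared by every record that refers to the same day. If creating date instances from strings is your bottleneck, you could use a hash to create and store them: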
date_store = Hash.new { |h, k| h[k] = Date.strptime(k, '%Y-%m-%d') }
date_store['2014-6-1'] #=> #<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>
date_store['2014-6-2'] #=> #<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>
date_store['2014-6-3'] #=> #<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>
All results are saved in the hash:
date_store
#=> {"2014-6-1"=>#<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>,
# "2014-6-2"=>#<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>,
# "2014-6-3"=>#<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>}
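To illustrate how such a store behaves over many rows, here is a minimal sketch. The rows array is a hypothetical sample in the same shape as the question's input, not the original data.

```ruby
require 'date'

# Cache keyed by the raw date string; the block runs only on a cache miss.
date_store = Hash.new { |h, k| h[k] = Date.strptime(k, '%Y-%m-%d') }

# Hypothetical sample rows: [name, date string, value].
rows = [['a', '2014-6-1', '1'],
        ['a', '2014-6-2', '2'],
        ['b', '2014-6-1', '1']] * 1000

# Resolve every row's date string through the cache.
dates = rows.map { |row| date_store[row[1]] }

puts dates.size                 # one Date reference per row
puts date_store.size            # one cached Date per distinct string
puts dates[0].equal?(dates[2])  # rows 0 and 2 share the very same Date object
```

Here 3,000 rows resolve to only 2 cached Date objects, and rows with the same date string receive the identical instance.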
Fetching a known key is merely a lookup: no parsing is performed and no new Date instances have to be created.
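If more than one format is in play, as with the question's StaticDate.strptime, one option is to keep a separate cache per format string so that entries for different formats cannot collide. The following is only a sketch of that idea; DateCache and STORES are made-up names, not part of Ruby's Date library.

```ruby
require 'date'

# Hypothetical helper (not a standard library API): one inner cache per
# format string, each mapping date strings to parsed Date objects.
module DateCache
  STORES = Hash.new do |stores, format|
    stores[format] = Hash.new { |dates, str| dates[str] = Date.strptime(str, format) }
  end

  # Same argument order as Date.strptime; '%F' is Date.strptime's default format.
  def self.strptime(str, format = '%F')
    STORES[format][str]
  end
end

d1 = DateCache.strptime('2014-6-1', '%Y-%m-%d')
d2 = DateCache.strptime('2014-6-1', '%Y-%m-%d')
d3 = DateCache.strptime('01/06/2014', '%d/%m/%Y')

puts d1.equal?(d2)  # second call is a pure lookup returning the cached object
puts d1 == d3       # same calendar day, reached through a different format
```

Nesting the hashes avoids building a concatenated key on every call; whether that matters in practice would need to be measured on the real data.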