Speed up Date#parse & Date#strptime in Ruby, more elegant way or best practice?

This question derives from another performance issue with processing a large text file of date-formatted strings.

After loading data from a CSV file into a Ruby array, the most inefficient part is parsing those 360,000 date-formatted strings into Date objects. It takes more than 50% of the CPU time.

There are some questions on SO about the most efficient way to parse a string into a date. But most of them are out of date, and none of them considers the situation where only 5 distinct dates actually need to be parsed among all 360,000 records.

More commonly, for an enterprise application, all the dates needed may fall within 5 or 10 years, which is about 2,000 to 4,000 dates. If each day has just 100 data records that I need to fetch from a file or DB, 99% of the CPU time spent on parsing dates and creating Date objects is unnecessary.

Here's my attempt

Define a StaticDate class to improve performance by storing the Date objects parsed before.

require 'date'
class StaticDate
  @@all={}
  def self.instance(p1 = nil, p2 = nil, p3 = nil, p4 = Date::JULIAN)
    @@all[p1*10000+p2*100+p3] ||= Date.new p1, p2, p3, p4
  end

  def self.parse( date_str)
    @@all[date_str] ||= Date.parse date_str
  end

  def self.strptime( date_str, format_str)
    @@all[date_str + format_str] ||= Date.strptime date_str, format_str
  end
end
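
To illustrate what the cache buys, here is a minimal sketch reduced to the parse case only. The CachedDate module and its CACHE constant are hypothetical names used for illustration, not part of the class above or of any library. It compares object identity between cached and uncached parsing:

```ruby
require 'date'

# Hypothetical stand-in for the class above, reduced to the parse cache.
module CachedDate
  CACHE = {}

  # Parse the string only on the first request; reuse the Date afterwards.
  def self.parse(date_str)
    CACHE[date_str] ||= Date.parse(date_str)
  end
end

first  = CachedDate.parse('2014-6-1')
second = CachedDate.parse('2014-6-1')

# Cached calls hand back the very same object.
puts first.equal?(second)

# Plain Date.parse builds a new object on every call.
puts Date.parse('2014-6-1').equal?(Date.parse('2014-6-1'))
```

The first line should print true and the second false, which is where the saving comes from: repeated strings cost one hash lookup instead of a parse plus an allocation.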

My questions

I know my code has the bad smell of duplicating the functionality of an existing class, but in this scenario of 360,000 records it gets a 13x speed-up for Date#strptime and a 41x speed-up for Date#parse. So I think it's really worth improving and refactoring:

  • Is there any gem or plugin that already implements this in a more elegant way? Any suggestion to improve or refactor this code is appreciated.
  • We all know that Ruby Date objects are immutable. Do you think it's necessary to add these features to Ruby's Date class?
  • Is there any other best practice for getting the best performance out of Date object operations in a Rails application? (Omit this question if you think it's too broad.)

I'm sure I'm doing something wrong, and I'm not a native English speaker, so any help improving this class or this question will be greatly appreciated.

Thanks in advance

Benchmark of my attempt

Instead of loading data from a file, I create an input array of 360,000 rows like this:

a= [['a', '2014-6-1', '1'],
    ['a', '2014-6-2', '2'],
    ['a', '2014-6-4', '3'],
    ['a', '2014-6-5', '4'],
    ['b', '2014-6-1', '1'],
    ['b', '2014-6-2', '2'],
    ['b', '2014-6-3', '3'],
    ['b', '2014-6-4', '4'],
    ['b', '2014-6-5', '5']]*40000
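
As a quick check on the claim that only 5 dates really need parsing, a sketch like the following counts the distinct date strings in that array. The variable names rows and distinct_dates are illustrative only:

```ruby
# Same sample data as above: [label, date string, value] repeated 40,000 times.
rows = [['a', '2014-6-1', '1'],
        ['a', '2014-6-2', '2'],
        ['a', '2014-6-4', '3'],
        ['a', '2014-6-5', '4'],
        ['b', '2014-6-1', '1'],
        ['b', '2014-6-2', '2'],
        ['b', '2014-6-3', '3'],
        ['b', '2014-6-4', '4'],
        ['b', '2014-6-5', '5']] * 40_000

# Collect the date column and drop duplicates.
distinct_dates = rows.map { |row| row[1] }.uniq

puts rows.size            # total number of records
puts distinct_dates.size  # number of dates that actually need parsing
```

This should report 360000 records but only 5 distinct date strings, so almost every parse in the uncached benchmarks repeats earlier work.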

Benchmark code:

require 'benchmark'

# Pre-split each date string into integers so Date.new can be timed on its own.
b=a.map{|x| x + x[1].split('-').map(& :to_i) }
Benchmark.bm {|x|
  x.report('0. Date#New 1 date '){ 360000.times{ Date.new(2014,1,1)} }
  x.report('1. Date#New        '){ b.each{ |x| Date.new(x[3],x[4],x[5])} }
  x.report('2. Date#Strptime   '){ a.each{ |x| Date.strptime(x[1],"%Y-%m-%d")} }
  x.report('3. Date#Parse      '){ a.each{ |x| Date.parse(x[1])} }
  x.report('4. StaticDate#New  '){ b.each{ |x| StaticDate.instance( x[3],x[4],x[5]) } }
  x.report('5. StaticDate#StrP '){ a.each{ |x| StaticDate.strptime(x[1],"%Y-%m-%d")} }
  x.report('6. StaticDate#Parse'){ a.each{ |x| StaticDate.parse(x[1])} }
  x.report('7. split to date   '){ a.each{ |x| Date.new(*(x[1].split('-').map(& :to_i)))} }
}

Benchmark result:

                         user     system      total        real
0. Date#New 1 date   0.297000   0.000000   0.297000 (  0.299017)
1. Date#New          0.390000   0.000000   0.390000 (  0.384022)
2. Date#Strptime     2.293000   0.000000   2.293000 (  2.306132)
3. Date#Parse        7.113000   0.000000   7.113000 (  7.101406)
4. StaticDate#New    0.188000   0.000000   0.188000 (  0.188011)
5. StaticDate#StrP   0.546000   0.000000   0.546000 (  0.558032)
6. StaticDate#Parse  0.171000   0.000000   0.171000 (  0.167010)
7. split to date     1.623000   0.000000   1.623000 (  1.641094)

According to the Date documentation:

All date objects are immutable; hence cannot modify themselves.

If creating Date instances from strings is your bottleneck, you could use a hash to create and store them:

date_store = Hash.new { |h, k| h[k] = Date.strptime(k, '%Y-%m-%d') }

date_store['2014-6-1'] #=> #<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>
date_store['2014-6-2'] #=> #<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>
date_store['2014-6-3'] #=> #<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>

All results are saved in the hash:

date_store
#=> {"2014-6-1"=>#<Date: 2014-06-01 ((2456810j,0s,0n),+0s,2299161j)>,
#    "2014-6-2"=>#<Date: 2014-06-02 ((2456811j,0s,0n),+0s,2299161j)>,
#    "2014-6-3"=>#<Date: 2014-06-03 ((2456812j,0s,0n),+0s,2299161j)>}

Fetching a known key is merely a lookup: no parsing is performed and no new Date instances have to be created.
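
Applied to rows shaped like the question's input, the idea might look like the sketch below. The rows and parsed names and the reduced sample data are assumptions for illustration; only Hash.new with a block and Date.strptime come from the answer above:

```ruby
require 'date'

# Sample rows shaped like the question's input: [label, date string, value].
rows = [['a', '2014-6-1', '1'],
        ['a', '2014-6-2', '2'],
        ['b', '2014-6-1', '1'],
        ['b', '2014-6-3', '3']] * 1_000

# Cache keyed by the raw date string; the block runs only on a cache miss.
date_store = Hash.new { |h, k| h[k] = Date.strptime(k, '%Y-%m-%d') }

# Replace each date string with its cached Date object.
parsed = rows.map { |label, date_str, value| [label, date_store[date_str], value] }

puts parsed.size      # rows processed
puts date_store.size  # distinct dates that were actually parsed
```

With this sample, 4000 rows are processed while the hash ends up with only 3 entries, so Date.strptime runs 3 times in total. Note that the cache is tied to a single format string. If several formats are in play, one such hash per format is a simple way to keep them apart.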

