Ruby: Increasing Efficiency

I am dealing with a large quantity of data and I'm worried about the efficiency of my operations at scale. After benchmarking, the average time to execute this line of code is about 0.004 sec. The goal of this line is to find the difference between the two values at each array location. In a previous operation, 111.111 was loaded into the arrays at locations that contained invalid data. Due to some weird time-domain issues, I needed to do this because I couldn't just remove the values and I needed some distinguishable placeholder. I could probably use nil here instead. Anyway, back to the explanation. This line of code checks that neither array has the 111.111 placeholder at the current location. If the values are valid, I perform the mathematical operation; otherwise I want to delete the values (or at least exclude them from the new array I'm writing to). I accomplished this by placing a nil at that location and then compacting the array afterwards.

The time of 0.004 sec for 4000 data points in each array isn't terrible, but this line of code is executed 25M times. I'm hoping someone can offer some insight into how I might optimize it.

temp_row = row_1.zip(row_2).map do |x, y| 
  x == 111.111 || y == 111.111 ? nil : (x - y).abs 
end.compact

You are wasting CPU generating nil in the ternary statement and then using compact to remove them. Instead, use reject or select to keep only the pairs that don't contain 111.111, then map or something similar.

Instead of:

row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]

temp_row = row_1.zip(row_2).map do |x, y| 
  x == 111.111 || y == 111.111 ? nil : (x - y).abs 
end.compact
temp_row # => [1, 2]

I'd start with:

temp_row = row_1.zip(row_2)
                .reject{ |x,y| x == 111.111 || y == 111.111 }
                .map{ |x,y| (x - y).abs }
temp_row # => [1, 2]

Or:

temp_row = row_1.zip(row_2)
                .each_with_object([]) { |(x,y), ary|
                  ary << (x - y).abs unless (x == 111.111 || y == 111.111)
                }
temp_row # => [1, 2]
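Two further variants, not part of the original answer, may be worth adding to the comparison. This is only a sketch: it assumes Ruby 2.7 or later for filter_map, both rows having the same length, and the names PLACEHOLDER, diff_filter_map and diff_indexed are made up for illustration. The first filters and maps in a single pass; the second also skips the intermediate array that zip builds.

```ruby
PLACEHOLDER = 111.111 # the sentinel value used in the question

# Single pass: filter_map keeps only the truthy block results,
# so pairs containing the placeholder (block returns nil) are dropped.
def diff_filter_map(row_1, row_2)
  row_1.zip(row_2).filter_map do |x, y|
    (x - y).abs unless x == PLACEHOLDER || y == PLACEHOLDER
  end
end

# Index-based loop: avoids allocating the zipped array of pairs.
def diff_indexed(row_1, row_2)
  result = []
  row_1.each_with_index do |x, i|
    y = row_2[i]
    next if x == PLACEHOLDER || y == PLACEHOLDER
    result << (x - y).abs
  end
  result
end

diff_filter_map([1, 111.111, 2], [2, 111.111, 4]) # => [1, 2]
diff_indexed([1, 111.111, 2], [2, 111.111, 4])    # => [1, 2]
```

Whether either one beats reject followed by map depends on the Ruby version and the data, so measure them alongside the others rather than assuming.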

Benchmarking with different array sizes is informative:

require 'benchmark'

DECIMAL_SHIFT = 100
DATA_ARRAY = (1 .. 1000).to_a
ROW_1 = (DATA_ARRAY + [111.111]).shuffle
ROW_2 = (DATA_ARRAY.map{ |i| i * 2 } + [111.111]).shuffle

Benchmark.bm(16) do |b|
  b.report('ternary:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).map do |x, y| 
        x == 111.111 || y == 111.111 ? nil : (x - y).abs 
      end.compact
    end
  end

  b.report('reject:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).reject{ |x,y| x == 111.111 || y == 111.111 }.map{ |x,y| (x - y).abs }
    end
  end

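  # Label corrected: this case uses each_with_object, not each_with_index.
  b.report('each_with_object:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2)
           .each_with_object([]) { |(x,y), ary|
             # `<<` appends to the accumulator in place; `ary += [...]` would only
             # rebind the block-local variable and leave the result empty.
             ary << (x - y).abs unless (x == 111.111 || y == 111.111)
           }
    end
  end
end

# Timings as originally reported. In that run the last case was labelled
# each_with_index and used `ary += [...]`, so its figure measures that variant.
# >>                        user     system      total        real
# >> ternary:           0.240000   0.000000   0.240000 (  0.244476)
# >> reject:            0.060000   0.000000   0.060000 (  0.058842)
# >> each_with_index:   0.350000   0.000000   0.350000 (  0.349363)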

Adjust DECIMAL_SHIFT (the repetition count), the size of DATA_ARRAY, and the placement of 111.111 to see which expressions work best for your data size and structure, then fine-tune the code as necessary.
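One way to do that adjustment systematically is to loop over several sizes. The following is a rough sketch rather than a rigorous benchmark: build_row and diff_reject are hypothetical helpers, and the sizes, repetition count and 10% placeholder share are arbitrary assumptions to be replaced with values that match your data.

```ruby
require 'benchmark'

PLACEHOLDER = 111.111 # the sentinel value used in the question

# Hypothetical helper: `size` valid values plus `bad` placeholders, shuffled.
def build_row(size, bad, multiplier = 1)
  ((1..size).map { |i| i * multiplier } + [PLACEHOLDER] * bad).shuffle
end

# The reject-then-map approach from above, wrapped in a method.
def diff_reject(row_1, row_2)
  row_1.zip(row_2)
       .reject { |x, y| x == PLACEHOLDER || y == PLACEHOLDER }
       .map { |x, y| (x - y).abs }
end

[100, 1_000, 4_000].each do |size|
  row_1 = build_row(size, size / 10)
  row_2 = build_row(size, size / 10, 2)
  # Time 100 repetitions of the operation for this row size.
  elapsed = Benchmark.realtime { 100.times { diff_reject(row_1, row_2) } }
  puts format('size=%-6d reject: %.6f s', size, elapsed)
end
```

Swapping diff_reject for one of the other variants gives a like-for-like comparison at each size.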

You can try the parallel gem (https://github.com/grosser/parallel) and run it on multiple threads.
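A minimal sketch of what that might look like, assuming the parallel gem is installed and that the work is organized as a list of [row_1, row_2] pairs. The row_pairs data and the diff_row helper are invented for illustration, and the code falls back to a plain map when the gem is missing.

```ruby
begin
  require 'parallel' # third-party gem: gem install parallel
rescue LoadError
  # Gem not available; the sequential fallback below is used instead.
end

PLACEHOLDER = 111.111 # the sentinel value used in the question

def diff_row(row_1, row_2)
  row_1.zip(row_2)
       .reject { |x, y| x == PLACEHOLDER || y == PLACEHOLDER }
       .map { |x, y| (x - y).abs }
end

# Hypothetical input: each element is one [row_1, row_2] pair.
row_pairs = [
  [[1, 111.111, 2], [2, 111.111, 4]],
  [[5, 7, 111.111], [1, 2, 3]]
]

results =
  if defined?(Parallel)
    # Parallel.map returns results in the same order as the input.
    Parallel.map(row_pairs, in_threads: 2) { |pair| diff_row(*pair) }
  else
    row_pairs.map { |pair| diff_row(*pair) }
  end
results # => [[1, 2], [4, 5]]
```

Note that on MRI the global VM lock keeps threads from running Ruby code in parallel, so for CPU-bound work like this the gem's in_processes option is more likely to help than in_threads; benchmark both before committing to either.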


 