
Ruby: Increasing Efficiency

I am dealing with a large quantity of data and I'm worried about the efficiency of my operations at scale. After benchmarking, the average time to execute the line of code below is about 0.004 sec. Its goal is to find the difference between the two values at each array position.

In a previous operation, 111.111 was loaded into the arrays at positions that contained invalid data. Due to some weird time-domain issues, I couldn't just remove the values, so I needed a distinguishable placeholder. I could probably use nil here instead.

The code checks that neither array has the 111.111 placeholder at the current position. If both values are valid, I perform the mathematical operation; otherwise I want to delete the values (or at least exclude them from the new array I'm writing to). I accomplished this by placing a nil in that position and then compacting the array afterwards.

A time of 0.004 sec for 4000 data points in each array isn't terrible, but this line of code is executed 25 million times. I'm hoping someone can offer some insight into how I might optimize it.

temp_row = row_1.zip(row_2).map do |x, y| 
  x == 111.111 || y == 111.111 ? nil : (x - y).abs 
end.compact

You are wasting CPU by generating nil in the ternary expression and then using compact to remove those nils. Instead, use reject or select to keep only the pairs that don't contain 111.111, then use map (or something similar) to compute the differences.

Instead of:

row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]

temp_row = row_1.zip(row_2).map do |x, y| 
  x == 111.111 || y == 111.111 ? nil : (x - y).abs 
end.compact
temp_row # => [1, 2]

I'd start with:

temp_row = row_1.zip(row_2)
                .reject{ |x,y| x == 111.111 || y == 111.111 }
                .map{ |x,y| (x - y).abs }
temp_row # => [1, 2]

Or:

temp_row = row_1.zip(row_2)
                .each_with_object([]) { |(x,y), ary|
                  ary << (x - y).abs unless (x == 111.111 || y == 111.111)
                }
temp_row # => [1, 2]
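
On Ruby 2.7 or newer, filter_map is another option: it drops falsy block results, so the filtering and the mapping happen in one pass. A minimal sketch using the same sample rows; the PLACEHOLDER constant is a name introduced here for illustration, and whether this beats reject + map is something to measure on your own data:

```ruby
# PLACEHOLDER is a hypothetical constant name for the 111.111 marker.
PLACEHOLDER = 111.111

row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]

# filter_map (Ruby 2.7+) keeps only truthy block results, so returning nil
# for placeholder pairs drops them without a separate compact pass.
temp_row = row_1.zip(row_2).filter_map do |x, y|
  (x - y).abs unless x == PLACEHOLDER || y == PLACEHOLDER
end
temp_row # => [1, 2]
```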

Benchmarking the alternatives with arrays of different sizes is informative:

require 'benchmark'

DECIMAL_SHIFT = 100
DATA_ARRAY = (1 .. 1000).to_a
ROW_1 = (DATA_ARRAY + [111.111]).shuffle
ROW_2 = (DATA_ARRAY.map{ |i| i * 2 } + [111.111]).shuffle

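Benchmark.bm(17) do |b|
  # Original approach: map placeholder pairs to nil, then compact them away.
  b.report('ternary:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).map do |x, y|
        x == 111.111 || y == 111.111 ? nil : (x - y).abs
      end.compact
    end
  end

  # Filter out placeholder pairs first, then compute the differences.
  b.report('reject:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).reject{ |x,y| x == 111.111 || y == 111.111 }.map{ |x,y| (x - y).abs }
    end
  end

  # Build the result array directly, appending in place with <<.
  b.report('each_with_object:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2)
           .each_with_object([]) { |(x,y), ary|
             ary << (x - y).abs unless (x == 111.111 || y == 111.111)
           }
    end
  end
end

# >>                         user     system      total        real
# >> ternary:            0.240000   0.000000   0.240000 (  0.244476)
# >> reject:             0.060000   0.000000   0.060000 (  0.058842)
# >> each_with_object:   0.350000   0.000000   0.350000 (  0.349363)
# >> Note: the each_with_object timing was recorded with `ary += [...]`, which
# >> allocates a new array on every append; expect `ary <<` to give a different figure.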

Adjust DECIMAL_SHIFT (the number of iterations), the size of DATA_ARRAY, and the placement of 111.111 to see which expression works best for your data size and structure, then fine-tune the code as necessary.
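
Before comparing timings, it's worth confirming that every candidate returns the same result, since a fast variant that builds the wrong array is not a useful data point. A rough sketch of one way to do that across a couple of array sizes; the CANDIDATES hash, the PLACEHOLDER constant, and the chosen sizes are introduced here for illustration only:

```ruby
require 'benchmark'

PLACEHOLDER = 111.111 # hypothetical name for the invalid-data marker

# Each candidate takes two rows and returns the array of absolute differences.
CANDIDATES = {
  'ternary' => lambda { |a, b|
    a.zip(b).map { |x, y| x == PLACEHOLDER || y == PLACEHOLDER ? nil : (x - y).abs }.compact
  },
  'reject' => lambda { |a, b|
    a.zip(b).reject { |x, y| x == PLACEHOLDER || y == PLACEHOLDER }.map { |x, y| (x - y).abs }
  },
  'each_with_object' => lambda { |a, b|
    a.zip(b).each_with_object([]) do |(x, y), ary|
      ary << (x - y).abs unless x == PLACEHOLDER || y == PLACEHOLDER
    end
  }
}.freeze

[100, 4000].each do |size|
  data  = (1..size).to_a
  row_1 = (data + [PLACEHOLDER]).shuffle
  row_2 = (data.map { |i| i * 2 } + [PLACEHOLDER]).shuffle

  # All candidates must agree before their timings are worth comparing.
  results = CANDIDATES.transform_values { |fn| fn.call(row_1, row_2) }
  raise "candidates disagree at size #{size}" unless results.values.uniq.size == 1

  puts "size #{size}:"
  Benchmark.bm(17) do |b|
    CANDIDATES.each do |name, fn|
      b.report("#{name}:") { 100.times { fn.call(row_1, row_2) } }
    end
  end
end
```

The timings will vary by machine and Ruby version, so treat the output as relative rather than absolute.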

You could try the parallel gem (https://github.com/grosser/parallel) and run it on multiple threads.
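
A rough sketch of splitting the rows across threads, using only the standard library's Thread as a stand-in for the gem (the gem's own API is not shown here). The row_pairs data, the diff_row helper, PLACEHOLDER, and THREAD_COUNT are all hypothetical. Keep in mind that CRuby's global VM lock prevents threads from executing Ruby code in parallel, so for CPU-bound work like this a process-based approach may be the one that actually helps; measure before committing to either.

```ruby
PLACEHOLDER = 111.111 # hypothetical name for the invalid-data marker

# Hypothetical helper: absolute differences, skipping placeholder pairs.
def diff_row(row_1, row_2)
  row_1.zip(row_2)
       .reject { |x, y| x == PLACEHOLDER || y == PLACEHOLDER }
       .map { |x, y| (x - y).abs }
end

# Hypothetical input: a list of [row_1, row_2] pairs to process.
row_pairs = [
  [[1, 111.111, 2], [2, 111.111, 4]],
  [[5, 7, 111.111], [1, 111.111, 3]],
  [[10, 20], [30, 5]]
]

THREAD_COUNT = 2

# Split the work into one slice per thread. Each thread returns its results
# via Thread#value, and flat_map keeps them in the original row order.
slice_size = (row_pairs.size / THREAD_COUNT.to_f).ceil
threads = row_pairs.each_slice(slice_size).map do |slice|
  Thread.new { slice.map { |a, b| diff_row(a, b) } }
end
results = threads.flat_map(&:value)
results # => [[1, 2], [4], [20, 15]]
```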
