Performance: iterate over an array except for an element

I am using Ruby on Rails 3.0.7 and I would like to iterate over an array of objects (class instances), skipping the element with id equal to 1 (the id refers to the array[1] index).

I know that I can put an if statement inside the each block and check, for each element under consideration, whether id == 1. However, since the array holds a lot of data, I would like to find a more performant way to accomplish the same thing (one that avoids running the if every time).

How can I do this?
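For reference, the straightforward version described in the question might look like the sketch below. The helper name each_except_index is hypothetical and is not part of Ruby or Rails; it only illustrates the per-element check that the question wants to avoid.

```ruby
# Hypothetical helper: yields every element except the one at skip_index.
def each_except_index(array, skip_index)
  array.each_with_index do |element, i|
    next if i == skip_index   # the per-element check the question wants to avoid
    yield element
  end
end

seen = []
each_except_index(%w[a b c], 1) { |el| seen << el }
p seen   # prints ["a", "c"]
```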

  1. Make the program work
  2. Profile
  3. Optimize

Donald Knuth said:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Now, you could do something like:

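# Handle index 0, then indices 2..n, so index 1 is never visited.
def f(i)
  do_something(i)   # placeholder for the real work on element i
end

f 0
for i in 2..n       # n is assumed to be the last index to process
  f i
end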

Or even:

def f
  yield 0
  for i in 2..@n
    yield i
  end
end

f do |i|
  do_something
end

But you probably do not want to do anything of the sort, and if you did, it would only be after finding out that it matters.

And finally, suppose this ugly trick actually makes your server run ever so slightly faster. Was it worth it?

An if statement is a very cheap operation. You can check that using standard benchmarking tools.

require "benchmark"

array = [1] * 100_000

Benchmark.bm do |bm|
  bm.report "with if" do
    array.each_with_index do |element, i|
      next if i == 1
      element - 1
    end
  end

  bm.report "without if" do
    array.each do |element|
      element - 1
    end
  end
end

Results:

                user     system      total        real
with if     0.020000   0.000000   0.020000 (  0.018115)
without if  0.010000   0.000000   0.010000 (  0.012248)

That is a difference of about 0.006 seconds on a 100,000-element array. You shouldn't care about that unless it becomes a bottleneck, and I doubt it will.

a = ['a', 'b', 'c']
a.each_with_index.reject {|el,i| i == 1}.each do |el,i|
  # do whatever with the element
  puts el
end

IMHO this is a better way of doing the selection than writing your own explicit if statement. I believe, however, that it will perform about the same as using if, or maybe even slightly worse.

If, after benchmarking as others have suggested, you know that this is definitely slower than you can allow, and that the selection is what causes the slowness, then it can easily be modified to remove the selection in a number of ways:

a = ['a', 'b', 'c']
n = 1
(a.first(n) + a.drop(n + 1)).each do |el|
  # do whatever with the element
  puts el
end

Unfortunately, I believe this will also be slower than running the simple if. One that I believe might have the potential for speed is:

a = ['a', 'b', 'c']
n = 1
((0...n).to_a+((n+1)...a.size).to_a).map{|i| a[i]}.each do |el|
  # do whatever with the element
  puts el
end

But this, again, has a high possibility of being slower.
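Another variant in the same spirit, which is not from the original answer and has not been benchmarked here, is to copy the array and delete the unwanted index from the copy with Array#delete_at. The helper name without_index is hypothetical. Note that this duplicates the array, so it trades memory for the absence of a per-element check.

```ruby
# Hypothetical helper: returns a copy of the array without the element at index n.
def without_index(array, n)
  copy = array.dup      # shallow copy, so the original array is left untouched
  copy.delete_at(n)     # removes the element at index n from the copy
  copy
end

a = ['a', 'b', 'c']
without_index(a, 1).each do |el|
  # do whatever with the element
  puts el
end
```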

EDIT

The benchmark is in this gist. The results actually surprised me: reject is by far the slowest option, followed by the ranges. The best performer, after not removing the element at all, was using first and drop to select all the elements around it.

The results as a percentage, using no selection as the baseline:

with if             146%
with first and drop 104%
without if          100%

Obviously this is highly dependent on what you do with the elements; this test used what is probably the fastest operation Ruby can perform. The slower the operation, the less difference these will make. As always: Benchmark, Benchmark, Benchmark.
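The gist itself is not reproduced here. A comparison along the same lines could be sketched as follows; the helper names (with_if, with_reject, with_first_and_drop, with_ranges) and the constants ARRAY and SKIP are made up for this illustration, and wrapping each strategy in a method that yields adds some overhead of its own, so the numbers it prints should not be expected to match the percentages above.

```ruby
require "benchmark"

# Each hypothetical helper yields every element except the one at index n.
def with_if(array, n)
  array.each_with_index { |el, i| next if i == n; yield el }
end

def with_reject(array, n)
  array.each_with_index.reject { |_el, i| i == n }.each { |el, _i| yield el }
end

def with_first_and_drop(array, n)
  (array.first(n) + array.drop(n + 1)).each { |el| yield el }
end

def with_ranges(array, n)
  ((0...n).to_a + ((n + 1)...array.size).to_a).map { |i| array[i] }.each { |el| yield el }
end

ARRAY = [1] * 100_000
SKIP = 1   # index to leave out

# bmbm runs a rehearsal pass first, which reduces the effect of warm-up noise.
Benchmark.bmbm do |bm|
  bm.report("without if")     { ARRAY.each { |el| el - 1 } }
  bm.report("with if")        { with_if(ARRAY, SKIP) { |el| el - 1 } }
  bm.report("reject")         { with_reject(ARRAY, SKIP) { |el| el - 1 } }
  bm.report("first and drop") { with_first_and_drop(ARRAY, SKIP) { |el| el - 1 } }
  bm.report("ranges")         { with_ranges(ARRAY, SKIP) { |el| el - 1 } }
end
```

Run it several times on the Ruby version you deploy with; relative results can differ between interpreters and array sizes.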

Testing an actual for loop might be worth the five minutes it would take. It may be frowned upon in Ruby circles, but that doesn't mean it is never worth using. When you call each or map or whatever, those methods use for loops anyway. Avoid absolutes.
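A minimal sketch of such a test candidate, assuming the index to skip is known up front and lies inside the array: two plain for loops cover the indices before and after it, so no comparison runs per element. The helper name each_with_for is hypothetical, and whether this is actually faster is exactly what the benchmark would need to show.

```ruby
# Hypothetical helper: visits every element except the one at skip_index,
# using plain for loops instead of each plus an if.
def each_with_for(array, skip_index)
  for i in 0...skip_index                    # indices before the skipped one
    yield array[i]
  end
  for i in (skip_index + 1)...array.size     # indices after the skipped one
    yield array[i]
  end
end

each_with_for(['a', 'b', 'c', 'd'], 2) do |el|
  puts el   # prints a, b and d on separate lines
end
```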

It also depends on how big the array can get; at some n, one might become faster than the other. In this case, it is definitely not worth it.

If you don't need a specific element, maybe you don't need to store that row of data in a database. What is the difference between row 1 and the rest of the rows? In other words, why are you skipping it? Is the row with id=1 always going to have the same data in it? If so, storing it as a constant would likely be better, and would make your question moot. Performance almost always costs more memory.

Unless Rails 3 does things differently, and you pull out the data and use id as the finder key, id=1 will be in element 0.
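The mismatch between id and array position can be illustrated with a small sketch. Record is a hypothetical Struct standing in for a model class, and the rows are assumed to come back ordered by id starting at 1; real query results depend on the query and on which rows exist.

```ruby
# Hypothetical stand-in for a model class; only the id matters here.
Record = Struct.new(:id, :name)

# Rows as they might come back when ordered by id, starting at 1.
records = [Record.new(1, "first"), Record.new(2, "second"), Record.new(3, "third")]

# Hypothetical helper: position in the array of the row with the given id.
def index_of_id(records, id)
  records.index { |record| record.id == id }
end

p index_of_id(records, 1)   # prints 0: the row with id 1 sits at index 0
p records[1].id             # prints 2: index 1 holds the row with id 2
```

Skipping by array index 1 would therefore drop the row with id 2 in this setup, so it is worth checking which of the two the code actually means.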

Unfortunately, Knuth's quote gets misinterpreted a lot and gets used to excuse crappy, inefficient code that would not be written if the programmer were sufficiently educated and thought about it for 5 seconds. Sure, spending a week trying to speed up code that you don't know is a problem, or that is only a minor problem, is a waste, and that is more what Knuth was talking about. Performance is one of the most misunderstood and abused concepts in computer science.

Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address.

 