How can I improve the performance of this small Ruby function?

I am currently doing a Ruby challenge and get the error "Terminated due to timeout" for some test cases where the input string is very long (10,000+ characters).

How can I improve my code?

Ruby challenge description

You are given a string containing characters A and B only. Your task is to change it into a string such that there are no matching adjacent characters. To do this, you are allowed to delete zero or more characters in the string.

Your task is to find the minimum number of required deletions.

For example, given the string s = AABAAB, remove an A at positions 0 and 3 to make s = ABAB in 2 deletions.

My function

def alternatingCharacters(s)
    counter = 0
    s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
    return counter
end

Thank you!

This could be faster, and it returns the count directly:

str.size - str.chars.chunk_while{ |a, b| a == b }.to_a.size

The second part uses the String#chars method in conjunction with Enumerable#chunk_while, which groups runs of equal adjacent characters into subarrays:

'aababbabbaab'.chars.chunk_while{ |a, b| a == b}.to_a
#=> [["a", "a"], ["b"], ["a"], ["b", "b"], ["a"], ["b", "b"], ["a", "a"], ["b"]]
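The subtraction works because each run of identical characters has to shrink to a single character, so a run of length n costs n - 1 deletions. A minimal sketch that wraps the expression in a method (the name deletions_by_chunks is made up here for illustration and is not part of the answer):

```ruby
# Hypothetical helper name, used only for this illustration.
def deletions_by_chunks(str)
  # Each chunk is one run of identical adjacent characters.
  runs = str.chars.chunk_while { |a, b| a == b }
  # One character per run survives; all the others are deleted.
  str.size - runs.count
end

puts deletions_by_chunks('AABAAB')       # 2
puts deletions_by_chunks('aababbabbaab') # 4
```

Calling count on the enumerator instead of to_a.size gives the same number without building the outer array of chunks.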

Trivial if you can use squeeze:

str.length - str.squeeze.length
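Called without arguments, String#squeeze collapses every run of the same character into a single character, so the length difference is exactly the number of deletions. A small worked example using the string from the question:

```ruby
s = 'AABAAB'

# squeeze returns a new string with each run reduced to one character.
squeezed  = s.squeeze
deletions = s.length - squeezed.length

puts squeezed   # ABAB
puts deletions  # 2
```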

Otherwise, you could try a regular expression that matches those A (or B) that are preceded by another A (or B):

str.enum_for(:scan, /(?<=A)A|(?<=B)B/).count

Using enum_for avoids the creation of the intermediate array.
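For comparison, here is a short sketch of both ways of counting the matches. The variable names are illustrative only. Because the lookbehind inspects the original string, every character that repeats its predecessor is matched, even inside longer runs:

```ruby
pattern = /(?<=A)A|(?<=B)B/
str = 'AABAAB'

# scan without a block collects every match into an array first.
with_array = str.scan(pattern).size

# enum_for(:scan, ...) yields the matches one by one and only counts them.
with_enum = str.enum_for(:scan, pattern).count

puts with_array # 2
puts with_enum  # 2
```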

The main issue with:主要问题:

 s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }

is that you don't save the result of chars into a variable. s.chars splits the string into a new array of characters. The first s.chars call, outside the block, is fine. However, there is no reason to call it again for each character in s. This means that if you have a string of 10,000 characters, you'll instantiate 10,001 arrays of size 10,000.
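A quick way to convince yourself that every call to chars allocates a fresh array is to compare object identity. This is a small sketch with illustrative variable names:

```ruby
s = 'AABAAB'

first  = s.chars
second = s.chars

puts first == second      # true:  same contents
puts first.equal?(second) # false: two distinct Array objects
```

Inside the block, that allocation happens once per character, which makes the original loop quadratic in the length of the string.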

Reusing the character array will give you a huge performance boost:

require 'benchmark'

s  = ''
options = %w[A B]
10_000.times { s << options.sample }
Benchmark.bm do |x|
  x.report do
    counter = 0
    s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
    #           create a character array for each iteration ^
  end

  x.report do
    counter = 0
    chars = s.chars # <- only create a character array once
    chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
  end
end
       user     system      total        real
   8.279767   0.000001   8.279768 (  8.279655)
   0.002188   0.000003   0.002191 (  0.002191)

You could also make use of enumerator methods like each_cons and count to simplify the code. This doesn't cost much in performance, but makes the code a lot more readable.

Benchmark.bm do |x|
  x.report do
    counter = 0
    chars = s.chars
    chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
  end

  x.report do
    s.each_char.each_cons(2).count { |a, b| a == b }
    #  ^ using each_char instead of chars to avoid
    #    instantiating a character array
  end
end
       user     system      total        real
   0.002923   0.000000   0.002923 (  0.002920)
   0.003995   0.000000   0.003995 (  0.003994)
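Put into the shape the challenge expects, the each_cons variant might look like the sketch below. The method name alternating_characters is just a snake_case rename of the asker's method, and the cross-check against squeeze is only a sanity test, not part of the solution:

```ruby
def alternating_characters(s)
  # Count adjacent pairs whose two characters are equal;
  # each such pair requires exactly one deletion.
  s.each_char.each_cons(2).count { |a, b| a == b }
end

puts alternating_characters('AABAAB') # 2

# Sanity check on a random 10,000-character input.
sample = Array.new(10_000) { %w[A B].sample }.join
puts alternating_characters(sample) == sample.length - sample.squeeze.length # true
```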
