Is there a faster method than count() for counting non-overlapping occurrences in a string?

Question

Given a minimum length N and a string S of 1s and 0s (e.g. "01000100"), I am trying to return the number of non-overlapping occurrences of a substring of length N consisting entirely of '0's. For example, given N=2 and the string "01000100", the number of non-overlapping "00"s is 2.

This is what I have done:

def myfunc(S,N):
  return S.count('0'*N)

My question: is there a faster way of doing this for very long strings? This is from an online coding practice site, and my code passes all but one of the test cases; that one fails because it does not finish within the time limit. From the research I have done, everything I can find says that count() is the fastest method for this.

Answer 1

This might be faster:

>>> s = "01000100"
>>> def my_count( a, n ) :
...     parts = a.split('1')
...     return sum( len(p)//n for p in parts )
... 
>>> my_count(s, 2)
2
>>>

The worst-case scenario for count() is O(N^2), whereas the function above is strictly linear, O(N). The O(N^2) figure comes from this discussion: What's the computational cost of count operation on strings Python?

Also, you can always do this manually, without using split(): just loop over the string, increase a counter on each 0, and on each 1 save counter // n somewhere (for example, add it to a running total) and reset the counter. Remember to account for the final counter // n after the loop ends. This is strictly O(N) as well and avoids building the intermediate list of parts, although an explicit Python-level loop may not actually be faster in practice than the split() version.
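A minimal sketch of that manual loop, in case it helps. The function name count_zero_runs and the variable names are my own, and it assumes the string contains only '0' and '1' characters and that n is positive:

```python
def count_zero_runs(a: str, n: int) -> int:
    """Count non-overlapping blocks of n consecutive '0's in a."""
    total = 0   # blocks counted so far
    run = 0     # length of the current run of zeros
    for ch in a:
        if ch == '0':
            run += 1
        else:
            # A '1' ends the current run: bank the blocks it contained.
            total += run // n
            run = 0
    # The string may end inside a run of zeros, so bank that one too.
    return total + run // n


print(count_zero_runs("01000100", 2))
```

For the example from the question this gives the same result as S.count('0'*N).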

Finally, for relatively large values of n (n > 10?), there might be a sub-linear (or still linear, but with a smaller constant) algorithm, which starts by comparing a[n-1] to 0 and then works backwards towards the beginning. Chances are there is going to be a 1 somewhere, so if a[n-1] is 1 we don't have to analyse the beginning of the string at all, simply because there's no way to fit enough zeros in there. Assuming we have found a 1 at position k, the next window starts at k+1, so the next position to compare would be a[k+n], again working backwards from there.

This way we can effectively skip most of the string during the search.这样我们可以在搜索过程中有效地跳过大部分字符串。
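Here is a rough sketch of how that skipping search could look. It is only an illustration under my own assumptions: the name count_zero_runs_skip is made up, the input is assumed to contain only '0' and '1', n is assumed positive, and each window is taken to start right after the last 1 that was found (so it ends at index k+n). I have not benchmarked it, and in CPython the per-character indexing may well cancel out the benefit of inspecting fewer characters.

```python
def count_zero_runs_skip(a: str, n: int) -> int:
    """Count non-overlapping blocks of n zeros, checking each window right-to-left."""
    if n <= 0:
        raise ValueError("n must be positive")
    count = 0
    start = 0  # left edge of the current candidate window [start, start + n)
    while start + n <= len(a):
        i = start + n - 1
        # Walk backwards from the right edge while we keep seeing zeros.
        while i >= start and a[i] == '0':
            i -= 1
        if i < start:
            # No '1' inside the window: it is a full block of n zeros.
            count += 1
            start += n
        else:
            # a[i] is '1', so no block can include index i; jump past it.
            start = i + 1
    return count


print(count_zero_runs_skip("01000100", 2))
```

When the right edge of a window is a 1, the whole window is discarded after looking at a single character, which is where the skipping comes from.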

Answer 2

lenik posted a very good response that worked well. I also found another method that was faster than count(), so I will post it here as well. It uses the findall() function from the re (regular expression) module:

import re
def my_count(a, n):
  return len(re.findall('0'*n, a))
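If you want to compare the approaches on your own data, something like the following could be a starting point. The helper names (count_builtin, count_split, count_regex, compare) and the sample input are hypothetical, and the timings will depend on your Python version and on the input, so treat the output as a rough indication only.

```python
import re
import timeit


def count_builtin(s, n):
    return s.count('0' * n)


def count_split(s, n):
    return sum(len(p) // n for p in s.split('1'))


def count_regex(s, n):
    return len(re.findall('0' * n, s))


def compare(s, n, repeats=3):
    """Check that the three approaches agree, then time each one (seconds)."""
    funcs = {'count': count_builtin, 'split': count_split, 'findall': count_regex}
    results = {name: f(s, n) for name, f in funcs.items()}
    assert len(set(results.values())) == 1, results
    return {name: timeit.timeit(lambda f=f: f(s, n), number=repeats)
            for name, f in funcs.items()}


if __name__ == '__main__':
    sample = ('0' * 999 + '1') * 200  # hypothetical test input
    for name, seconds in compare(sample, 500).items():
        print(f'{name:8s} {seconds:.4f}s')
```

The agreement check relies on str.count() and re.findall() both returning non-overlapping matches found from left to right.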
