
Is there a faster way to count non-overlapping occurrences in a string than count()?

Given a length N and a string S of 1s and 0s (e.g. "01000100"), I am trying to return the number of non-overlapping occurrences of a substring of length N consisting entirely of '0's. For example, given N=2 and the string "01000100", the number of non-overlapping "00"s is 2.

This is what I have done:

def myfunc(S,N):
  return S.count('0'*N)

My question: is there a faster way to do this for very long strings? This is from an online coding practice site, and my code passes all but one of the test cases; that one fails because it does not finish within the time limit. From my research so far, count() appears to be the fastest method for this.

This might be faster:

>>> s = "01000100"
>>> def my_count( a, n ) :
...     parts = a.split('1')
...     return sum( len(p)//n for p in parts )
... 
>>> my_count(s, 2)
2
>>> 

The worst case for count() can be as bad as O(N^2), whereas the function above is strictly linear, O(N). The O(N^2) figure comes from this discussion: "What's the computational cost of count operation on strings Python?"


Also, you can always do this by hand without split(): loop over the string, increment a counter on each 0, and on each 1 add counter // n to a running total and reset the counter. Remember to add the final counter // n after the loop in case the string ends with zeros. This is strictly O(N) and does not build the intermediate list of parts, so it should hold up well on very long inputs.
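A minimal sketch of that manual loop, for illustration only. The function name count_zero_runs and the variable names are made up here and are not from the answer above:

```python
def count_zero_runs(s, n):
    """Count non-overlapping blocks of n consecutive '0' characters in s."""
    total = 0  # blocks contributed by runs of zeros that have already ended
    run = 0    # length of the run of zeros currently being scanned
    for ch in s:
        if ch == '0':
            run += 1
        else:
            total += run // n  # the run just ended: bank its blocks
            run = 0
    return total + run // n    # include a run that reaches the end of s


print(count_zero_runs("01000100", 2))  # 2
```

Whether this beats split() in practice depends on the interpreter: both are linear, but the explicit loop pays Python-level overhead per character, so it is worth timing on your own data.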


Finally, for relatively large values of n (n > 10?), there might be a sub-linear algorithm (or one that is still linear but with a smaller constant). It starts by comparing a[n-1] to 0 and then works backwards toward the beginning of the string. Chances are there will be a 1 somewhere in that window, and if a[n-1] itself is 1 we do not have to examine the characters before it at all, simply because n zeros cannot fit in there. Assuming we have found a 1 at position k, the next position to compare is a[k+n], the last character of the window that starts right after k, again working backwards from there.

This way we can effectively skip most of the string during the search.这样我们可以在搜索过程中有效地跳过大部分字符串。
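One way the skipping idea could be written down is sketched below. This is an illustrative reading of the description, not code from the answer; the name count_zero_blocks_skipping is hypothetical. Each window of n characters is scanned from its right end, and as soon as a 1 is found at index k the next window begins at k + 1, so the characters to the left of k in that window are never read.

```python
def count_zero_blocks_skipping(a, n):
    """Count non-overlapping blocks of n '0's, scanning each window right-to-left."""
    count = 0
    start = 0  # left edge of the window currently being tested
    while start + n <= len(a):
        # Walk backwards from the last character of the window
        # until a '1' is hit or the whole window has been checked.
        k = start + n - 1
        while k >= start and a[k] == '0':
            k -= 1
        if k < start:
            # Every character in the window was '0': one full block.
            count += 1
            start += n
        else:
            # a[k] is '1', so no block can overlap position k:
            # jump straight past it.
            start = k + 1
    return count


print(count_zero_blocks_skipping("01000100", 2))  # 2
```

When 1s are frequent, most windows are rejected after reading one or two characters, which is where the saving comes from. On a string made up almost entirely of zeros, every character is still read once, so the worst case stays linear.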

lenik posted a very good answer that worked well. I also found another method that is faster than count(), so I will post it here as well. It uses findall() from the re module:

import re
def my_count(a, n):
  return len(re.findall('0'*n, a))
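To check which variant is actually faster on a given machine, a rough timing sketch along these lines may help. The wrapper names (count_builtin, count_split, count_regex) and the sample input are assumptions made for illustration; timings vary with the Python version and the shape of the data, so none are quoted here.

```python
import re
import timeit


def count_builtin(s, n):
    return s.count('0' * n)


def count_split(s, n):
    return sum(len(p) // n for p in s.split('1'))


def count_regex(s, n):
    return len(re.findall('0' * n, s))


# Hypothetical input: runs of seven zeros separated by single ones.
sample = ('0' * 7 + '1') * 50_000
n = 3

for fn in (count_builtin, count_split, count_regex):
    # Time five calls of each variant on the same input.
    elapsed = timeit.timeit(lambda: fn(sample, n), number=5)
    print(f"{fn.__name__}: result={fn(sample, n)}, {elapsed:.3f}s for 5 runs")
```

All three functions should report the same count for the same input; only the elapsed times should differ.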
