简体   繁体   中英

Is there a faster way to count non-overlapping occurrences in a string than count()?

Given a minimum length N and a string S of 1's and 0's (eg "01000100"), I am trying to return the number of non-overlapping occurrences of a sub-string of length n containing all '0's. For example, given n=2 and the string "01000100", the number of non-overlapping "00"s is 2.

This is what I have done:

def myfunc(S,N):
  return S.count('0'*N)

My question: is there a faster way of performing this for very long strings? This is from an online coding practice site and my code passes all but one of the test cases, which fails due to not being able to finish within a time limit. Doing some research it seems I can only find that count() is the fastest method for this.

This might be faster:

>>> s = "01000100"
>>> def my_count( a, n ) :
...     parts = a.split('1')
...     return sum( len(p)//n for p in parts )
... 
>>> my_count(s, 2)
2
>>> 

Worst case scenario for count() is O(N^2), the function above is strictly linear O(N). Here's the discussion where O(N^2) number came from: What's the computational cost of count operation on strings Python?


Also, you may always do this manually, without using split() , just loop over the string, reset counter (once saved counter // n somewhere) on 1 and increase counter on 0 . This would definitely beat any other approach because strictly O(N).


Finally, for relatively large values of n (n > 10?), there might be a sub-linear (or still linear, but with a smaller constant) algorithm, which starts with comparing a[n-1] to 0 , and going back to beginning. Chances are, there going to be a 1 somewhere, so we don't have to analyse the beginning of the string if a[n-1] is 1 -- simply because there's no way to fit enough zeros in there. Assuming we have found 1 at position k , the next position to compare would be a[k+n-1] , again going back to the beginning of the string.

This way we can effectively skip most of the string during the search.

lenik posted a very good response that worked well. I also found another method faster than count() that I will post here as well. It uses the findall() method from the regex library:

import re
def my_count(a, n):
  return len(re.findall('0'*n, a))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM