
Exact number of character comparisons in naive exact algorithm

Given a substring and a string, is it possible to calculate the exact number of character comparisons the naive exact algorithm makes when searching for the substring in the string? Assume exact matching, not approximate matching.

According to many sources (e.g., http://www.di.unipi.it/~pisanti/DIDATTICA/patternmatching1.pdf), the worst-case number of comparisons is O(nm) in Big-Oh notation. More precisely, the worst case is n(m-n+1) comparisons, where n is the length of the substring and m is the length of the string it is matched against.
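To see where n(m-n+1) comes from, here is a minimal Python sketch of the naive algorithm with a comparison counter. It assumes the search stops at the first occurrence; the name `naive_search` and its 1-based return position are illustrative choices, not taken from the linked slides. With a string of m copies of "a" and a substring of n-1 "a"s followed by "b", every alignment runs the full inner loop, so each of the m-n+1 alignments costs n comparisons:

```python
def naive_search(pattern: str, text: str) -> tuple[int, int]:
    """Naive exact matching with a comparison counter (illustrative sketch).

    Returns (position, comparisons), where position is the 1-based start of
    the first occurrence, or -1 if the pattern does not occur.
    """
    n, m = len(pattern), len(text)
    comparisons = 0
    for i in range(m - n + 1):        # outer loop: each alignment in the text
        j = 0
        while j < n:                  # inner loop: characters of the pattern
            comparisons += 1          # count every character comparison
            if text[i + j] != pattern[j]:
                break                 # mismatch: move to the next alignment
            j += 1
        if j == n:                    # all n characters matched
            return i + 1, comparisons
    return -1, comparisons


n, m = 3, 10
print(naive_search("a" * (n - 1) + "b", "a" * m))  # (-1, 24)
print(n * (m - n + 1))                             # 24
```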

However, the following source states that the naive exact algorithm makes roughly m comparisons (http://www.cs.cornell.edu/courses/cs312/2002sp/lectures/lec25.htm). That page writes n for the length of the string; I use m here to stay consistent with the first link.
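The "roughly m" figure corresponds to inputs where most alignments fail on their first character. A small Python sketch with a made-up string in which the substring's first letter never occurs (an assumed input, not from the Cornell notes) makes exactly one comparison per alignment, i.e. m-n+1 comparisons, which is close to m when n is small:

```python
def count_comparisons(pattern: str, text: str) -> int:
    """Character comparisons made by a naive search that stops at the first match."""
    n, m = len(pattern), len(text)
    comparisons = 0
    for i in range(m - n + 1):
        j = 0
        while j < n:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == n:
            break                     # first occurrence found: stop searching
    return comparisons


text = "abcdefghij" * 10              # m = 100, contains no "x"
pattern = "xyz"                       # n = 3
print(count_comparisons(pattern, text), len(text) - len(pattern) + 1)  # 98 98
```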

In any case, this made me wonder whether the exact number of character comparisons made by the naive exact algorithm can be calculated. If we know the worst case and can estimate the typical count, surely there is a way to compute the exact count.

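Assuming that the search uses an outer loop over the positions of the string and an inner loop over the characters of the substring, and stops at the first match (with N the length of the substring and M the length of the string), you will perform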

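  • if the search succeeds at the I-th position (1 ≤ I ≤ M-N+1), N comparisons for the matching alignment plus, for each of the I-1 earlier alignments, one more comparison than the number of characters that matched there, i.e. at most N·I comparisons;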

  • if the search fails, exactly ΣJ_k comparisons, summed over all M-N+1 alignments k, where J_k is the number of leading substring characters that match at alignment k, plus one (1 ≤ J_k ≤ N).
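Both cases can be checked with an instrumented search that records the cost of each alignment. This is a Python sketch under the answer's assumptions (outer loop over positions, stop at the first match); `alignment_costs` is a hypothetical helper name. A failed alignment is recorded as J_k and a successful one as N:

```python
def alignment_costs(pattern: str, text: str) -> tuple[list[int], bool]:
    """Comparisons made at each alignment, stopping at the first match.

    Returns (costs, found). A failed alignment k costs J_k = (matching
    characters) + 1, so 1 <= J_k <= N; a successful alignment costs N.
    """
    n, m = len(pattern), len(text)
    costs = []
    for i in range(m - n + 1):
        j = 0
        # `j < n` is checked first, so no comparison is made after a full match
        while j < n and text[i + j] == pattern[j]:
            j += 1
        if j == n:
            costs.append(n)           # success: N comparisons, search stops
            return costs, True
        costs.append(j + 1)           # failure: j matches plus one mismatch
    return costs, False


# Success at position I = 4: costs 3, 1, 1 for the failed alignments, plus N = 3.
print(alignment_costs("abd", "abcabd"))   # ([3, 1, 1, 3], True)
# Failure: J_k for each of the M - N + 1 = 4 alignments.
print(alignment_costs("abd", "abcabc"))   # ([3, 1, 1, 3], False)
```

In the first call the total is 3 + 1 + 1 + 3 = 8 comparisons, while N·I = 12, because two of the earlier alignments failed on their first character.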

As stated above, the worst case is N(M-N+1), when every alignment runs the full inner loop. The best case is min(N, M-N+1): N when the substring is found at the first position, and M-N+1 when every alignment fails on its first character.
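Both bounds can be confirmed by brute force for small sizes. The Python sketch below is an illustrative check over the two-letter alphabet {a, b}, not a proof: it enumerates every substring of length N and every string of length M up to M = 6, and compares the smallest and largest comparison counts with min(N, M-N+1) and N(M-N+1):

```python
from itertools import product


def count_comparisons(pattern: str, text: str) -> int:
    """Comparisons made by a naive search that stops at the first match."""
    n, m = len(pattern), len(text)
    total = 0
    for i in range(m - n + 1):
        j = 0
        while j < n:
            total += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == n:
            break
    return total


def extremes(n: int, m: int, alphabet: str = "ab") -> tuple[int, int]:
    """Smallest and largest count over all patterns of length n and texts of length m."""
    counts = [
        count_comparisons("".join(p), "".join(t))
        for p in product(alphabet, repeat=n)
        for t in product(alphabet, repeat=m)
    ]
    return min(counts), max(counts)


for m in range(1, 7):
    for n in range(1, m + 1):
        best, worst = extremes(n, m)
        assert best == min(n, m - n + 1), (n, m, best)
        assert worst == n * (m - n + 1), (n, m, worst)
print("bounds hold for all 1 <= N <= M <= 6")
```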

Assuming that a search fails with probability q and succeeds with probability p = 1-q, that all positions and all matching-prefix lengths are equiprobable (if this is possible), and that a success at position I is charged N·I comparisons, the expected number is

p·N(M-N+2)/2 + q·(N+1)(M-N+1)/2 = N(M-N+2)/2 + q(M-2N+1)/2, using p = 1-q.
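The simplification can be checked with exact arithmetic. This Python sketch (an illustrative check; `model_expectation` and `closed_form` are hypothetical helper names) evaluates the model directly, charging N·I comparisons to a success at a uniformly chosen position I and summing independent, uniformly distributed J_k in 1..N over the M-N+1 alignments for a failure. It then compares the result with N(M-N+2)/2 + q(M-2N+1)/2, obtained by substituting p = 1-q:

```python
from fractions import Fraction
from itertools import product


def model_expectation(n: int, m: int, q: Fraction) -> Fraction:
    """Expected comparisons under the equiprobable model, by enumeration."""
    positions = m - n + 1
    p = 1 - q
    # Success at position I (uniform on 1..M-N+1), charged N*I comparisons.
    success = Fraction(sum(n * i for i in range(1, positions + 1)), positions)
    # Failure: every combination of J_k values in 1..N is equally likely.
    combos = list(product(range(1, n + 1), repeat=positions))
    failure = Fraction(sum(sum(js) for js in combos), len(combos))
    return p * success + q * failure


def closed_form(n: int, m: int, q: Fraction) -> Fraction:
    """N(M-N+2)/2 + q(M-2N+1)/2."""
    return Fraction(n * (m - n + 2), 2) + q * Fraction(m - 2 * n + 1, 2)


for q in (Fraction(0), Fraction(1, 3), Fraction(1)):
    for m in range(1, 7):
        for n in range(1, m + 1):
            assert model_expectation(n, m, q) == closed_form(n, m, q)
print(closed_form(2, 5, Fraction(1)))  # 6
```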
