
Why are str.count('') and len(str) giving different output?

Look at the following code and please explain why the str.count('') method and the len(str) function give two different outputs.

a=''
print(len(a))
print(a.count(''))

Output:

0
1

str.count() counts non-overlapping occurrences of the substring:

Return the number of non-overlapping occurrences of substring sub.

There is exactly one place where the substring '' occurs in the string '': right at the start. So the count should return 1.

Generally speaking, the empty string will match at all positions in a given string, including right at the start and end, so the count should always be the length plus 1:

>>> (' ' * 100).count('')
101

That's because empty strings are considered to exist between all the characters of a string. For a string of length 2, there are 3 empty strings: one at the start, one between the two characters, and one at the end.
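As a quick check of the length-plus-one rule, the sketch below compares len() and count('') for a few sample strings. The sample values are arbitrary and chosen only for illustration.

```python
# Compare len(s) with s.count('') for a handful of arbitrary sample strings.
samples = ["", "a", "ab", "test", " " * 100]

# Each entry is (length, number of empty-string matches).
pairs = [(len(s), s.count("")) for s in samples]

for length, empties in pairs:
    print(length, empties)  # the second number is always the first plus one
```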

So yes, the results are different and they are entirely correct.

.count('') counts the number of locations of zero-length strings. You could also think of this as the number of possible cursor positions.

"test".count('')

 t e s t
^ ^ ^ ^ ^

Instead of counting the number of characters (like len(str)), you're counting the number of anti-characters.
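One way to see these cursor positions is to ask the re module where an empty pattern matches. This is only an illustration of the idea; str.count does not use the re module.

```python
import re

text = "test"

# An empty pattern matches at every cursor position, including both ends.
positions = [m.start() for m in re.finditer("", text)]

print(positions)             # [0, 1, 2, 3, 4]
print(len(positions))        # 5
print(text.count(""))        # 5
```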

Documentation:

Return the number of non-overlapping occurrences of subsequence sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
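The start and end arguments apply to the empty substring in the same way. A small sketch, assuming the usual slice semantics described in the quote: counting '' between indices 1 and 3 of "test" is the same as counting it in the slice "es".

```python
s = "test"

whole = s.count("")         # every cursor position in "test"
middle = s.count("", 1, 3)  # cursor positions within s[1:3] == "es"
tail = s.count("", 4)       # only the position after the last character

print(whole, middle, tail)  # 5 3 1
```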

If we look at the implementation, we find that it calls the function stringlib_count, which simply returns the length of the string plus one when the length of sub is zero:

if (sub_len == 0)
    return (str_len < maxcount) ? str_len + 1 : maxcount;


Note: maxcount is set to the largest positive value of Py_ssize_t (PY_SSIZE_T_MAX).
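The same shortcut can be sketched in Python. The function name count_empty_shortcut is hypothetical, and sys.maxsize is used here as a stand-in for maxcount; this mirrors the C snippet above rather than reproducing CPython's code.

```python
import sys


def count_empty_shortcut(str_len, maxcount=sys.maxsize):
    """Hypothetical mirror of the sub_len == 0 branch shown above."""
    # Length plus one, capped at maxcount.
    return str_len + 1 if str_len < maxcount else maxcount


print(count_empty_shortcut(len("")))      # 1
print(count_empty_shortcut(len("test")))  # 5
print(count_empty_shortcut(10, maxcount=3))  # 3, the cap applies
```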


Of course, that is just a short circuit. If we skip that check, the code goes on to call FASTSEARCH.

How is FASTSEARCH implemented? It runs a loop, checking at every position whether the string matches sub at that position.

Since it is looking for an empty string, it will report a match at every position (at every position, it finds no characters that differ, up to the length of sub).

Remember that it is looking in the inclusive range from start to end. Meaning that it will look in every position in the string, that is:

  • The start (before the first character)
  • Between each character pair (after each character, before the next one)
  • The end (after the last character)

That is one position per character (before each character) plus one (the end). Or, if you prefer, it is one position per character (after each character) plus one (the start). In either case, it will return the length of the string plus one. The developers short-circuited this case to avoid running the loop.
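To make the loop argument concrete, here is a naive position-by-position counter written in Python. It is a simplified sketch, not the actual FASTSEARCH algorithm, and the name naive_count is hypothetical. With an empty sub, every position from 0 to len(s) inclusive matches.

```python
def naive_count(s, sub):
    """Count non-overlapping occurrences of sub in s by checking each position."""
    count = 0
    i = 0
    # The last candidate position is len(s) - len(sub); for an empty sub
    # that is len(s) itself, i.e. the position after the last character.
    while i <= len(s) - len(sub):
        if s[i:i + len(sub)] == sub:
            count += 1
            # Skip past the match; move at least one step so an empty sub
            # cannot match the same position forever.
            i += max(len(sub), 1)
        else:
            i += 1
    return count


print(naive_count("test", ""))    # 5
print(naive_count("", ""))        # 1
print(naive_count("aaaa", "aa"))  # 2
```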
