Why are str.count('') and len(str) giving different output?

Look at the following code and please explain why the str.count('') method and the len(str) function give two different outputs.

a=''
print(len(a))
print(a.count(''))

Output:

0
1

str.count() counts non-overlapping occurrences of the substring:

Return the number of non-overlapping occurrences of substring sub.

There is exactly one place where the substring '' occurs in the string '': right at the start. So the count should return 1.

Generally speaking, the empty string matches at every position in a given string, including right at the start and right at the end, so the count is always the length plus 1:

>>> (' ' * 100).count('')
101

That's because empty strings are considered to exist between all the characters of a string; for a string of length 2, there are 3 empty strings: one at the start, one between the two characters, and one at the end.

So yes, the results are different, and both are entirely correct.

.count('') counts the number of locations of zero-length strings. You could also think of this as the number of possible cursor positions.

"test".count('')

 t e s t
^ ^ ^ ^ ^

Instead of counting the number of characters (like len(str)), you're counting the number of anti-characters.
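The cursor-position picture can be checked directly. The sketch below is only an illustration: it uses re.finditer (not something str.count relies on) to list every index where the empty pattern matches, which should be one index per cursor position.

```python
import re

text = "test"

# Each empty match corresponds to one cursor position in the diagram above.
cursor_positions = [m.start() for m in re.finditer('', text)]

print(cursor_positions)        # [0, 1, 2, 3, 4]
print(len(cursor_positions))   # 5
print(text.count(''))          # 5
print(len(text))               # 4
```

The list has one more entry than the string has characters, which is the same off-by-one seen between count('') and len().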

Documentation:

Return the number of non-overlapping occurrences of subsequence sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
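The same length-plus-one rule seems to hold when start and end are given, since they select a slice and the empty string is then counted inside that slice. A small sketch, with the variable names chosen here purely for illustration:

```python
s = "test"

# s[1:3] is "es": two characters, so three positions (indices 1, 2 and 3).
inside_slice = s.count('', 1, 3)
same_via_slicing = s[1:3].count('')

# From index 2 to the end: "st", again two characters and three positions.
from_two = s.count('', 2)

print(inside_slice, same_via_slicing, from_two)  # 3 3 3
```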

If we have a look at the implementation, we find that it calls the function stringlib_count, which simply returns the length of the string plus one when the length of the sub is zero:

if (sub_len == 0)
    return (str_len < maxcount) ? str_len + 1 : maxcount;


Note: maxcount is set to the largest positive value of Py_ssize_t (PY_SSIZE_T_MAX).
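For readers who do not read C, the check above can be mirrored in Python. This is a rough model, not CPython code: count_empty is a hypothetical helper name, and sys.maxsize is assumed to be a reasonable stand-in for the default maxcount.

```python
import sys


def count_empty(str_len, maxcount=sys.maxsize):
    """Model of the C shortcut for an empty sub (hypothetical helper)."""
    # Same expression as the C code: (str_len < maxcount) ? str_len + 1 : maxcount
    return str_len + 1 if str_len < maxcount else maxcount


for sample in ["", "ab", "test"]:
    # The model agrees with the real method for an empty substring.
    print(repr(sample), count_empty(len(sample)), sample.count(''))
```

With a small maxcount the result is capped, for example count_empty(10, maxcount=3) gives 3.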


Of course, that is just a short circuit. If we skip that check, the code goes on to call FASTSEARCH.

How is FASTSEARCH implemented? It runs a loop, checking at every position whether the string matches the sub at that position.

Since it is looking for an empty string, it will report a match at every position (at every position, it finds no characters that differ, up to the length of the sub).

Remember that it looks in the inclusive range from start to end, meaning that it checks every position in the string, that is:

  • The start (before the first character)
  • Between each character pair (after each character, before the next one)
  • The end (after the last character)

That is one position per character (before each character) plus one (the end). Or, if you prefer, it is one position per character (after each character) plus one (the start). In either case, it returns the length of the string plus one. The developers short-circuited it to avoid running the loop.
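The position-by-position reasoning can be sketched as a naive loop in Python. This is a conceptual model only, not the algorithm CPython actually uses; naive_count is a hypothetical name, and the rule of advancing at least one step after a match is an assumption added here so that an empty sub terminates.

```python
def naive_count(text, sub):
    """Count non-overlapping matches by testing every position (hypothetical helper)."""
    count = 0
    pos = 0
    # Inclusive upper bound: the position after the last character is tested too.
    while pos <= len(text) - len(sub):
        if text[pos:pos + len(sub)] == sub:
            count += 1
            # Skip past the match, moving at least one step for an empty sub.
            pos += max(len(sub), 1)
        else:
            pos += 1
    return count


print(naive_count("test", ""))     # 5
print(naive_count("", ""))         # 1
print(naive_count("banana", "an")) # 2
```

For an empty sub the comparison succeeds at every index from 0 to len(text), so the loop counts len(text) + 1 matches, which is the value the short circuit returns without looping.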

Technical posts on this site are licensed under CC BY-SA 4.0.

 