Why is this C++ string length function faster than the other one?

Question

Our lecturer explained that this function, which calculates the length of a character string...

int strlen_1(const char *str) {
    const char *temp = str;
    while(*temp != '\0') {
        temp++;
    }
    return temp - str;
}

...will calculate it faster than this one...

int strlen_03(const char *str) {
    int i;
    for (i = 0; *(str+i) != '\0'; i++);
    return i;
}

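Before timing anything, it can help to confirm that both versions compute the same thing. A minimal sketch, assuming both functions live in one translation unit; the helper `same_length` is a hypothetical name, not part of the original code:

```cpp
#include <cassert>
#include <cstring>

// Pointer version: walk temp forward, then subtract the start address.
int strlen_1(const char *str) {
    const char *temp = str;
    while (*temp != '\0') {
        temp++;
    }
    return static_cast<int>(temp - str);  // ptrdiff_t narrowed to int, as in the original
}

// Index version: recompute str + i on every test.
int strlen_03(const char *str) {
    int i;
    for (i = 0; *(str + i) != '\0'; i++);
    return i;
}

// Hypothetical helper: true when both versions agree with std::strlen for s.
bool same_length(const char *s) {
    assert(s != nullptr);  // neither function handles a null pointer
    const int expected = static_cast<int>(std::strlen(s));
    return strlen_1(s) == expected && strlen_03(s) == expected;
}
```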
I think he said it was related to arithmetic calculations, something like the first one doing less arithmetic, but I cannot understand that; to me they look equivalent. Could you please explain the reason in other words?

PS. I do understand pointers and I can follow what is going on: it is like stepping through the elements of the array stored in the "RAM cells" one unit at a time.

Thanks in advance.

Answer 1

Ignoring optimization for a moment and looking only at the algorithms on paper:

The former performs this calculation repeatedly:

addr++

with the result calculated by a single subtraction:

addr1 - addr0

The latter performs these calculations repeatedly:

addr0 + i
i++

with the result simply returned as a value (i).

In other words, twice as much arithmetic is done inside the loop in exchange for saving the single subtraction at the end.
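The paper comparison above can be made countable. This is a hypothetical instrumented sketch (the `OpCount` struct and the `_counted` function names are made up for illustration) that tallies only the source-level address and counter arithmetic, not what the CPU actually executes:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tally of the address/counter arithmetic each loop performs.
struct OpCount {
    std::size_t pointer_increments = 0;  // temp++ in the pointer version
    std::size_t address_additions = 0;   // str + i in the index version
    std::size_t index_increments = 0;    // i++ in the index version
};

int strlen_1_counted(const char *str, OpCount &ops) {
    assert(str != nullptr);
    const char *temp = str;
    while (*temp != '\0') {
        temp++;                  // one arithmetic step per character
        ops.pointer_increments++;
    }
    return static_cast<int>(temp - str);  // one subtraction after the loop
}

int strlen_03_counted(const char *str, OpCount &ops) {
    assert(str != nullptr);
    int i = 0;
    for (;;) {
        const char *addr = str + i;  // address recomputed on every test
        ops.address_additions++;
        if (*addr == '\0') break;
        i++;                         // second arithmetic step per character
        ops.index_increments++;
    }
    return i;  // the counter is already the answer
}
```

For a string of length N, the pointer version does N pointer increments, while the index version does N + 1 address additions (one extra for the final test against '\0') plus N index increments.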

Getting to the optimized assembly, the first function generates this loop at -O3 with my Clang:

0x100000ee4:  cmpb   $0, 1(%rbx)
0x100000ee8:  leaq   1(%rbx), %rbx
0x100000eec:  sete   %al
0x100000eef:  testb  $1, %al
0x100000ef1:  je     0x100000ee4

The second generates this:

0x100000f09:  incl   %ebx
0x100000f0b:  cmpb   $0, (%rax)
0x100000f0e:  leaq   1(%rax), %rax
0x100000f12:  sete   %cl
0x100000f15:  testb  $1, %cl
0x100000f18:  je     0x100000f09

I left out the one-time instructions that set up the return values because they are not central to the cost of the loop. The optimizer is pretty good; the only major difference is that single instruction:

0x100000f09:  incl   %ebx

which increments your i.
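Reading the second listing back into C++ may make that difference concrete. The following is an approximate, hypothetical rendering (the function name is made up, and the exact instruction order and loop entry in the listing differ): the compiler appears to have turned `str + i` into a pointer it advances every iteration, which is what the first version already does, but it must still maintain `i` because that is the value returned.

```cpp
#include <cassert>

// Approximate C++ reading of the second -O3 listing (an assumption, not
// instruction-for-instruction): str + i became a running pointer, but i
// survives because it is the return value.
int strlen_03_as_compiled(const char *str) {
    assert(str != nullptr);
    const char *p = str;  // plays the role of %rax
    int i = 0;            // plays the role of %ebx
    while (*p != '\0') {
        ++p;  // leaq 1(%rax), %rax
        ++i;  // incl %ebx: the one extra instruction per iteration
    }
    return i;
}
```

To reproduce listings like the ones above, compiling with `clang++ -O3 -S` emits assembly instead of an object file; the exact output will vary with the Clang version and target.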

Answer 2

This is a micro-optimization, and a modern compiler would likely end up generating the same assembly for both, but for a non-optimized build, here is why:

int strlen_1(const char *str) 
{
    const char *temp = str; // declare the iterator
    while(*temp != '\0')   // dereference the pointer
                           // test it against '\0'
    {
        temp++; // increment the iterator
    }
    return temp - str; // pointer subtraction
}

For a string of length N, this gives you 3N + 2 operations.

int strlen_03(const char *str) 
{
    int i; // declare your iterator
    for (i = 0; *(str+i) != '\0'; i++); // initialize the iterator
                                        // add i to str
                                        // dereference that pointer value
                                        // test it against \0
                                        // increment i
    return i;
}

For the same string, this gives you 4N + 2 operations.
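Under this answer's counting model, the gap grows linearly with the string length: one extra operation (the `str + i` addition) per character. For example, with N = 1000:

$$
(4N + 2) - (3N + 2) = N, \qquad N = 1000:\ 4002 - 3002 = 1000
$$

As N grows, the ratio (4N + 2)/(3N + 2) approaches 4/3, so in this model the index version does roughly a third more operations on long strings.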

Again, a modern compiler will likely fix this for you, and even in unoptimized form this small loop is unlikely to make much difference for most strings; it only matters for very long ones.
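If you want to check the claim on your own machine, a crude timing harness like the following is one way to start. It is a sketch under assumptions: the helper name `time_per_call_ns` is hypothetical, `std::chrono::steady_clock` measures wall-clock time only, and at high optimization levels the compiler may still transform or partly eliminate the loops, so compare builds at `-O0` and `-O2`/`-O3` and treat small differences as noise.

```cpp
#include <cassert>
#include <chrono>
#include <string>

int strlen_1(const char *str) {
    const char *temp = str;
    while (*temp != '\0') {
        temp++;
    }
    return static_cast<int>(temp - str);
}

int strlen_03(const char *str) {
    int i;
    for (i = 0; *(str + i) != '\0'; i++);
    return i;
}

// Hypothetical helper: average nanoseconds per call of f(s) over `reps` calls.
// Results are added to `checksum` so the calls are not trivially discarded.
template <typename F>
double time_per_call_ns(F f, const char *s, int reps, long long &checksum) {
    assert(s != nullptr && reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        checksum += f(s);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / reps;
}
```

A typical use is to build one long string, for example `std::string s(1 << 20, 'x');`, then call `time_per_call_ns(strlen_1, s.c_str(), 100, sum)` and the same for `strlen_03`, and compare the two averages.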

2 answers

Answer 1: score 6, accepted, 2014-02-09 03:42:38

Answer 2: score 5, 2014-02-09 03:36:10
