Inconsistent strcmp() return value when passing strings as pointers or as literals

I was playing around with strcmp when I noticed this. Here is the code:

#include <string.h>
#include <stdio.h>

int main(){

    //passing strings directly
    printf("%d\n", strcmp("ahmad", "fatema"));

    //passing strings as pointers 
    char *a= "ahmad";
    char *b= "fatema";
    printf("%d\n",strcmp(a,b));

    return 0;

}

The output is:

-1
-5

Shouldn't strcmp work the same way? Why do I get a different value when I pass the strings as "ahmad" rather than as char *a = "ahmad"? When you pass values to a function, aren't they allocated on its stack?

You are most likely seeing the result of a compiler optimization. If we test the code using gcc on godbolt with the -O0 optimization level, we can see that for the first case it does not call strcmp:

movl    $-1, %esi   #,
movl    $.LC0, %edi #,
movl    $0, %eax    #,
call    printf  #

Since you are using constants as arguments to strcmp, the compiler is able to perform constant folding: it evaluates a compiler intrinsic at compile time and generates the -1 then, instead of calling strcmp at run time. The run-time strcmp is implemented in the standard library and will have a different implementation than the likely simpler compile-time strcmp.

In the second case it does generate a call to strcmp:

call    strcmp  #
movl    %eax, %esi  # D.2047,
movl    $.LC0, %edi #,
movl    $0, %eax    #,
call    printf  #

This is consistent with the fact that gcc has a builtin for strcmp, which is what gcc uses during constant folding.

If we test further at optimization level -O1 or higher, gcc is able to fold both cases, and the result is -1 for both:

movl    $-1, %esi   #,
movl    $.LC0, %edi #,
xorl    %eax, %eax  #
call    printf  #
movl    $-1, %esi   #,
movl    $.LC0, %edi #,
xorl    %eax, %eax  #
call    printf  #

With more optimization options turned on, the optimizer can determine that a and b also point to constants known at compile time, so it can compute the result of strcmp for this case during compilation as well.

We can confirm that gcc is using a builtin function by building with the -fno-builtin flag and observing that a call to strcmp is then generated in all cases.

clang is slightly different: it does not fold at all at -O0, but it folds both cases at -O1 and above.

Note that any negative result is entirely conformant, as we can see by going to the draft C99 standard, section 7.21.4.2 The strcmp function, which says:

 int strcmp(const char *s1, const char *s2); 

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

technosurus points out that strcmp is specified to treat the strings as if they were composed of unsigned char; this is covered in C99 section 7.21.1, which says:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).

I think you believe that the value returned by strcmp should somehow depend on the input strings in a way that is not defined by the function specification. This isn't correct. See, for instance, the POSIX definition:

http://pubs.opengroup.org/onlinepubs/009695399/functions/strcmp.html

Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.

This is exactly what you are seeing. The implementation does not need to make any guarantee about the exact return value, only that it is less than zero, equal to zero, or greater than zero as appropriate.
