
Confusion About Array and String in C

What is the difference between S1, S2 and S3?

char S1[6];
S1[0] = 'A';
S1[1] = 'r';
S1[2] = 'r';
S1[3] = 'a';
S1[4] = 'y';

char S2[6] = {'A','r','r','a','y'};

string S3 = "Array";

When I run the program using if (strcmp(a,b) == 0), where a and b are taken from S1, S2 and S3, it shows that S2 and S3 are the same, and that S1 and S2 are different. Why is this the case? Why aren't all three equivalent?

And when I add back '\0' to both S1b, S1c, all 3 are the same. This is understandable.

BUT why, in my first trial, are S2 and S3 the same? I did not include '\0' there either. I expected S1 and S2 to be the same, but not S2 and S3.

Can anyone tell me why my thinking is wrong?

Thanks for your answers. I have tried changing the declarations to the following:

char S1[5];
S1[0] = 'A';
S1[1] = 'r';
S1[2] = 'r';
S1[3] = 'a';
S1[4] = 'y';

char S2[5] = {'A','r','r','a','y'};

string S3 = "Array";

Now S2 and S3 are clearly not the same, since they differ by a '\0'. However, I am still a bit confused: why are S1 and S2 again not the same when I use strcmp to compare the two?

Compare the actual in-memory values of the arrays:

  1. S1 has 6 elements, yet you only assign values to indices 0-4. The 6th element (index 5) is never set, so its value is indeterminate: it holds whatever happened to be in that memory location.
  2. S2 is similar to S1 in that only 5 values are provided, but when the {,} initialiser syntax is used, any remaining elements are zeroed. So char foo[5] = { 1, 2 } is identical to char foo[5] = { 1, 2, 0, 0, 0 }.
  3. S3 uses string-literal syntax, which creates an array of char (or wchar_t) with an extra element set to \0 (the null terminator).
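The zero-fill rule in point 2 can be checked directly. The following is a minimal sketch; the helper names tail_is_zeroed and s2_equals_literal are made up for illustration and are not part of the original question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every element after the explicit
   initialisers of a partially initialised array is zero. */
int tail_is_zeroed(void)
{
    char foo[5] = { 1, 2 };               /* elements 2..4 are implicitly 0 */
    for (size_t i = 2; i < sizeof foo; i++) {
        if (foo[i] != 0)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: compares the question's S2 with the literal "Array". */
int s2_equals_literal(void)
{
    char S2[6] = {'A','r','r','a','y'};   /* S2[5] is implicitly '\0' */
    assert(sizeof S2 == 6);               /* 5 letters plus the terminator */
    return strcmp(S2, "Array") == 0;
}
```

Both helpers should return 1. The second one is why S2 behaves as a proper string even though no '\0' was written explicitly.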

Visually:

S1 = 0x41, 0x72, 0x72, 0x61, 0x79, 0x??
S2 = 0x41, 0x72, 0x72, 0x61, 0x79, 0x00
S3 = 0x41, 0x72, 0x72, 0x61, 0x79, 0x00

Note that you're running into a safety problem with strcmp: it has no length parameter, so it keeps reading until it encounters \0, which might never happen (i.e. until it causes a segfault or access violation). Instead, use a safer function like strncmp or (if using C++) the std::string type.
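As a rough sketch of the bounded alternatives, the two helpers below compare only bytes that were actually written. The function names are hypothetical. The second helper uses memcmp, which the answer does not mention; it is swapped in here because the 5-element arrays from the follow-up are not strings at all.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: the 6-element case from the question.
   Only the 5 characters that were actually written are compared. */
int first_five_match(void)
{
    char S1[6];
    S1[0] = 'A'; S1[1] = 'r'; S1[2] = 'r'; S1[3] = 'a'; S1[4] = 'y';
    /* S1[5] is deliberately left unset and is never read below. */
    char S2[6] = {'A','r','r','a','y'};
    return strncmp(S1, S2, 5) == 0;
}

/* Hypothetical helper: the 5-element case, where neither array
   has room for a terminator, so they are compared as raw bytes. */
int raw_bytes_match(void)
{
    char S1[5];
    S1[0] = 'A'; S1[1] = 'r'; S1[2] = 'r'; S1[3] = 'a'; S1[4] = 'y';
    char S2[5] = {'A','r','r','a','y'};
    assert(sizeof S1 == sizeof S2);
    return memcmp(S1, S2, sizeof S1) == 0;
}
```

Both helpers should return 1. Keep in mind that strncmp with a length of 5 only says the first five characters match; it does not say the two strings have the same length.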

It shows that S2 and S3 are the same, and S1 and S2 are different.

S3 contains the nul terminator, which S1 does not have. This string S3 = "Array"; means

| A | r | r | a | y | \0 |

While S2 is

| A | r | r | a | y | \0 |

While S1 is

| A | r | r | a | y | Garbage |

Comparing S1 and S2 can lead to UB (I presume), because S1 is not nul-terminated and strcmp takes no length argument.

#include <stdio.h>
#include <string.h>

int main(void) 
{
    char S1[6];
    S1[0] = 'A';
    S1[1] = 'r';
    S1[2] = 'r';
    S1[3] = 'a';
    S1[4] = 'y';
    S1[5] = 0;

    char S2[6] = {'A','r','r','a','y', 0};
    printf("%d" ,strcmp(S1,S2));
    return 0;
}

Outputs:

0

The strcmp() function starts by comparing the first character of each string. If they are equal, it continues with the following pairs until the characters differ or until a terminating null character is reached.
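The loop described above can be sketched roughly as follows. This is a simplified illustration under the name my_strcmp (a made-up name), not the actual implementation in any C library.

```c
#include <assert.h>
#include <string.h>

/* Simplified sketch of a strcmp-style loop: walk both strings in step
   and stop at the first differing pair or at the terminating '\0'. */
int my_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Compare as unsigned char, as the standard specifies for strcmp. */
    return (unsigned char)*a - (unsigned char)*b;
}
```

Nothing in the loop knows how large the arrays are. If neither a difference nor a '\0' turns up inside the array, the loop simply keeps reading past its end.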

I don't think it is safe to compare S1 and S2 this way. The input to strcmp is the address of the first character, and S1 is not null-terminated. Though 6 bytes are allocated in both cases, S1[5] is not initialised, so it most likely holds a garbage value. The risk is that strcmp ends up reading past the array, into memory that does not belong to it, while searching for a differing character or a null character. This can even lead to a segfault or access violation.

The memory layout of S1, S2 and S3 might look something like this:

S1 = A | r | r | a | y | ?
S2 = A | r | r | a | y | 0
S3 = A | r | r | a | y | 0

Any comparison between S2 and S3 is safe. S1 vs S2 or S3 might not be.

Just adding to the existing answers:

char S2[6] = {'A','r','r','a','y'};

string S3 = "Array";

Both are null-terminated, so strcmp() works and reports that they are the same. For S1, the elements are assigned one by one and nothing null-terminates the array, so it is not a valid string in C, and passing it to strcmp() might lead to undefined behavior.

The point with S3 is that it refers to a string literal, which is read-only. Such values are mostly stored in read-only locations, so if you try to write to S3 after initialization you might see a crash. Keep this in mind when using assignments like the one for S3.
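A minimal sketch of the safe alternative, assuming the question's string type is a typedef for char * (the question does not say so): copy the literal into an array first if it needs to be modified. The helper name modify_copy is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: modifies a writable copy of the text and
   leaves the string literal itself untouched. */
int modify_copy(void)
{
    const char *literal = "Array";  /* points at the literal; never write through it */
    char copy[] = "Array";          /* separate array initialised from the literal */

    assert(sizeof copy == 6);       /* 5 letters plus the '\0' terminator */
    copy[0] = 'a';                  /* fine: copy is ordinary writable storage */

    return strcmp(copy, "array") == 0 && strcmp(literal, "Array") == 0;
}
```

Declaring the pointer as const char * lets the compiler reject accidental writes through it, rather than leaving them to fail at run time.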


 