
sizeof character and strlen string mismatch

As per my code, I assume each Greek character is stored in 2 bytes. sizeof returns the size of each character as 4 (i.e. sizeof(int)).

How does strlen return 16? [This makes me think each character occupies 2 bytes.] (Shouldn't it be 4*8 = 32, since it counts the number of bytes?)

Also, how does printf("%c",bigString[i]); print each character properly? Shouldn't it read 1 byte (a char) and then display it because of %c? Why is the Greek character not split in this case?

strcpy(bigString,"ειδικούς");//greek
sLen = strlen(bigString);
printf("Size is %zu\n ",sizeof('ε')); //printing for each character similarly
printf("%s is of length %d\n",bigString,sLen);
int k1 = 0 ,k2 = sLen - 2;

for(i=0;i<sLen;i++)
    printf("%c",bigString[i]);

Output:

Size is 4
 ειδικούς is of length 16
ειδικούς
  1. Character literals in C have type int, so sizeof('ε') is the same as sizeof(int). You're playing with fire a bit in this statement, though. 'ε' takes two bytes in UTF-8, so it will be a multicharacter literal, whose value is implementation-defined; it isn't portable and might come back to bite you. Be careful when relying on behaviour like this. Clang, for example, won't accept this program with that literal in it. GCC gives a warning, but will still compile it.

  2. strlen returns 16, since that's the number of bytes in your string before the null terminator: 8 characters at 2 bytes each. Your Greek characters are all 16 bits long in UTF-8, so your string looks something like:

     c0c0 c1c1 c2c2 c3c3 c4c4 c5c5 c6c6 c7c7 0 

    in memory, where c0c0, for example, is the two bytes of the first character. There is a single null-terminating byte at the end of your string.

  3. The printf appears to work because your terminal is UTF-8 aware. You are printing each byte separately, but the terminal interprets the first two bytes printed as a single character, and so on. If you change that printf call to:

     printf("%d: %02x\n", i, (unsigned char)bigString[i]);

    you'll see the byte-by-byte behaviour you're expecting.
