What are good methods for printing Extended ASCII characters in C?
First I would like to see what the printable ASCII characters look like in C.
The following is my code:
#include <stdio.h>
int main(void)
{
    for (char a = 32; a < 127; a++)
        printf("a=%c\n", a);
    return 0;
}

#include <stdio.h>
int main(void)
{
    for (unsigned char a = 32; a < 127; a++)
        printf("a=%c\n", a);
    return 0;
}
The above two code snippets work nicely and show me the printable ASCII characters.
Next I would like to have a look at the Extended ASCII characters in C.
#include <stdio.h>
int main(void)
{
    for (unsigned char a = 128; a < 256; a++)
        printf("a=%c\n", a);
    return 0;
}
This produces an endless loop that prints unknown strange characters.
What did I do wrong?
I thought the loop would stop when a reaches 256, but it didn't.
And where did the strange characters come from?
How can I print Extended ASCII characters in C?
You have an infinite loop because the maximum value representable by an unsigned char is 255 [1], and incrementing it past that point causes it to wrap around to zero, so the condition a < 256 is always true. Your program will work as you intended if you use int instead:
#include <stdio.h>
int main(void)
{
    for (int a = 128; a < 256; a++)
        printf("a=%c\n", a);
    return 0;
}
It is perfectly fine to pass an int to printf's %c [2], as long as its value is in the range representable by unsigned char, which it is.
However, if you run this program on a modern computer, you're still likely to get "strange characters". For instance, when I run it on my computer I get 128 lines of
a=�
This is because a modern computer's CLI windows expect UTF-8-encoded Unicode text, and in UTF-8, all the characters above U+007F are encoded using more than one byte. So the terminal emulator receives what it thinks of as an invalid, incomplete byte sequence on each line, and it prints a special "replacement character" for each one. The simplest way to see the actual characters in the U+0080..U+00FF range is to use C's "wide characters":
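#include <wchar.h>
#include <locale.h>
int main(void)
{
    /* Adopt the user's locale so wprintf knows the output encoding. */
    setlocale(LC_ALL, "");
    /* %X expects an unsigned int and %lc expects a wint_t. */
    for (int a = 128; a < 256; a++)
        wprintf(L"U+%04X = '%lc'\n", (unsigned)a, (wint_t)a);
    return 0;
}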
wprintf takes care of converting from wide characters to whatever text encoding the environment expects. This is not guaranteed to work, because C's "wide characters" are underspecified and ill-designed to the point where I actually recommend people do not use them in production code (instead, use exclusively narrow strings holding UTF-8), but for a test program like this you can usually get away with it. I get output like this:
U+0080 = ''
U+0081 = ''
U+0082 = ''
...
U+00A0 = ' '
U+00A1 = '¡'
U+00A2 = '¢'
...
U+00FD = 'ý'
U+00FE = 'þ'
U+00FF = 'ÿ'
You could get something different if your computer is insufficiently modern. The U+0080..U+009F range is yet more useless control characters, which is why those are not showing anything.
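To make the multi-byte point concrete, here is a minimal sketch of how a code point in the U+0080..U+07FF range is laid out as two UTF-8 bytes. The function name utf8_encode_two_byte is hypothetical and chosen for illustration only; it is not part of any standard library. It handles just the two-byte case, which covers every character printed above.

```c
#include <stddef.h>

/* Hypothetical helper: encode a code point in U+0080..U+07FF as two
   UTF-8 bytes. Returns the number of bytes written, or 0 if cp is
   outside that range. */
size_t utf8_encode_two_byte(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80 || cp > 0x7FF)
        return 0;
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* 110xxxxx: lead byte */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx: continuation byte */
    return 2;
}
```

For U+00E9 ('é') this yields the bytes 0xC3 0xA9. A lone byte such as 0xE9 is never a complete UTF-8 sequence, which is why the terminal shows the replacement character for it.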
[1] Technically, [0, 255] is the minimum required range for unsigned char; the C standard allows for the possibility that it can represent a larger range, e.g. [0, 511]. If you had run your program on a computer where unsigned char had that range, it would have worked. However, no one has manufactured such a computer in many years. If you really want to worry about it, include <limits.h> and verify that CHAR_BIT is 8 and/or that UCHAR_MAX is 255.
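One possible way to perform that verification is at compile time with the preprocessor, so that the build fails on an unusual platform. This is only a sketch; the wording of the error message is my own.

```c
#include <limits.h>

/* Refuse to compile where unsigned char is not exactly 8 bits wide. */
#if CHAR_BIT != 8 || UCHAR_MAX != 255
#error "This code assumes 8-bit bytes (unsigned char range 0..255)."
#endif
```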
[2] Technically, thanks to a vestigial feature of C called "default argument promotion", you always pass an int to %c, even if the variable you supply has a character type.
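Default argument promotion can be observed with a small variadic function of your own. The sketch below uses a hypothetical helper, first_variadic_as_int, which reads its first variadic argument as an int, the same way printf does for %c.

```c
#include <stdarg.h>

/* Hypothetical helper: return the first variadic argument, read as int.
   The named parameter `count` exists only because a variadic function
   needs at least one named parameter before C23. */
int first_variadic_as_int(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int value = va_arg(ap, int); /* character-typed arguments arrive as int */
    va_end(ap);
    return value;
}
```

Calling first_variadic_as_int(1, c) with an unsigned char c holding 200 returns 200, because c is promoted to int before the call.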
The condition a < 256 is always true, because the valid range of unsigned char is [0, 255].
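The wraparound and the always-true condition can both be checked directly. The following is a minimal sketch that assumes an 8-bit unsigned char; the function names next_value and condition_always_true are hypothetical.

```c
#include <limits.h>

/* Increment an unsigned char exactly as `a++` does in the loop. */
unsigned char next_value(unsigned char a)
{
    a++; /* UCHAR_MAX + 1 wraps around to 0 */
    return a;
}

/* Return 1 if `a < 256` holds for every possible unsigned char value.
   Assumes UCHAR_MAX is 255; on such a platform this always returns 1. */
int condition_always_true(void)
{
    for (int v = 0; v <= UCHAR_MAX; v++) {
        unsigned char a = (unsigned char)v;
        if (!(a < 256))
            return 0;
    }
    return 1;
}
```

Note that the checking loop itself uses an int counter, for the same reason the original loop needs one.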
The loop
for (unsigned char a = 128; a < 256; a++)
runs forever on your platform, since 255 + 1 is 0 due to the wraparound of an unsigned type. You could instead use the following, which is confusing when you see it for the first time:
for (unsigned char a = 128; a >= 128; a++)
What gets printed to your console will depend on the encoding your system uses (probably ASCII), along with how your terminal prints characters in that range.
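To see why the a >= 128 form terminates, it can help to count its iterations. This sketch assumes an 8-bit unsigned char, and count_high_values is a hypothetical name.

```c
/* Count the iterations of the `a >= 128` loop. The loop ends when a
   wraps from 255 to 0, because 0 >= 128 is false. */
int count_high_values(void)
{
    int count = 0;
    for (unsigned char a = 128; a >= 128; a++)
        count++;
    return count;
}
```

With an 8-bit unsigned char the loop body runs once for each value from 128 to 255, so the function returns 128.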