
What are good methods for printing extended ASCII characters in C?

First, I would like to see what the printable ASCII characters look like in C.

Here is my code:

#include <stdio.h>
int main(void)
{
    for (char a = 32; a < 127; a++)
        printf("a=%c\n", a); 
    return 0;
}

#include <stdio.h>
int main(void)
{
    for (unsigned char a = 32; a < 127; a++)
        printf("a=%c\n", a); 
    return 0;
}

Both snippets work nicely and show me the printable ASCII characters.

Next, I would like to look at the extended ASCII characters in C:

#include <stdio.h>
int main(void)
{
    for (unsigned char a = 128; a < 256; a++)
        printf("a=%c\n", a); 

    return 0;
}

What I get is an endless loop printing strange, unrecognizable characters.

Where did I go wrong?

I thought the loop would stop when a reached 256, but it didn't.

And where did the strange characters come from?

How can I print extended ASCII characters in C?

You have an infinite loop because the maximum value representable by an unsigned char is 255 [1], and incrementing it past that point causes it to wrap around to zero, so the condition a < 256 is always true. Your program will work as you intended if you use int instead:

#include <stdio.h>
int main(void)
{
    for (int a = 128; a < 256; a++)
        printf("a=%c\n", a); 
    return 0;
}

It is perfectly fine to pass an int to printf's %c [2], as long as its value is in the range representable by unsigned char, which it is here.

However, if you run this program on a modern computer, you're still likely to get "strange characters". For instance, when I run it on my computer, I get 128 lines of

a=�

This is because the command-line windows on a modern computer usually expect UTF-8-encoded Unicode text, and in UTF-8, every character above U+007F is encoded using more than one byte. So on each line the terminal emulator receives what it considers an invalid, incomplete byte sequence, and it prints a special "replacement character" (U+FFFD, shown as �) for it. The simplest way to see the actual characters in the U+0080..U+00FF range is to use C's "wide characters":

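#include <wchar.h>
#include <locale.h>
int main(void)
{
    /* Adopt the environment's locale so wprintf knows which encoding to emit. */
    setlocale(LC_ALL, "");
    for (int a = 128; a < 256; a++)
        /* %lc expects a wint_t argument holding the wide character. */
        wprintf(L"U+%04X = '%lc'\n", a, (wint_t)a);
    return 0;
}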

wprintf takes care of converting from wide characters to whatever text encoding the environment expects. This is not guaranteed to work, because C's "wide characters" are underspecified and ill-designed, to the point where I actually recommend that people not use them in production code (use exclusively narrow strings holding UTF-8 instead). For a test program like this, though, you can usually get away with it. I get output like this:

U+0080 = ''
U+0081 = ''
U+0082 = ''
...
U+00A0 = ' '
U+00A1 = '¡'
U+00A2 = '¢'
...
U+00FD = 'ý'
U+00FE = 'þ'
U+00FF = 'ÿ'

You might get something different if your computer is not sufficiently modern. The U+0080..U+009F range consists of yet more control characters with no visible rendering, which is why nothing shows up for them.


[1] Technically, [0, 255] is the minimum required range for unsigned char; the C standard allows it to represent a larger range, e.g. [0, 511]. If you had run your program on a computer where unsigned char had that range, it would have worked. However, mainstream computers have not used such a configuration in many years. If you really want to worry about it, include <limits.h> and verify that CHAR_BIT is 8 and/or that UCHAR_MAX is 255.

[2] Technically, thanks to a vestigial feature of C called the "default argument promotions", you always pass an int to %c, even if the variable you supply has a character type.

This

a < 256

is always true, because the valid range of unsigned char is [0, 255].

The loop

for (unsigned char a = 128; a < 256; a++)

runs forever on your platform because, with unsigned wrap-around, 255 + 1 becomes 0 when it is stored back into an unsigned char. You could instead use this loop, which looks confusing the first time you see it:

for (unsigned char a = 128; a >= 128; a++)

What gets printed to your console will depend on the character encoding your system uses (probably an ASCII-compatible one), along with how your terminal displays characters in that range.
