What is a good way to print extended ASCII characters in C?

Question

First, I would like to see what the printable ASCII characters look like in C.

The following is my code:

#include <stdio.h>
int main(void)
{
    for (char a = 32; a < 127; a++)
        printf("a=%c\n", a); 
    return 0;
}

#include <stdio.h>
int main(void)
{
    for (unsigned char a = 32; a < 127; a++)
        printf("a=%c\n", a); 
    return 0;
}

Both snippets above work nicely and show me the printable ASCII characters.

Next, I would like to have a look at the extended ASCII characters in C.

#include <stdio.h>
int main(void)
{
    for (unsigned char a = 128; a < 256; a++)
        printf("a=%c\n", a); 

    return 0;
}

This produces an endless loop that prints unknown strange characters.

Where did I go wrong?

I thought the loop would stop when a reaches 256, but it didn't.

And where did the strange characters come from?

How can I print extended ASCII characters in C?

Answer 1

You have an infinite loop because the maximum value representable by an unsigned char is 255 ¹, and incrementing it past that point causes it to wrap around to zero, so the condition a < 256 is always true. Your program will work as you intended if you use int instead:

#include <stdio.h>
int main(void)
{
    for (int a = 128; a < 256; a++)
        printf("a=%c\n", a); 
    return 0;
}

It is perfectly fine to pass an int to printf's %c ², as long as its value is in the range representable by unsigned char, which it is.

However, if you run this program on a modern computer, you're still likely to get "strange characters". For instance, when I run it on my computer I get 128 lines of

a=�

This is because a modern computer's CLI windows expect UTF-8-encoded Unicode text, and in UTF-8, all the characters above U+007F are encoded using more than one byte. So the terminal emulator receives what it considers an invalid, incomplete byte sequence on each line, and it prints a special "replacement character" for each of them. The simplest way to see the actual characters in the U+0080..U+00FF range is to use C's "wide characters":

#include <wchar.h>
#include <locale.h>
int main(void)
{
    setlocale(LC_ALL, "");
    for (int a = 128; a < 256; a++)
        wprintf(L"U+%04X = '%lc'\n", a, (wchar_t)a);
    return 0;
}

wprintf takes care of converting from wide characters to whatever text encoding the environment expects. This is not guaranteed to work, because C's "wide characters" are underspecified and ill-designed, to the point where I actually recommend people do not use them in production code (instead, use exclusively narrow strings holding UTF-8). For a test program like this, though, you can usually get away with it. I get output like this:

U+0080 = ''
U+0081 = ''
U+0082 = ''
...
U+00A0 = ' '
U+00A1 = '¡'
U+00A2 = '¢'
...
U+00FD = 'ý'
U+00FE = 'þ'
U+00FF = 'ÿ'

You could get something different if your computer is insufficiently modern. The U+0080..U+009F range consists of yet more useless control characters, which is why those lines do not show anything.

¹ Technically [0, 255] is the minimum required range for unsigned char; the C standard allows for the possibility that it can represent a larger range, e.g. [0, 511]. If you had run your program on a computer where unsigned char had that range, it would have worked. However, no one has manufactured such a computer in many years. If you really want to worry about it, include <limits.h> and verify that CHAR_BIT is 8 and/or that UCHAR_MAX is 255.

² Technically, thanks to a vestigial feature of C called "default argument promotion", you always pass an int to %c, even if the variable you supply has a character type.

Answer 2

This

a < 256

is always true, as the valid range of unsigned char is [0, 255].

Answer 3

The loop

for (unsigned char a = 128; a < 256; a++)

runs forever on your platform, since 255 + 1 is 0 due to the wraparound of an unsigned type. You could use the following loop instead, which is confusing when you see it for the first time:

for (unsigned char a = 128; a >= 128; a++)

What gets printed to your console will depend on the encoding your system uses (probably ASCII), along with how your terminal prints characters in that range.

Answer 1
2 2018-07-18 14:45:11

Answer 2
1 2018-07-18 12:56:53

Answer 3
0 2018-07-18 12:56:42
