
What are good methods in printing Extended ASCII characters for C?

First, I would like to see what the printable ASCII characters look like in C.

The following is my code:

#include <stdio.h>
int main(void)
{
    for (char a = 32; a < 127; a++)
        printf("a=%c\n", a); 
    return 0;
}

#include <stdio.h>
int main(void)
{
    for (unsigned char a = 32; a < 127; a++)
        printf("a=%c\n", a); 
    return 0;
}

Both code snippets above work nicely and show me the printable ASCII characters.

Next, I would like to look at the Extended ASCII characters in C.

#include <stdio.h>
int main(void)
{
    for (unsigned char a = 128; a < 256; a++)
        printf("a=%c\n", a); 

    return 0;
}

This produces an endless loop that prints strange, unrecognizable characters.

Where did I go wrong?

I thought the loop would stop when a reached 256, but it didn't.

And where did the strange characters come from?

How can I print Extended ASCII characters in C?

You have an infinite loop because the maximum value representable by an unsigned char is 255[1], and incrementing it past that point causes it to wrap around to zero, so the condition a < 256 is always true. Your program will work as you intended if you use int instead:

#include <stdio.h>
int main(void)
{
    for (int a = 128; a < 256; a++)   /* int can hold 256, so the loop terminates */
        printf("a=%c\n", a); 
    return 0;
}

It is perfectly fine to pass an int to printf's %c[2], as long as its value is in the range representable by unsigned char, which it is.

However, if you run this program on a modern computer, you're still likely to get "strange characters". For instance, when I run it on my computer I get 128 lines of

a=�

This is because the terminal windows on modern computers expect UTF-8-encoded Unicode text, and in UTF-8, every character above U+007F is encoded using more than one byte. So the terminal emulator receives what it considers an invalid or incomplete byte sequence on each line, and prints the special "replacement character" (U+FFFD, �) for it. The simplest way to see the actual characters in the U+0080..U+00FF range is to use C's "wide characters":

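#include <wchar.h>
#include <locale.h>
int main(void)
{
    setlocale(LC_ALL, "");   /* use the environment's text encoding (e.g. UTF-8) for output */
    for (int a = 128; a < 256; a++)
        wprintf(L"U+%04X = '%lc'\n", a, (wint_t)a);   /* %lc expects a wint_t argument */
    return 0;
}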

wprintf takes care of converting from wide characters to whatever text encoding the environment expects. This is not guaranteed to work, because C's "wide characters" are underspecified and ill-designed, to the point where I recommend not using them in production code at all (use narrow strings holding UTF-8 exclusively instead). For a test program like this, though, you can usually get away with it. I get output like this:

U+0080 = ''
U+0081 = ''
U+0082 = ''
...
U+00A0 = ' '
U+00A1 = '¡'
U+00A2 = '¢'
...
U+00FD = 'ý'
U+00FE = 'þ'
U+00FF = 'ÿ'

You could get something different if your computer is insufficiently modern. The U+0080..U+009F range consists of yet more useless control characters (the C1 controls), which is why nothing shows up for them.


[1] Technically, [0, 255] is the minimum required range for unsigned char; the C standard allows it to represent a larger range, e.g. [0, 511]. If you had run your program on a computer where unsigned char had such a range, it would have worked. However, such computers are rare today (some DSPs are the main exception). If you really want to worry about it, include <limits.h> and verify that CHAR_BIT is 8 and/or that UCHAR_MAX is 255.

[2] Technically, thanks to a vestigial feature of C called "default argument promotion", you always pass an int to %c, even if the variable you supply has a character type.

This

a < 256

is always true, because the valid range of unsigned char is [0, 255].

The loop

for (unsigned char a = 128; a < 256; a++)

runs forever on your platform, because incrementing an unsigned char that holds 255 wraps it around to 0 (unsigned arithmetic wraps instead of overflowing). You could instead use this loop, which looks confusing the first time you see it:

for (unsigned char a = 128; a >= 128; a++)

What gets printed to your console will depend on the encoding your system uses (on a modern system, probably UTF-8), along with how your terminal prints characters in that range.
