C printf function does not align strings correctly for strings that contain Turkish characters

I have the following code to print some strings on the console in a left-aligned format:

#include <stdio.h>
#include <locale.h>
#include <stdlib.h>

int main()
{
    printf("%s:\n", "Türkçe karakterler ile");
    printf("%-14s: \n", "Onaltılık");
    printf("%-14s: \n", "Onluk");
    printf("%-14s: \n", "İkilik");

    printf("\n%s:\n", "Türkçe karakterler olmadan");
    printf("%-14s: \n", "Onaltilik");
    printf("%-14s: \n", "Onluk");
    printf("%-14s: \n", "Ikilik");
}

I compiled this code with both gcc (7.3.0) and clang (6.0.0) on an Ubuntu 18.04 system.

The output is as follows:

Türkçe karakterler ile:
Onaltılık  : 
Onluk        : 
İkilik      : 

Türkçe karakterler olmadan:
Onaltilik     : 
Onluk         : 
Ikilik        :

As can be seen from the code, the first group of strings contains some Turkish characters such as 'ı' and 'İ'. There are no Turkish characters in the second group of strings.

The output of the printf function is not correctly aligned for the strings that contain Turkish characters. The expected output is:

Türkçe karakterler ile:
Onaltılık     : 
Onluk         : 
İkilik        : 

Türkçe karakterler olmadan:
Onaltilik     : 
Onluk         : 
Ikilik        :

If I compile the same code on a Windows system (Windows 7) with gcc (MinGW v5.1.1 inside CodeBlocks 17.2), the output is correct:

Türkçe karakterler ile:
Onaltılık     :
Onluk         :
İkilik        :

Türkçe karakterler olmadan:
Onaltilik     :
Onluk         :
Ikilik        :

Can anyone help me figure out what the problem is?

My guess is that your editor saved the source as UTF-8, which is a multi-byte encoding. The printf family of functions only deals with byte strings, so the field width in %-14s is counted in bytes, not in characters. In UTF-8 every non-ASCII character takes two or more bytes, so printf counts each one as multiple characters and adds too little padding.

If that is the case, you can work around the problem by printing the string first and then adding the padding manually: print an empty string with the * modifier, which lets you pass the field width as an argument to printf.

Something like this:

printf("%s%*s: \n", "Onaltılık", 5, "");  // 5 = 14 - 9, where 9 is the number of characters in "Onaltılık"
printf("%s%*s: \n", "Onluk"    , 9, "");  // Ditto for "Onluk": 9 = 14 - 5
printf("%s%*s: \n", "İkilik"   , 8, "");  // Ditto for "İkilik": 8 = 14 - 6

Output:

Onaltılık     : 
Onluk         : 
İkilik        :
