How to check if a locale is UTF-8?

I'm working with Yocto to create an embedded Linux distribution for an ARM device (i.MX 6Quad processors).

I've configured the list of desired locales with the variable:

IMAGE_LINGUAS = "de-de fr-fr en-gb en-gb.iso-8859-1 en-us en-us.iso-8859-1 zh-cn"

As a result, I obtained a file system that contains the following folders:

root@lam_icu:/usr/lib/locale# cd /usr/share/locale/
root@lam_icu:/usr/share/locale# ls -la
total 0
drwxr-xr-x  6 root root  416 Nov 17  2016 .
drwxr-xr-x 30 root root 2056 Nov 17  2016 ..
drwxr-xr-x  4 root root  296 Nov 17  2016 de
drwxr-xr-x  3 root root  232 Nov 17  2016 en_GB
drwxr-xr-x  4 root root  296 Nov 17  2016 fr
drwxr-xr-x  4 root root  296 Nov 17  2016 zh_CN

and:

root@lam_icu:/usr/share/locale# cd /usr/lib/locale/
root@lam_icu:/usr/lib/locale# ls -la
total 0
drwxr-xr-x  9 root root   640 Mar 13  2017 .
drwxr-xr-x 32 root root 40000 Mar 13  2017 ..
drwxr-xr-x  3 root root  1016 Mar 13  2017 de_DE
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_GB
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_GB.ISO-8859-1
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_US
drwxr-xr-x  3 root root  1016 Mar 13  2017 en_US.ISO-8859-1
drwxr-xr-x  3 root root  1016 Mar 13  2017 fr_FR
drwxr-xr-x  3 root root  1016 Mar 13  2017 zh_CN

What is the encoding of all the non-ISO-8859-1 locales? Can I assume that "en_GB" or "en_US" use the UTF-8 encoding?

I tried opening the "LC_IDENTIFICATION" file; the result is:

?Hc cEnglish locale for the USAFree Software Foundation, Inc. http://www.gnu.org/software/libc/bug-glibc-locales@gnu.orgEnglishUSA1.02000-06-24en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000en_US:2000 UTF-8

At the end of the file there is something that looks like "UTF-8". Is this enough to assume that the encoding is UTF-8?

How to check if a locale is UTF-8?

LC_IDENTIFICATION doesn't tell you much:

LC_IDENTIFICATION - this is not a user-visible category; it contains information about the locale itself and is rarely useful for users or developers (but is listed here for completeness' sake).

You'd have to look at the complete set of files.

There appears to be no standard command-line utility for doing this, but there is a runtime call (added a little later than the original locale functions). Here is a sample program which illustrates the function nl_langinfo:

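#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

/* For each locale name given on the command line, print its codeset. */
int
main(int argc, char **argv)
{
    int n;
    for (n = 1; n < argc; ++n) {
        /* Select the locale; setlocale returns NULL if it is unavailable. */
        if (setlocale(LC_ALL, argv[n]) != NULL) {
            /* Ask the C library for the codeset name of the current locale. */
            char *code = nl_langinfo(CODESET);
            if (code != NULL)
                printf("%s ->%s\n", argv[n], code);
            else
                printf("?%s (nl_langinfo)\n", argv[n]);
        } else {
            printf("? %s (setlocale)\n", argv[n]);
        }
    }
    return 0;
}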

and some output, e.g., from running foo $(locale -a):

aa_DJ ->ISO-8859-1
aa_DJ.iso88591 ->ISO-8859-1
aa_DJ.utf8 ->UTF-8
aa_ER ->UTF-8
aa_ER@saaho ->UTF-8
aa_ER.utf8 ->UTF-8
aa_ER.utf8@saaho ->UTF-8
aa_ET ->UTF-8
aa_ET.utf8 ->UTF-8
af_ZA ->ISO-8859-1
af_ZA.iso88591 ->ISO-8859-1
af_ZA.utf8 ->UTF-8
am_ET ->UTF-8
am_ET.utf8 ->UTF-8
an_ES ->ISO-8859-15
an_ES.iso885915 ->ISO-8859-15
an_ES.utf8 ->UTF-8
ar_AE ->ISO-8859-6
ar_AE.iso88596 ->ISO-8859-6
ar_AE.utf8 ->UTF-8
ar_BH ->ISO-8859-6
ar_BH.iso88596 ->ISO-8859-6

The directory names you're referring to are often (but are not required to be) the same as encoding names. That is the assumption made in the example program. There was a related question, "How to get terminal's Character Encoding", but it has no useful answers. One is interesting though, since it asserts that

locale charmap

will give the locale encoding. According to the standard, that's not necessarily so:

  • The command locale charmap gives the name used in localedef -f

  • However, localedef attaches no special meaning to the name given in the -f option.

  • localedef has a different option, -u, which identifies the codeset, but locale (in the standard) mentions no method for displaying this information.

As usual, implementations may (or may not) treat unspecified features in different ways. The GNU C library's documentation differs in some respects from the standard (see locale and localedef), but offers no explicit options for showing the codeset name.
