Handling a UTF-8 encoded char* array

Question

A file contains non-Latin content and is encoded in UTF-8. The existing code uses fopen to open the file, parses it, and calls my validate function with the non-Latin content, passing the data as a char*.

void validate(const char* str)
{
    ....
}

I have to do some validation on the passed char array.

The application uses Sun C++ 5.11, which I think doesn't support Unicode. (I searched for Unicode support in Sun C++ 5.11 but didn't find any useful pointers, so I wrote a simple program to check whether Sun C++ supports Unicode, and the program didn't compile.)

How do I do the validation on the input char*? Is it possible using wchar_t?

Answer 1

The application uses <compiler>, which I think doesn't support Unicode

This isn't a problem. You only need compiler support for Unicode to embed Unicode string literals in the code, or for fixed-width character types to represent UTF-16 or UTF-32. Your text is UTF-8 and comes from input read at run time (here, a file), so no Unicode support in the compiler should be needed.
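To make that concrete, here is a small sketch that assumes nothing beyond C++98. UTF-8 text in a char array is just a sequence of bytes. Even a string embedded in the source can be spelled with plain hex escapes instead of Unicode literals. The helper name count_code_points is hypothetical and assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <cstring>

// UTF-8 text held in a plain char array: each non-ASCII character is just
// 2-4 bytes, so no Unicode-aware compiler feature is involved.
// These hex escapes spell U+4F60 U+597D in UTF-8 and work with any C++98 compiler.
const char kHello[] = "\xE4\xBD\xA0\xE5\xA5\xBD";

// Hypothetical helper: counts code points by skipping continuation bytes
// (bit pattern 10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const char* str) {
    std::size_t n = 0;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(str); *p; ++p) {
        if ((*p & 0xC0) != 0x80) ++n;  // ASCII or lead byte starts a new code point
    }
    return n;
}
```

Here std::strlen(kHello) reports 6, because strlen counts bytes, while count_code_points(kHello) reports 2 characters.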

How do I do the validation on the input char*?

The C++ standard library has very few tools for processing Unicode. The tools it does provide mostly convert between different Unicode encodings, and even those were not available before C++11.

Input and output are mostly just copying of bytes, so no significant processing is required there. For other processing (which you presumably need for "validation"), you will have to implement the tools yourself or use a third-party library. If you choose to implement them yourself, you will need to refer to the roughly 1000-page Unicode standard: http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf
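What the question means by "validation" isn't specified. One check that almost always comes first is whether the bytes are well-formed UTF-8 at all. Below is a minimal hand-written sketch of that check in C++98, following the well-formed byte-sequence rules in chapter 3 of the Unicode standard. It rejects overlong forms, surrogates, and values above U+10FFFF. The name is_valid_utf8 is hypothetical, and the input is assumed to be NUL-terminated, as the question's const char* signature suggests.

```cpp
// Hypothetical helper: returns true if the NUL-terminated string is
// well-formed UTF-8. Pure C++98, no library support needed.
bool is_valid_utf8(const char* str) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(str);
    while (*p) {
        const unsigned char b = *p;
        if (b < 0x80) { ++p; continue; }  // 1 byte: ASCII

        int extra;                 // number of continuation bytes expected
        unsigned char lo = 0x80;   // allowed range of the FIRST continuation byte
        unsigned char hi = 0xBF;
        if (b >= 0xC2 && b <= 0xDF)      { extra = 1; }
        else if (b == 0xE0)              { extra = 2; lo = 0xA0; }  // no overlongs
        else if (b == 0xED)              { extra = 2; hi = 0x9F; }  // no surrogates
        else if (b >= 0xE1 && b <= 0xEF) { extra = 2; }
        else if (b == 0xF0)              { extra = 3; lo = 0x90; }  // no overlongs
        else if (b >= 0xF1 && b <= 0xF3) { extra = 3; }
        else if (b == 0xF4)              { extra = 3; hi = 0x8F; }  // max U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            if (*p < lo || *p > hi) return false;  // also rejects a premature NUL
            lo = 0x80;  // later continuation bytes use the plain 80..BF range
            hi = 0xBF;
        }
    }
    return true;
}
```

In the question's setting, validate could call this first and reject the input when it returns false, before doing any character-level checks. Those further checks are where the Unicode standard or a third-party library comes in.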

Is it possible using wchar_t?

wchar_t is the native wide character type, used for the system's native wide character encoding. UTF-8 does not use wide code units: its code units are 8 bits, which is exactly what char holds.
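Your code may need individual characters rather than bytes. In that case, one option is to decode the UTF-8 yourself into code points. Store them in a type guaranteed to hold any Unicode value, rather than in wchar_t, whose size and encoding are implementation-defined (for example, 16 bits on Windows). The sketch below uses unsigned long, which is guaranteed to be at least 32 bits, because C++98 has neither char32_t nor <cstdint>. The name to_code_points is hypothetical, and the function assumes the input has already been checked for well-formedness.

```cpp
#include <vector>

// Hypothetical helper: decodes a NUL-terminated UTF-8 string into code points.
// ASSUMES well-formed input. If a continuation byte is missing, it returns what
// it has decoded so far, so it never reads past the terminating NUL.
std::vector<unsigned long> to_code_points(const char* str) {
    std::vector<unsigned long> out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(str);
    while (*p) {
        unsigned long cp;
        int extra;
        if (*p < 0x80)                { cp = *p;        extra = 0; }  // 0xxxxxxx
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }  // 110xxxxx
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }  // 1110xxxx
        else                          { cp = *p & 0x07; extra = 3; }  // 11110xxx
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            if ((*p & 0xC0) != 0x80) return out;  // missing continuation byte
            cp = (cp << 6) | (*p & 0x3F);         // append 6 payload bits
        }
        out.push_back(cp);
    }
    return out;
}
```

For the bytes "A\xC3\xA9\xE4\xBD\xA0\xF0\x9F\x98\x80", this yields the code points U+0041, U+00E9, U+4F60 and U+1F600.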

Answer 1
1 2017-02-14 11:04:58