
Equivalent of Java's String.getBytes("UTF-8") in C++?

I need to implement this Java code in (unmanaged) C++:

byte[] b = string.getBytes("UTF8");

I'm new to C++ and can't find anything that does this. It has to be platform independent, if possible. I'm using a C++11 compiler.

A Java String is roughly equivalent to std::u16string, a specialization of std::basic_string. I suggest you try something like:

#include <codecvt>
#include <locale>
#include <string>
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string converted = convert.to_bytes(u"HELLO, WORLD!");
const char *bytes = converted.data();

Note that this relies on C++11; it might be some time before your compiler vendor fully supports these features.

Here, we use the newly introduced std::wstring_convert to convert from a UTF-16 string to a UTF-8 multibyte string via to_bytes (it also supports conversion in the other direction, via from_bytes).

This is made possible by the (also newly introduced) std::codecvt_utf8_utf16 conversion facet, which takes care of the actual conversion for us.

Besides that, it makes use of the new string literal prefixes added in C++11, in particular u, which is for char16_t UTF-16 strings :-) There are also u8 and U for UTF-8 and UTF-32, respectively.
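To make the difference between the three prefixes concrete, here is a small compile-time sketch. The helper name code_units is made up for this illustration; it just counts the code units in a literal, without the terminating NUL.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating NUL that every literal carries.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+20AC (euro sign) needs three code units in UTF-8, one in UTF-16/UTF-32.
static_assert(code_units(u8"\u20AC") == 3, "three UTF-8 code units");
static_assert(code_units(u"\u20AC") == 1, "one UTF-16 code unit");
static_assert(code_units(U"\u20AC") == 1, "one UTF-32 code unit");

// U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair.
static_assert(code_units(u"\U0001F600") == 2, "surrogate pair in UTF-16");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");
```

The u8 case is the one that corresponds to what getBytes("UTF-8") returns in Java: one array element per UTF-8 byte.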


PS: data() is (as of C++11) guaranteed to be equal to c_str() and can therefore be relied upon to be NUL-terminated.
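For reference, the conversion above could be wrapped in a function that returns raw bytes, which is closer to what the Java call gives you. This is only a sketch: the name get_utf8_bytes is invented here, and note that std::wstring_convert and std::codecvt_utf8_utf16 were deprecated in C++17, so newer compilers may emit deprecation warnings.

```cpp
#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>    // std::wstring_convert (deprecated since C++17)
#include <string>
#include <vector>

// Hypothetical helper mirroring Java's String.getBytes("UTF-8"):
// takes UTF-16 text and returns its UTF-8 encoding as raw bytes.
std::vector<unsigned char> get_utf8_bytes(const std::u16string& text) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    // to_bytes throws std::range_error if the conversion fails.
    const std::string utf8 = convert.to_bytes(text);
    // Copy the chars out as unsigned bytes; no NUL terminator is added.
    return std::vector<unsigned char>(utf8.begin(), utf8.end());
}
```

Calling get_utf8_bytes(u"HELLO, WORLD!") would give 13 bytes, one per ASCII character; non-ASCII characters take two to four bytes each.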

Solution Number 1:-

// C++11 to C++17 only: since C++20, u8 literals are arrays of char8_t, not char.
char bytecpp[] = u8"You don't need strings.getbytes :P";

Solution Number 2:-

// The second template argument must be char16_t; it defaults to wchar_t.
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> myconv;
std::string mbstring = myconv.to_bytes(u"Hello\n");
std::cout << mbstring;

Assuming the string is already in UTF-8, you can use:

char const *c = myString.c_str();

For read/write access, you could use:

std::vector<char> bytes(myString.begin(), myString.end());
bytes.push_back('\0');
char *c = &bytes[0];
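If the goal is a buffer that behaves like Java's byte[] rather than a C string, the trailing '\0' is not needed, since a byte[] carries its own length. A minimal sketch, assuming the std::string already holds UTF-8 (the name copy_bytes is made up for this example):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: copies the bytes of an already-UTF-8 std::string
// into a mutable buffer, without a trailing NUL (like Java's byte[]).
std::vector<std::uint8_t> copy_bytes(const std::string& utf8) {
    return std::vector<std::uint8_t>(utf8.begin(), utf8.end());
}
```

The returned vector is an independent copy, so writing to it does not change the original string; its size() plays the role of the Java array's length.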

A std::string in C++ typically holds one byte (one char) per character, as with ASCII. So if you go with the typical std::string, you would have to take care of the encoding before you marshal the text to C++. C++ does define a wide character string, std::wstring, but unfortunately (from the Wikipedia article on wide characters):

The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.

So we would have to know which C++ compiler you are going to use to answer the question completely. The std::wstring class has no to-bytes style member function, so what you want to do is use c_str(), as mentioned in the other answers, and then use & (bitwise AND) with a byte mask to split the wide characters into bytes.

In Visual C++ a wide character is 16 bits, so you would want something like the following to split each character into bytes:

// Note: these are the two bytes of one UTF-16 code unit, not UTF-8 bytes.
high_byte = (wcharacter >> 8) & 0xFF;
low_byte = wcharacter & 0xFF;
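Splitting code units with masks yields UTF-16 bytes, so getting actual UTF-8 out takes a bit more bit-twiddling. Below is a hand-rolled sketch of that encoding step that uses only masks and shifts and no library conversion facility. The function name utf16_to_utf8 is invented for this example, and replacing unpaired surrogates with U+FFFD is an assumption of the sketch, not necessarily what Java does with malformed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes by hand.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::vector<std::uint8_t> utf16_to_utf8(const std::u16string& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // the low surrogate has been consumed
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate with no preceding high surrogate
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Because it takes std::u16string rather than std::wstring, this does not depend on the compiler's width for wchar_t, which is the portability concern raised in the quote above.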


 