Equivalent of Java's String.getBytes("UTF-8") in C++?

Question

I need to implement this Java code in (unmanaged) C++:

byte[] b = string.getBytes("UTF8");

I'm new to C++ and can't find anything that does this. It has to be platform independent, if possible. I'm using a C++11 compiler.

Answer 1

A Java String is roughly equivalent to std::u16string, a specialization of std::basic_string. I suggest you try something like:

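#include <codecvt>  // std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string converted = convert.to_bytes(u"HELLO, WORLD!");
const char *bytes = converted.data();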

Note that this relies on C++11; it might be some time before your compiler vendor fully supports these features.

Here, we use the newly introduced std::wstring_convert to convert from a wide-character UTF-16 string to a UTF-8 multibyte string via to_bytes (it also supports conversion in the other direction, via from_bytes).

This is made possible by the (also newly introduced) std::codecvt_utf8_utf16 conversion facet, which takes care of the actual conversion for us.
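As a minimal sketch of how the conversion described above could be wrapped to resemble Java's getBytes, assuming a C++11 standard library that ships <codecvt>. The helper names get_bytes_utf8 and from_bytes_utf8 are hypothetical and not part of any library. Be aware that std::wstring_convert and <codecvt> were deprecated in C++17, so newer compilers may emit warnings.

```cpp
#include <cassert>
#include <codecvt>  // std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>
#include <vector>

// Hypothetical helper: UTF-16 string -> UTF-8 bytes, similar in spirit to
// Java's String.getBytes("UTF-8"). The result has no trailing NUL.
std::vector<unsigned char> get_bytes_utf8(const std::u16string& s) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    // to_bytes throws std::range_error if the conversion fails.
    const std::string utf8 = convert.to_bytes(s);
    return std::vector<unsigned char>(utf8.begin(), utf8.end());
}

// Hypothetical helper for the other direction: UTF-8 bytes -> UTF-16 string.
std::u16string from_bytes_utf8(const std::vector<unsigned char>& bytes) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.from_bytes(std::string(bytes.begin(), bytes.end()));
}
```

Calling get_bytes_utf8(u"\u20AC") should give the three bytes E2 82 AC, and passing them to from_bytes_utf8 should give back the original one-unit string.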

Besides that, it makes use of the new string literal prefixes added in C++11, in particular u, which is for char16_t UTF-16 strings :-) There are also u8 and U, for UTF-8 and UTF-32 respectively.
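To make the difference between the three prefixes concrete, here is a small sketch that counts code units in a literal at compile time. The helper code_units is a hypothetical name introduced only for this illustration; the characters are written as universal character names so the result does not depend on the source file encoding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating NUL.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+20AC (euro sign): 3 UTF-8 code units, 1 UTF-16 code unit.
static_assert(code_units(u8"\u20AC") == 3, "UTF-8 needs three bytes");
static_assert(code_units(u"\u20AC") == 1, "UTF-16 needs one unit");

// U+1F600 lies outside the Basic Multilingual Plane:
// a surrogate pair in UTF-16, a single unit in UTF-32.
static_assert(code_units(u"\U0001F600") == 2, "UTF-16 needs a surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32 needs one unit");
```

The same character therefore occupies a different number of elements depending on the prefix, which is why a conversion step is needed to get UTF-8 bytes out of a UTF-16 string.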

P.S. data() is (as of C++11) guaranteed to return the same pointer as c_str(), and can therefore be relied upon to be NUL-terminated.

Answer 2

Solution 1:

 char bytecpp[]= u8"You don't need strings.getbytes :P";

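Solution 2:

// The second template argument must be char16_t; it defaults to wchar_t,
// which would not accept a u"..." literal.
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> myconv;
std::string mbstring = myconv.to_bytes(u"Hello\n");
std::cout << mbstring;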

Answer 3

Assuming the string is already in UTF-8, you can use:

char const *c = myString.c_str();

For read/write access, you could use:

std::vector<char> bytes(myString.begin(), myString.end());
bytes.push_back('\0');
char *c = &bytes[0];
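If the goal is a byte container closer to Java's byte[], which carries its own length and has no terminator, a variant without the trailing '\0' may be more convenient. This is only a sketch: to_byte_array is a hypothetical name, and the input is assumed to already hold valid UTF-8.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copy the bytes of a std::string that is assumed to
// already contain UTF-8 into a byte vector, without a trailing NUL.
std::vector<unsigned char> to_byte_array(const std::string& utf8) {
    return std::vector<unsigned char>(utf8.begin(), utf8.end());
}
```

The vector's size() then plays the role of the Java array's length, and data() (or &bytes[0] on a non-empty vector) gives read/write access to the bytes.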

Answer 4

A std::string in C++ typically holds ASCII, one byte per character, so if you went with a plain std::string you would have to take care of the encoding before marshaling the text to C++. C++ does also define a wide-character string, std::wstring; unfortunately (from the Wikipedia article on wide characters):

The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.

So we would need to know which C++ compiler you are going to use to answer the question completely. The std::wstring class has no to-bytes function, so what you want to do is use c_str(), as mentioned in the other answers, and then use & (bitwise AND) with a byte mask to split the wide characters into bytes.

In Visual C++ a wide character is 16 bits, so you would want something like the following to split each character into bytes:

high_byte = (wcharacter >> 8) & 0xFF;  // upper 8 bits, shifted down to fit in one byte
low_byte = wcharacter & 0xFF;          // lower 8 bits
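Note that splitting each 16-bit unit this way yields the raw UTF-16 bytes, not UTF-8. If UTF-8 output is what is needed, the same mask-and-shift idea can be extended into a small hand-written encoder. The following is a sketch under stated assumptions, not a reference implementation: utf16_to_utf8 is a hypothetical name, the input is a std::u16string (so the code unit width does not depend on the compiler), and unpaired surrogates are replaced with U+FFFD, which is a choice made here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode UTF-16 as UTF-8 using only shifts and masks.
// Assumption: an unpaired surrogate is replaced with U+FFFD.
std::vector<std::uint8_t> utf16_to_utf8(const std::u16string& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  // stray low surrogate
            cp = 0xFFFD;
        }

        if (cp < 0x80) {  // 1 byte: 0xxxxxxx
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {  // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Because it relies only on fixed-width types and integer arithmetic, a routine along these lines does not depend on the width of wchar_t or on any particular compiler.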

4 answers

Answer 1
3 (accepted) 2012-09-01 19:53:47

Answer 2
1 2012-09-01 20:12:11

Answer 3
0 2012-09-01 18:55:35

Answer 4
0 2012-09-01 19:13:35
