Equivalent of Java's String.getBytes("UTF-8") in C++?
I need to implement this Java code in (unmanaged) C++:
byte[] b = string.getBytes("UTF8");
I'm new to C++ and can't find anything that does this. It has to be platform independent, if possible. I'm using a C++11 compiler.
Java's String is roughly equivalent to std::u16string, a specialization of std::basic_string. I suggest you try something like...
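#include <codecvt>  // std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string converted = convert.to_bytes(u"HELLO, WORLD!");
const char *bytes = converted.data();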
Note that this relies on C++11; it might be some time before your compiler vendor fully supports these features. (Both std::wstring_convert and std::codecvt_utf8_utf16 were later deprecated in C++17.)
Here, we use the newly introduced std::wstring_convert to convert a UTF-16 string to a UTF-8 multibyte string via to_bytes (it also supports the other direction, via from_bytes).
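A minimal round-trip sketch, assuming a standard library that still ships <codecvt> (C++11 through C++20 modes of the mainstream implementations). The helper names to_utf8 and from_utf8 are hypothetical, not part of the answer:

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// UTF-16 <-> UTF-8 with the C++11 facilities (deprecated since C++17).
using Utf16Converter =
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>;

// UTF-16 code units -> UTF-8 bytes (the getBytes("UTF-8") direction).
std::string to_utf8(const std::u16string& s) {
    Utf16Converter conv;
    return conv.to_bytes(s);
}

// UTF-8 bytes -> UTF-16 code units; throws std::range_error on invalid input
// because no error string was passed to the converter's constructor.
std::u16string from_utf8(const std::string& bytes) {
    Utf16Converter conv;
    return conv.from_bytes(bytes);
}
```

Usage: to_utf8(u"\u00E9") yields the two bytes 0xC3 0xA9, and from_utf8 turns them back into u"\u00E9".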
This is made possible by the (also newly introduced) std::codecvt_utf8_utf16 conversion facet, which nicely takes care of the actual conversion for us.
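For illustration, here is roughly what the facet does for the UTF-16 to UTF-8 direction, written by hand so it does not depend on <codecvt>. This is a sketch: utf16_to_utf8 is a hypothetical name, and replacing unpaired surrogates with U+FFFD is this sketch's own choice, not necessarily what Java or the facet does with malformed input:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 as UTF-8 without <codecvt>.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // at most 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                    // skip the low surrogate
            } else {
                cp = 0xFFFD;                            // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                // unpaired low surrogate
        }
        if (cp < 0x80) {                                // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                        // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                      // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                        // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The surrogate-pair branch is the part that is easy to get wrong: a character outside the BMP, such as U+1F600, arrives as two char16_t units and must become a single 4-byte UTF-8 sequence, not two 3-byte ones.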
Besides that, it makes use of the new string literal prefixes added in C++11: in particular u, which is for char16_t UTF-16 strings :-) There are also u8 and U, for UTF-8 and UTF-32 respectively.
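To make the three prefixes concrete, the following compile-time checks (an illustration added here, assuming C++11 or later) count the code units each prefix produces for the same characters. sizeof includes the terminating NUL code unit:

```cpp
#include <cassert>

// U+00E9 ('é'): 2 UTF-8 bytes, 1 UTF-16 unit, 1 UTF-32 unit (plus NUL each).
static_assert(sizeof(u8"\u00E9") == 3, "UTF-8: 2 bytes + NUL");
static_assert(sizeof(u"\u00E9") == 2 * sizeof(char16_t), "UTF-16: 1 unit + NUL");
static_assert(sizeof(U"\u00E9") == 2 * sizeof(char32_t), "UTF-32: 1 unit + NUL");

// U+1F600 (outside the BMP): 4 UTF-8 bytes, a UTF-16 surrogate pair, 1 UTF-32 unit.
static_assert(sizeof(u8"\U0001F600") == 5, "UTF-8: 4 bytes + NUL");
static_assert(sizeof(u"\U0001F600") == 3 * sizeof(char16_t), "UTF-16: 2 units + NUL");
static_assert(sizeof(U"\U0001F600") == 2 * sizeof(char32_t), "UTF-32: 1 unit + NUL");
```

The u8 element type is char up to C++17 and char8_t from C++20, but both are one byte, so the checks hold either way.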
PS: As of C++11, data is guaranteed to return the same pointer as c_str, so it can be relied upon to be NUL-terminated.
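A tiny check of that guarantee; data_is_nul_terminated is a hypothetical helper name used only for illustration:

```cpp
#include <cassert>
#include <string>

// Since C++11, data() and c_str() return the same pointer, and the character
// at index size() is a NUL, so data() can be handed to C APIs expecting a C string.
bool data_is_nul_terminated(const std::string& s) {
    return s.data() == s.c_str() && s.data()[s.size()] == '\0';
}
```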
Solution Number 1:
char bytecpp[]= u8"You don't need strings.getbytes :P";
Solution Number 2:
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> myconv;  // Elem must be char16_t to accept u"..." input
std::string mbstring = myconv.to_bytes(u"Hello\n");
std::cout << mbstring;
Assuming the string is already UTF-8 encoded, you can use:
char const *c = myString.c_str();
For read/write access, you could use:
std::vector<char> bytes(myString.begin(), myString.end());
bytes.push_back('\0');
char *c = &bytes[0];
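One difference from Java worth knowing: Java's byte is signed (-128 to 127), so UTF-8 bytes of 0x80 and above show up as negative numbers in a byte[]. If you need the same values on the C++ side, a sketch like the following copies an already UTF-8 std::string into a std::vector<std::int8_t>. to_java_bytes is a hypothetical name, and std::int8_t is assumed to exist, as it does on mainstream platforms:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: reproduce the signed values Java stores in byte[].
std::vector<std::int8_t> to_java_bytes(const std::string& utf8) {
    std::vector<std::int8_t> out;
    out.reserve(utf8.size());
    for (char c : utf8) {
        int v = static_cast<unsigned char>(c);  // raw byte value, 0..255
        // Map 128..255 to -128..-1 explicitly, so the conversion is well defined.
        out.push_back(static_cast<std::int8_t>(v > 127 ? v - 256 : v));
    }
    return out;
}
```

For example, "é".getBytes("UTF-8") in Java gives {-61, -87}, and to_java_bytes("\xC3\xA9") gives the same two values.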
A std::string in C++ typically holds one byte per character (e.g. ASCII). So if you went with the typical std::string, you would have to take care of the encoding before marshaling the data to C++. C++ does also define a wide-character string, std::wstring, but unfortunately (from the Wikipedia article on wide characters):
The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.
So we would have to know which C++ compiler you are going to use to answer the question completely. The std::wstring class has no to-bytes function, so what you want to do is use c_str(), as mentioned in the other answers, and then use & (bitwise AND) with a byte mask to split each wide character into bytes.
In Visual C++ a wide character is 16 bits, so you would want something like the following to split each character into bytes (note that this yields the raw UTF-16 bytes, not the UTF-8 bytes that getBytes("UTF-8") produces):
unsigned char high_byte = (wcharacter >> 8) & 0xFF;  // shift the high byte down, then mask
unsigned char low_byte = wcharacter & 0xFF;          // lowest 8 bits
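A more portable variant of the masking idea, sketched with char16_t (at least 16 bits on every C++11 compiler) instead of wchar_t, whose width varies by compiler. The output is the big-endian UTF-16 byte sequence (what Java calls "UTF-16BE"), not UTF-8. utf16_to_be_bytes is a hypothetical name:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split each UTF-16 code unit into a high and a low byte, high byte first.
std::vector<unsigned char> utf16_to_be_bytes(const std::u16string& s) {
    std::vector<unsigned char> out;
    out.reserve(s.size() * 2);
    for (char16_t unit : s) {
        out.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));  // high byte
        out.push_back(static_cast<unsigned char>(unit & 0xFF));         // low byte
    }
    return out;
}
```

Shifting before masking matters: wcharacter & 0xFF00 alone keeps the value in bits 8 to 15, which does not fit in a single byte.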