The following things I have realized:
- A Unicode character can be encoded as a sequence of up to 4 bytes. So, if a character is encoded in two or more bytes, byte order matters: big-endian mode (BEM) or little-endian mode (LEM).
- UTF-8 writes bytes into a file/network stream one byte at a time (there are no multi-byte writes or reads): even when a character is encoded in two or more bytes, the encoder writes one byte at a time. So BEM vs LEM does not matter; the decoder always reads the bytes back in the correct order, and nothing swaps them when writing or reading.
- UTF-16 and UTF-32 work with multi-byte code units: UTF-16 uses one or two 2-byte units per character, and UTF-32 always uses one 4-byte unit. Here LEM vs BEM really matters, because each unit is written and read as a multi-byte value.
- In addition, I understand how UTF-8 knows how to interpret bytes as a character while reading from a file (decoding).
So, here is the example: I declared and initialized a `std::string` variable as `"ANФГ"` in C++.
Questions:
- `char` is a one-byte character data type. Is the `std::string` class based on `char[]` in C++?

EDIT_1: I don't understand one thing. Suppose I have three bytes:
- 1000 1111
- 1100 0000
- 0100 0000

The first and the second represent one character in UTF-8, and the third represents another one. The bytes are in the order I wrote above. Every byte has its own address, right? But when a multi-byte write happens, are two bytes stored at one place? I mean, does any output stream write data in order, left to right? Will it then be read back left to right as well? As I understand it, LEM or BEM swaps bytes, but only for multi-byte writes. When we write only one byte at a time, does each byte keep its correct left-to-right order?
Answer:

`std::string` (or rather, `std::basic_string<char>`) uses `char` to store its data. It is encoding-agnostic, so if you, for instance, call `size()` you will get the actual number of `char`s representing the string, not the number of characters or code points.

You can use the `u8` prefix to get UTF-8 string literals (e.g. `u8"ANФГ"`). That way `std::string` will contain UTF-8, and UTF-8 will be written to the file if you are using, e.g., `operator<<()`. (In C++20 and later, a `u8` literal has type `const char8_t[]`, so it initializes `std::u8string` rather than `std::string`.)