I am working on some C code and struggling to find a way to terminate a string after a specific number of characters. For example, I need to terminate the string data after 3 characters. If it's a plain ASCII string, I can do something like
data[3] = 0;
But in my case any of the characters could be multibyte, like È or À, so the byte index no longer matches the character index. What would be the best way to terminate such a string after a given number of characters?
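To make the problem concrete, here is a minimal sketch of one common approach, assuming the string is valid UTF-8: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new character, so counting those bytes locates the cut point. The name utf8_truncate is hypothetical and not taken from any library.

```c
#include <stddef.h>

/* Hypothetical helper: truncate the UTF-8 string s in place after at most
 * max_chars code points and return the resulting length in bytes.
 * Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_truncate(char *s, size_t max_chars)
{
    size_t i = 0;      /* byte index */
    size_t chars = 0;  /* code points seen so far */

    while (s[i] != '\0') {
        /* A byte that is not a continuation byte (10xxxxxx) starts a new code point. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (chars == max_chars) {
                s[i] = '\0';   /* cut just before the first unwanted code point */
                break;
            }
            chars++;
        }
        i++;
    }
    return i;
}
```

Note that this counts code points, not what a reader perceives as characters: a letter followed by a combining accent is two code points, so a cut could in principle separate them.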
UPDATE:
Basically:
char s_mon[7];
setlocale(LC_ALL, "");
strftime(s_mon, 7, "%b", tick_time);
The current locale is French, so the month is stored as "févr.". I need it to be "fév", and the solution has to be general, so that entries like "mars" are also cut to 3 characters.
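Because the string comes from strftime under the current locale, its encoding follows LC_CTYPE and is not necessarily UTF-8. A sketch that stays within the C standard library is to walk the string with mbrlen, which reports the byte length of each multibyte character in the current locale. The helper name mb_truncate is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: keep at most max_chars multibyte characters of s,
 * as defined by the current LC_CTYPE locale. Returns the new length in
 * bytes, or (size_t)-1 (leaving s untouched) on an invalid sequence. */
size_t mb_truncate(char *s, size_t max_chars)
{
    mbstate_t st;
    size_t off = 0;
    size_t remaining = strlen(s);

    memset(&st, 0, sizeof st);   /* start in the initial conversion state */
    while (max_chars > 0 && remaining > 0) {
        size_t n = mbrlen(s + off, remaining, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;   /* invalid or incomplete sequence */
        if (n == 0)
            break;               /* reached a NUL character */
        off += n;
        remaining -= n;
        max_chars--;
    }
    s[off] = '\0';
    return off;
}
```

With the s_mon buffer above, calling mb_truncate(s_mon, 3) after strftime should leave "fév" for "févr." and "mar" for "mars", provided setlocale succeeded and selected a locale whose encoding matches the bytes in the buffer.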
I ended up reusing this utf8_str_to_upper function. It handles multibyte UTF-8 sequences correctly. It also converts the string to uppercase (which I needed as well), but that part can be stripped out if it is not wanted.
The only modification I made is a second parameter, limit, which is the number of characters after which the string should be cut. The function inserts the terminator at the correct byte position and returns that byte position. Here's the complete code:
uint8_t utf8_str_to_upper(char* s, uint8_t limit) {
    // Note: the uint8_t counters assume strings shorter than 256 bytes.
    // The 0b... binary literals are C23 (and a long-standing GCC/Clang extension).
    uint8_t char_no = 0;
    uint8_t extra; // continuation bytes expected after the current lead byte
    uint8_t* p;
    for (p = (uint8_t*)s; *p; ++p) {
        extra = 0;
        // (<128) ascii character
        // U+00000000 – U+0000007F: 0xxxxxxx
        if (*p < 0b10000000) {
            if (*p >= 0x61 && *p <= 0x7A) {
                *p = *p - 0x20; // a~z -> A~Z
            }
        // (<192) unexpected continuation byte
        } else if (*p < 0b11000000) {
        // (<224) 2 byte sequence
        // U+00000080 – U+000007FF: 110xxxxx 10xxxxxx
        } else if (*p < 0b11100000) {
            extra = 1;
            // Only decode if the continuation byte is really there
            if ((p[1] & 0b11000000) == 0b10000000) {
                uint16_t code = ((uint16_t)(p[0] & 0b00011111) << 6) | (p[1] & 0b00111111);
                if (
                    (code >= 0x00E0 && code <= 0x00F6) || // à~ö -> À~Ö
                    (code >= 0x00F8 && code <= 0x00FE)    // ø~þ -> Ø~Þ
                ) {
                    code -= 0x0020;
                    p[0] = 0b11000000 | ((code >> 6) & 0b00011111);
                    p[1] = 0b10000000 | ( code       & 0b00111111);
                }
            }
        // (<240) 3 byte sequence
        // U+00000800 – U+0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
        } else if (*p < 0b11110000) {
            extra = 2;
        // (<248) 4 byte sequence
        // U+00010000 – U+001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        } else if (*p < 0b11111000) {
            extra = 3;
        // (<252) 5 byte sequence
        // U+00200000 – U+03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
        } else if (*p < 0b11111100) {
            extra = 4;
        // (<254) 6 byte sequence
        // U+04000000 – U+7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
        } else if (*p < 0b11111110) {
            extra = 5;
        }
        // Skip the continuation bytes, stopping early at a truncated sequence
        // so that p never moves past the terminator.
        while (extra && (p[1] & 0b11000000) == 0b10000000) {
            ++p;
            --extra;
        }
        if (limit && ++char_no == limit) {
            *++p = 0; // terminate right after the last kept character
            break;
        }
    }
    // p now points at the terminator: return its byte position
    return (uint8_t)(p - (uint8_t*)s);
}
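The per-lead-byte stepping in the function can also be isolated into a small helper, which makes it easier to guard against malformed input such as a lead byte at the very end of the buffer. The following is only a sketch under that assumption; utf8_seq_len is a hypothetical name, and it covers the 1 to 4 byte forms that current UTF-8 allows.

```c
#include <stddef.h>

/* Hypothetical helper: return the number of bytes occupied by the UTF-8
 * sequence starting at s, never counting past the string terminator.
 * Returns 0 at the end of the string; a stray continuation byte or an
 * invalid lead byte counts as a single byte. */
size_t utf8_seq_len(const char *s)
{
    unsigned char lead = (unsigned char)s[0];
    size_t expected, n;

    if (lead == 0)        return 0;
    if (lead < 0x80)      expected = 1;   /* 0xxxxxxx: ASCII */
    else if (lead < 0xC0) expected = 1;   /* 10xxxxxx: stray continuation byte */
    else if (lead < 0xE0) expected = 2;   /* 110xxxxx */
    else if (lead < 0xF0) expected = 3;   /* 1110xxxx */
    else if (lead < 0xF8) expected = 4;   /* 11110xxx */
    else                  expected = 1;   /* not a valid UTF-8 lead byte */

    /* Accept only as many continuation bytes as are really present. */
    for (n = 1; n < expected; n++) {
        if (((unsigned char)s[n] & 0xC0) != 0x80)
            break;
    }
    return n;
}
```

A truncation loop can then add utf8_seq_len(s + offset) to its offset once per character and stop when the result is 0 or the character limit is reached.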