How to compare ICU iterator values?

Question

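I'm writing a KMP substring search in C over Unicode strings using UCharIterators. The problem is that I need to compare values through the iterators, and the comparison should be normalized, but all ICU collation functions take whole strings rather than single characters.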

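UCharIterator first_iter, second_iter;

uiter_setUTF8(&first_iter, needle_str, n_needle_bytes);
uiter_setUTF8(&second_iter, needle_str, n_needle_bytes);

...
/* current() returns the UTF-16 code unit at the iterator position (U_SENTINEL at the end),
   so this is a raw code-unit comparison, not a collation-aware one */
if (first_iter.current(&first_iter) != second_iter.current(&second_iter)) {
    ...
}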

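The current condition treats 'a' and 'ä' as different, which is not what I want. I don't like the idea of pre-normalizing, because it requires O(n + m) extra memory (as far as I know, ICU has no function that does it in place).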
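One way to keep the extra memory at the O(m) that KMP already spends on its failure table is to run KMP over code points with a pluggable equality predicate, so the "normalized" comparison happens one character pair at a time. This is a minimal sketch under stated assumptions: the needle is assumed to be already decoded into a code-point array (O(m), like the failure table), the haystack is passed as an array only for brevity, and `cp_eq_fn`, `demo_eq` and `kmp_search` are hypothetical names. `demo_eq` is a toy predicate that only folds 'ä' to 'a'. KMP stays correct only if the predicate behaves as an equivalence relation, which per-character collation does not fully guarantee (contractions and expansions span several code points).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical equality predicate on two code points; returns nonzero if equal.
   In an ICU program this is where a collation-based comparison would go. */
typedef int (*cp_eq_fn)(uint32_t a, uint32_t b);

/* Toy predicate for illustration only: treats U+00E4 'ä' as 'a',
   otherwise compares code points exactly. */
static int demo_eq(uint32_t a, uint32_t b) {
    if (a == 0x00E4) a = 'a';
    if (b == 0x00E4) b = 'a';
    return a == b;
}

/* KMP search over code points. Returns the index (in code points) of the
   first match of needle[0..m) in hay[0..n), or -1 if there is no match
   (also -1 if the O(m) failure table cannot be allocated). */
static long kmp_search(const uint32_t *hay, size_t n,
                       const uint32_t *needle, size_t m, cp_eq_fn eq) {
    if (m == 0) return 0;
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL) return -1;

    /* fail[i] = length of the longest proper prefix of needle[0..i]
       that is also a suffix of it, under the predicate eq */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && !eq(needle[i], needle[k])) k = fail[k - 1];
        if (eq(needle[i], needle[k])) k++;
        fail[i] = k;
    }

    /* Scan the haystack once, left to right, never moving backwards */
    long result = -1;
    for (size_t i = 0, k = 0; i < n; i++) {
        while (k > 0 && !eq(hay[i], needle[k])) k = fail[k - 1];
        if (eq(hay[i], needle[k])) k++;
        if (k == m) { result = (long)(i + 1 - m); break; }
    }
    free(fail);
    return result;
}
```

Because the scan loop reads each haystack element exactly once, left to right, the haystack array could be replaced by code points produced on the fly (for example with U8_NEXT, or uiter_next32 on a UCharIterator), so the haystack never has to be copied. As an assumption about a real ICU setup, the predicate could wrap ucol_strcollUTF8 on a collator whose strength was set to UCOL_PRIMARY with ucol_setStrength; in the root locale that is the strength at which 'a' and 'ä' compare equal.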

Answer 1

For UTF-8 strings I had to switch to the ICU U8_* macros. Use U8_NEXT to advance the offset:

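UChar32 c;  /* receives the decoded code point (negative if the sequence is ill-formed) */
U8_NEXT((const uint8_t *)string, string_offset, string_size, c);  /* advances string_offset past it */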
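U8_NEXT(s, i, length, c) reads the code point that starts at byte offset i, stores it in c, and advances i past it; c becomes negative for an ill-formed sequence. The hypothetical `utf8_next` below imitates that contract in plain C so the behaviour can be tried without ICU installed. It is a simplified sketch, not ICU's implementation: on an error it skips only the lead byte, whereas ICU's macro may consume more bytes of an ill-formed sequence, so real code should use the macro itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ICU's U8_NEXT. Precondition: *i < len.
   Decodes the code point at s[*i], advances *i past it and returns it;
   returns -1 for an ill-formed sequence after skipping the lead byte. */
static int32_t utf8_next(const uint8_t *s, size_t *i, size_t len) {
    uint8_t b = s[(*i)++];
    if (b < 0x80) return b;                        /* ASCII */

    int n;             /* number of trail bytes */
    uint32_t cp, min;  /* partial code point, smallest non-overlong value */
    if (b >= 0xC2 && b <= 0xDF)      { n = 1; cp = b & 0x1F; min = 0x80; }
    else if (b >= 0xE0 && b <= 0xEF) { n = 2; cp = b & 0x0F; min = 0x800; }
    else if (b >= 0xF0 && b <= 0xF4) { n = 3; cp = b & 0x07; min = 0x10000; }
    else return -1;                                /* invalid lead byte */

    size_t start = *i;
    if (len - start < (size_t)n) return -1;        /* truncated sequence */
    for (int k = 0; k < n; k++) {
        uint8_t t = s[start + k];
        if ((t & 0xC0) != 0x80) return -1;         /* not a trail byte */
        cp = (cp << 6) | (t & 0x3F);
    }
    /* reject overlong forms, surrogates and values above U+10FFFF */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    *i = start + n;
    return (int32_t)cp;
}

/* Walks the string the way a U8_NEXT loop would, counting code points. */
static size_t count_code_points(const uint8_t *s, size_t len) {
    size_t i = 0, count = 0;
    while (i < len) {
        (void)utf8_next(s, &i, len);
        count++;
    }
    return count;
}
```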

and compare like this:

U8_GET((const uint8_t *)key, 0,  first_key_end, key_size,  first_key_c);
U8_GET((const uint8_t *)key, 0, second_key_end, key_size, second_key_c);
/* coll->cmp appears to be the answerer's own comparator wrapper, not an ICU function */
if (coll->cmp(key +  first_key_end, U8_LENGTH(first_key_c),
              key + second_key_end, U8_LENGTH(second_key_c),
              coll)) {
    ...
}

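In other words, the byte length of a single character is computed with U8_LENGTH from its code point (rather than from an offset or a slice of the string). More details here: https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/utf8_8h.html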
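The slicing idea can be tried without ICU. In this sketch the hypothetical `utf8_length` mirrors what U8_LENGTH reports (1 to 4 bytes, or 0 for surrogates and values above U+10FFFF), and the hypothetical `compare_chars` hands two one-character slices of the same buffer to a comparator with a (pointer, length, pointer, length) shape similar to the answer's `coll->cmp`. The `bytes_cmp` placeholder compares bytes exactly, so it still distinguishes 'a' from 'ä'; swapping in a collation-based comparison (for example ucol_strcollUTF8, which is only an assumption about how `coll->cmp` might be implemented) is what would make them compare equal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of ICU's U8_LENGTH(c): number of UTF-8 bytes needed
   to encode c, or 0 for surrogates and values above U+10FFFF. */
static int utf8_length(uint32_t c) {
    if (c <= 0x7F) return 1;
    if (c <= 0x7FF) return 2;
    if (c >= 0xD800 && c <= 0xDFFF) return 0;
    if (c <= 0xFFFF) return 3;
    if (c <= 0x10FFFF) return 4;
    return 0;
}

/* Placeholder comparator: exact byte comparison, shorter-is-smaller on a
   common prefix. Returns 0 when the two slices are equal. */
static int bytes_cmp(const char *a, int32_t alen, const char *b, int32_t blen) {
    int32_t n = alen < blen ? alen : blen;
    int r = memcmp(a, b, (size_t)n);
    if (r != 0) return r;
    return (alen > blen) - (alen < blen);
}

/* Compares the characters starting at byte offsets off_a and off_b of key,
   given their already-decoded code points ca and cb. No copy is made: each
   slice is key + offset with the length derived from the code point. */
static int compare_chars(const char *key, size_t off_a, uint32_t ca,
                         size_t off_b, uint32_t cb) {
    return bytes_cmp(key + off_a, utf8_length(ca),
                     key + off_b, utf8_length(cb));
}
```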


Answer 1: score 0, accepted, 2022-01-06 14:33:16
