
BreakIterator in Android counts characters wrongly

I am using BreakIterator to count the number of visible characters in a String. This works perfectly for English, but for Hindi it doesn't work as expected.

The String below has a length of 3, but visually it is a single character.

ज्य
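To make the length of 3 concrete, here is a small sketch that prints the UTF-16 code units of this string. The class name `CodeUnits` and the helper `describe` are made up for illustration; the code points are the same ones used as escapes in the answer further down.

```java
public class CodeUnits {
    /** Formats each UTF-16 code unit of s as U+XXXX, separated by spaces. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // DEVANAGARI LETTER JA + SIGN VIRAMA + LETTER YA
        String text = "\u091C\u094D\u092F";
        System.out.println(text.length());   // prints 3
        System.out.println(describe(text));  // prints U+091C U+094D U+092F
    }
}
```

So the string is two consonants joined by a virama, which fonts render as one conjunct.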

When I use BreakIterator, I expect it to treat this as a single unit, but it treats it as 2 units. Here is my code:

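    import java.text.BreakIterator;
    import java.util.Locale;

    final String text = "ज्य";
    final Locale locale = new Locale("hi","IN");
    final BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
    breaker.setText(text);
    int start = breaker.first();
    // Walk the character boundaries; each [start, end) pair is one unit.
    for (int end = breaker.next();
         end != BreakIterator.DONE;
         start = end, end = breaker.next()) {
        // The text between two consecutive boundaries.
        final String substring = text.substring(start, end);
    }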

Ideally, the for loop should execute once, with start=0 and end=3. But for the String above it executes twice (start=0, end=2 and start=2, end=3).
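For comparing platforms, the loop can be reduced to a helper that only counts the units. This is a minimal sketch; `BoundaryCount` and `countUnits` are hypothetical names, and the value returned for the Hindi string depends on the BreakIterator implementation it runs on, so no output is shown for it.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class BoundaryCount {
    /** Counts the units that the character BreakIterator reports for text. */
    static int countUnits(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
        breaker.setText(text);
        int count = 0;
        breaker.first();
        // Every successful next() closes exactly one unit.
        while (breaker.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countUnits("abc", Locale.ENGLISH)); // prints 3
        // The result for this string is implementation-dependent.
        System.out.println(countUnits("\u091C\u094D\u092F", new Locale("hi", "IN")));
    }
}
```

Running the same helper as a plain Java program and inside an Android app makes the difference described here easy to reproduce.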

How can I get BreakIterator to work correctly?

UPDATE:

The above code works perfectly when run as a plain Java program. It misbehaves only when used on Android.

Since this happens only on Android, I have reported a bug against Android: https://code.google.com/p/android/issues/detail?id=230832

I think you need to write the string using Unicode escape sequences.

See the Oracle documentation on character boundaries.

    // The same three characters, written as Unicode escapes.
    final String text = "\u091C\u094D\u092F";
    final Locale locale = new Locale("hi","IN");
    final BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
    breaker.setText(text);
    int start = breaker.first();
    for (int end = breaker.next();
         end != BreakIterator.DONE;
         start = end, end = breaker.next()) {
        // Print the text between two consecutive boundaries.
        final String substring = text.substring(start, end);
        System.out.println(substring);
    }
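If the iterator still splits after the virama, one possible workaround is to post-process its boundaries. The sketch below is an assumption of mine, not something from the question or the bug report: it merges a unit that ends with the Devanagari virama (U+094D) with the unit that follows it, provided a letter comes next. `ConjunctAwareSplitter` and `split` are hypothetical names, and only the Devanagari virama is handled.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ConjunctAwareSplitter {
    // DEVANAGARI SIGN VIRAMA; other Indic scripts have their own virama.
    private static final char VIRAMA = '\u094D';

    /** Splits text into units, keeping virama-joined consonants together. */
    static List<String> split(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
        breaker.setText(text);
        List<String> units = new ArrayList<>();
        int start = breaker.first();
        for (int end = breaker.next();
             end != BreakIterator.DONE;
             end = breaker.next()) {
            boolean endsWithVirama = text.charAt(end - 1) == VIRAMA;
            boolean letterFollows = end < text.length()
                    && Character.isLetter(text.codePointAt(end));
            if (endsWithVirama && letterFollows) {
                continue; // ignore this boundary and extend the current unit
            }
            units.add(text.substring(start, end));
            start = end;
        }
        return units;
    }

    public static void main(String[] args) {
        List<String> units = split("\u091C\u094D\u092F", new Locale("hi", "IN"));
        System.out.println(units.size()); // prints 1
    }
}
```

Because boundaries are only ever skipped, never added, the result is the same whether the underlying iterator reports one unit or two for this string.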
