BreakIterator in Android counts characters incorrectly
I am using BreakIterator to count the number of visible characters in a String. This works perfectly for English, but for Hindi it does not work as expected.
The String below has a length of 3, but visually it is a single character:
ज्य
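To make the length of 3 concrete, the string can be broken down into its code points. This is only an illustrative sketch; the class name CodePointDump and the helper dump are made up for this example and are not part of the original code.

```java
public class CodePointDump {
    // Formats each code point of the text as U+XXXX, separated by spaces.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same string as above, written with Unicode escapes:
        // consonant JA, virama (halant), consonant YA.
        String text = "\u091C\u094D\u092F";
        System.out.println(text.length()); // prints 3
        System.out.println(dump(text));    // prints U+091C U+094D U+092F
    }
}
```

The middle code point, U+094D, is the Devanagari virama that joins the two consonants into one visual conjunct.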
I expected BreakIterator to treat it as a single unit, but it treats it as 2 units. Here is my code:
final String text = "ज्य";
final Locale locale = new Locale("hi", "IN");
final BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
breaker.setText(text);
int start = breaker.first();
for (int end = breaker.next();
        end != BreakIterator.DONE;
        start = end, end = breaker.next()) {
    // Each iteration covers one unit reported by the iterator.
    final String substring = text.substring(start, end);
}
Ideally, the for loop should execute once, with start=0 and end=3. For the String above, however, it executes twice (start=0, end=2 and then start=2, end=3).
How can I get BreakIterator to work correctly?
UPDATE:
The code above works correctly when run as a plain Java program. It misbehaves only when run on Android.
Since this happens only on Android, I have reported a bug against Android: https://code.google.com/p/android/issues/detail?id=230832
I think you need to work with the Unicode characters directly, for example by writing the string with Unicode escapes. See the Oracle documentation on Character Boundaries.
final String text = "\u091C\u094D\u092F";
final Locale locale = new Locale("hi", "IN");
final BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
breaker.setText(text);
int start = breaker.first();
for (int end = breaker.next();
        end != BreakIterator.DONE;
        start = end, end = breaker.next()) {
    final String substring = text.substring(start, end);
    // Print each unit reported by the iterator.
    System.out.println(substring);
}
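If the iterator on Android still breaks after the virama, as the question reports, one possible workaround is to merge a unit that ends in a virama with the unit that follows it. The sketch below is a rough heuristic and not an official fix: the class ViramaAwareSplitter and its split method are hypothetical names, it only handles the Devanagari virama (U+094D), and it will also merge cases where a visible halant is intended.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ViramaAwareSplitter {
    // Devanagari sign virama (halant); an assumption that only this script matters here.
    private static final char DEVANAGARI_VIRAMA = '\u094D';

    // Splits text into units, merging any unit that ends in a virama
    // with the unit that follows it.
    static List<String> split(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
        breaker.setText(text);
        List<String> units = new ArrayList<>();
        int start = breaker.first();
        int end = breaker.next();
        while (end != BreakIterator.DONE) {
            // Keep extending while the unit ends in a virama and more text follows.
            while (end < text.length() && text.charAt(end - 1) == DEVANAGARI_VIRAMA) {
                int next = breaker.next();
                if (next == BreakIterator.DONE) {
                    break;
                }
                end = next;
            }
            units.add(text.substring(start, end));
            start = end;
            end = breaker.next();
        }
        return units;
    }

    public static void main(String[] args) {
        List<String> units = split("\u091C\u094D\u092F", new Locale("hi", "IN"));
        System.out.println(units.size()); // prints 1
    }
}
```

Because the merge only happens when a boundary falls directly after a virama, the result is the same whether the underlying iterator reports one unit or two for this string.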