
BreakIterator API Java


The documentation for BreakIterator.getWordInstance() shows an overload that takes a Locale parameter, presumably because the results of the factory methods (getWordInstance, getLineInstance, getSentenceInstance, getCharacterInstance) can vary by locale.

But when I omit this parameter, I still get the same results as when I call it with any Locale returned by getAvailableLocales().

Is there some pattern, String, or Locale that actually causes these methods to give different results?
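One way to check this empirically is to collect the boundary offsets for a sample string and compare every available locale against the no-argument instance. The following is only a minimal sketch: the class name BreakCompare, the helper names, and the sample text are made up for illustration, and which locales (if any) end up in the "differing" list depends on the JDK and the locale data installed with it.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original question.
public class BreakCompare {

    /** Collects every boundary offset the iterator reports for the text. */
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    /** Returns the locales whose word boundaries differ from the default-locale instance. */
    static List<Locale> differingLocales(String text) {
        List<Integer> baseline = boundaries(BreakIterator.getWordInstance(), text);
        List<Locale> differing = new ArrayList<>();
        for (Locale locale : BreakIterator.getAvailableLocales()) {
            List<Integer> candidate = boundaries(BreakIterator.getWordInstance(locale), text);
            if (!candidate.equals(baseline)) {
                differing.add(locale);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        String sample = "Hello, world. Isn't this fun?";
        // Word boundaries for the sample under an explicit English locale.
        System.out.println(boundaries(BreakIterator.getWordInstance(Locale.ENGLISH), sample));
        // Locales that split the sample differently from the default locale.
        System.out.println(differingLocales(sample));
    }
}
```

For plain Latin text the second list may well come out empty, which would match the observation above; trying text in other scripts is more likely to surface a difference.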

I believe all "western" languages have the same rules.

A cursory scan shows that the locale th (Thai) has its own rules, given in the file /sun/text/resources/th/WordBreakIteratorData_th inside .../jre/lib/ext/localedata.jar.

It's a binary file, so I can't read what it says, and even if I could decode it, I don't know Thai, so I still wouldn't understand the rules.
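Even without reading the binary rule file, the effect of the th locale can be probed from the outside by running the same Thai text through an English and a Thai word iterator and printing both boundary lists. This is a rough sketch under a few assumptions: the class name ThaiBreakDemo and the helper are invented for illustration, the sample is the Thai phrase for "Thai language", and whether the two lists actually differ depends on whether the running JDK ships the Thai locale data (newer JDKs may package it differently from the localedata.jar mentioned above).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of the original answer.
public class ThaiBreakDemo {

    // Thai for "Thai language", written with Unicode escapes so the
    // source compiles regardless of the file encoding.
    static final String THAI_TEXT = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    /** Returns the word-boundary offsets reported for the text under the given locale. */
    static List<Integer> wordBoundaries(Locale locale, String text) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        Locale thai = Locale.forLanguageTag("th");
        // If the Thai-specific rules are available, these two lines may differ.
        System.out.println("en: " + wordBoundaries(Locale.ENGLISH, THAI_TEXT));
        System.out.println("th: " + wordBoundaries(thai, THAI_TEXT));
    }
}
```

Thai is written without spaces between words, so a locale-specific iterator has room to place extra boundaries that a space-based rule set would miss.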
