
C# Equivalent for Java's BreakIterator

I'm working on a project to convert code from Java to C#. Is there a C# equivalent of Java's BreakIterator? I tried IEnumerator, but it has nothing like the iterator.setText() call used below. Can anyone suggest equivalent C# code for the following lines:

// Requires: java.text.BreakIterator, java.util.ArrayList
String finalResult = "";  // text to split (empty in the snippet as posted)
ArrayList<String> resultList = new ArrayList<String>();
// currentLocale is assumed to be a java.util.Locale defined elsewhere
BreakIterator iterator = BreakIterator.getSentenceInstance(currentLocale);
//int counter = 0;
iterator.setText(finalResult);
int lastIndex = iterator.first();
while (lastIndex != BreakIterator.DONE)
{
    int firstIndex = lastIndex;
    lastIndex = iterator.next();
    if (lastIndex != BreakIterator.DONE)
    {
        // The text between two consecutive boundaries is one sentence
        String sentence = finalResult.substring(firstIndex, lastIndex);
        resultList.add(sentence);
        System.out.println("sentence = " + sentence);
        //counter++;
    }
}

BreakIterator is a mechanism for locale-aware boundary analysis on arbitrary strings of Unicode text. I suspect the Java class is heavily based on (perhaps even directly dependent on, though I'm speculating) the ICU (International Components for Unicode) project: http://site.icu-project.org/
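When porting, it can help to pin down the Java behavior first. The following is a minimal, self-contained sketch of the sentence loop from the question, using only java.text.BreakIterator from the standard library. The class name SentenceSplitter, the method name split, and the sample text are illustrative choices, not part of the original post.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    /** Splits text into sentences at the boundaries reported for the given locale. */
    static List<String> split(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        int start = iterator.first();
        // Walk consecutive boundaries; each [start, end) range is one sentence.
        for (int end = iterator.next(); end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample text is made up for this example.
        for (String s : split("Hello there. How are you? Fine!", Locale.US)) {
            System.out.println("sentence = " + s);
        }
    }
}
```

Each substring keeps the whitespace that follows its sentence, so a port may want to trim the pieces before comparing its output against the Java version.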

To quote the ICU docs:

Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text. Examples of this process include:

  1. Locating appropriate points to word-wrap text to fit within specific margins while displaying or printing.
  2. Locating the beginning of a word that the user has selected.
  3. Counting characters, words, sentences, or paragraphs.
  4. Determining how far to move the text cursor when the user hits an arrow key (some characters require more than one position in the text store and some characters in the text store do not display at all).
  5. Making a list of the unique words in a document.
  6. Figuring out if a given range of text contains only whole words.
  7. Capitalizing the first letter of each word.
  8. Locating a particular unit of the text (for example, finding the third word in the document).
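A few of the tasks in this list (counting words, finding the third word, counting user-visible characters) can be sketched with Java's standard BreakIterator, which exposes the same kinds of boundaries. This is an illustration under assumptions, not ICU's own sample code: the class BoundaryExamples, its method names, and the letter-or-digit filter used to skip spaces and punctuation are choices made for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BoundaryExamples {
    /** Returns the words in text, skipping whitespace and punctuation segments. */
    static List<String> words(String text, Locale locale) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE;
                start = end, end = it.next()) {
            // The word iterator also reports spaces and punctuation as segments;
            // keep only segments that begin with a letter or digit (a simple
            // heuristic chosen for this sketch).
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    /** Counts character boundaries, so a base letter plus a combining mark counts once. */
    static int countCharacters(String text, Locale locale) {
        BreakIterator it = BreakIterator.getCharacterInstance(locale);
        it.setText(text);
        int count = 0;
        // setText resets the position to the start; each next() crosses one character.
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> w = words("The quick brown fox.", Locale.US);
        System.out.println("word count = " + w.size());
        System.out.println("third word = " + w.get(2));
        // "e" followed by U+0301 (combining acute accent), then "a"
        System.out.println("characters = " + countCharacters("e\u0301a", Locale.US));
    }
}
```

The character count is what a cursor-movement routine would step over, which is why it can differ from the string's length in code units.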

ICU provides a C/C++ implementation, aptly named ICU4C. The ICU FAQ describes ICU4C:

The C and C++ languages and many operating system environments do not provide full support for Unicode and standards-compliant text handling services. Even though some platforms do provide good Unicode text handling services, portable application code cannot make use of them. The ICU4C libraries fill in this gap. ICU4C provides an open, flexible, portable foundation for applications to use for their software globalization requirements. ICU4C closely tracks industry standards, including Unicode and CLDR (Common Locale Data Repository).

SIL International provides C# bindings that let you use ICU4C from C# applications, through a project named icu-dotnet.

You can find the official icu-dotnet repository on GitHub:
https://github.com/sillsdev/icu-dotnet

Or install it via NuGet:
https://www.nuget.org/packages/icu.net/
