
How do I determine a word boundary in a Unicode stream in C#?

I'm reading a Unicode stream and would rather not pass the entire string through a regex. Is there a simple (reliable) character I can use to break words across languages?

My byte array is likely going to be encoded in UTF-16 or UTF-8.

If you are using Java, you can use BreakIterator.
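As a rough illustration of that suggestion, here is a minimal Java sketch using java.text.BreakIterator. The class name WordBoundaries, the helper words, and the choice to keep only segments that start with a letter or digit are assumptions made for this example; they are not part of the original answer. The sketch works on an already decoded String, so a UTF-8 or UTF-16 byte array would need to be decoded first.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class WordBoundaries {

    // Splits text at the word boundaries reported by BreakIterator and
    // keeps only the segments that begin with a letter or a digit, so
    // whitespace and punctuation segments are dropped.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Locale.ROOT is an assumption; pass the text's locale if it is known.
        System.out.println(words("Hello, world! 42 cats", Locale.ROOT));
    }
}
```

The iterator reports every boundary, including those around spaces and punctuation, which is why the sketch filters the segments rather than looking for a single separator character.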


 