
Split Unicode string into list of character strings

How can I split a Unicode string containing both surrogate-pair characters and ordinary characters into a List<String> of characters?

(Each element must be a String rather than a char, because a surrogate-pair character is stored as two char values.)
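To make the problem concrete, here is a minimal sketch (the class and method names are only illustrative) showing that a supplementary character such as 😊 occupies two `char` values, so counting or indexing by `char` does not line up with characters.

```java
public class SurrogateDemo {
    // Number of UTF-16 code units (Java `char` values) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "😊a";
        System.out.println(charCount(s));                          // 3
        System.out.println(codePointCount(s));                     // 2
        // The first char is only the leading half of the emoji's surrogate pair.
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```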

Try this.

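// Requires: import java.util.List;  (List.of needs Java 9 or later)
String s = "😊a👦c😊";
// Split at every position preceded by a character; the regex engine treats a surrogate pair as one character.
List<String> result = List.of(s.split("(?<=.)"));
for (String e : result)
    System.out.println(e + " : length=" + e.length());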

Output:

😊 : length=2
a : length=1
👦 : length=2
c : length=1
😊 : length=2
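One caveat with the regex approach: by default `.` does not match line terminators, so a string containing `\n` is not split right after the newline. The sketch below assumes you want line terminators treated as ordinary characters and enables DOTALL mode with the `(?s)` flag; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class RegexSplit {
    // (?s) turns on DOTALL so that `.` also matches line terminators.
    static List<String> split(String s) {
        return Arrays.asList(s.split("(?s)(?<=.)"));
    }

    public static void main(String[] args) {
        // Without DOTALL, "a\nb" yields two pieces: "a" and "\nb".
        System.out.println("a\nb".split("(?<=.)").length); // 2
        // With DOTALL, the newline becomes its own element.
        System.out.println(split("a\nb").size());          // 3
    }
}
```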

Code points

Or, use a stream of code point integer numbers.

// Requires Java 11+ for Character.toString(int), plus: import java.util.List; import java.util.stream.Collectors;
List<String> result =
    s
    .codePoints()                    // Produce an `IntStream` of code point numbers.
    .mapToObj(Character::toString)   // Produce a `String` of one or two Java chars for each code point.
    .collect(Collectors.toList());

See this code run live at IdeOne.com.
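For reference, here is a self-contained sketch of the code point approach with its imports. It assumes you may need to run on Java 8, where `Character.toString(int)` is not available, so it builds each string with `Character.toChars` instead. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointSplit {
    // Split a string into one String per Unicode code point.
    static List<String> split(String s) {
        return s.codePoints()
                // toChars returns one char, or two for a supplementary code point.
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String e : split("😊a👦c😊")) {
            System.out.println(e + " : length=" + e.length());
        }
    }
}
```

Unlike the regex version, this does not depend on how `.` treats line terminators, since every code point becomes its own element.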

To capture the code point numbers themselves rather than strings, use this variation of the code point stream.

List<Integer> codePointNumbers =
    s
    .codePoints()   // `IntStream` of code point numbers.
    .boxed()        // Box each `int` into an `Integer`.
    .collect(Collectors.toList());

When run:

codePointNumbers.toString(): [128522, 97, 128102, 99, 128522]
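If you later need to turn such a list of numbers back into text, one possible sketch uses `StringBuilder.appendCodePoint`. The helper names here are hypothetical and not part of the answer above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointRoundTrip {
    // Collect the code point numbers of a string into a list.
    static List<Integer> toCodePoints(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    // Rebuild a string from code point numbers.
    static String fromCodePoints(List<Integer> codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp); // appends one or two chars as needed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "😊a👦c😊";
        List<Integer> numbers = toCodePoints(s);
        System.out.println(numbers);
        System.out.println(fromCodePoints(numbers).equals(s)); // true
    }
}
```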
