
Split a sentence into an array of strings with special characters or spaces intact

I want to split a sentence containing spaces or special characters into an array of words, where each space or special character is also an element of the array.

A sentence like:

aman,amit and sumit went to top-up 

should be split into an array of String:

{"aman",",","amit"," ","and"," ","sumit"," ","went"," ","to"," ","top","-","up"}

Please suggest a regex or other logic to do this in Java.

I missed one thing in my question: I also need to split on numeric characters. However, split("\\b") does not split a string such as

abc12def

into

{"abc","12","def"} or {"abc","1","2","def"}

It seems all you need is to match either runs of word characters ( \w+ ) or runs of non-word characters ( \W+ ). Combine the two with the alternation operator ( | ) and, optionally, add Pattern.UNICODE_CHARACTER_CLASS (or its inline/embedded form (?U) ) to make the pattern Unicode-aware:

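import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String value = "aman,amit and sumit went to top-up";
// (?U) makes \w and \W Unicode-aware; each match is a run of word or non-word characters
String pattern = "(?U)\\w+|\\W+";
List<String> lst = new ArrayList<>();
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(value);
// Collect every match in order, so separators are kept as list elements
while (m.find()) {
    lst.add(m.group(0));
}
System.out.println(lst);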

See the Java demo.
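The update to the question also asks for digits to be separated from letters, which \w+ alone does not do, because \w matches letters, digits and the underscore alike. Below is a minimal sketch of one way to extend the same matching idea, assuming three kinds of tokens are wanted: runs of letters, runs of digits, and runs of anything else. The class name TokenSplitter and the method tokenize are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSplitter {
    // Assumption: a token is a run of letters, a run of decimal digits,
    // or a run of anything else (spaces, punctuation, underscore).
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{Nd}+|[^\\p{L}\\p{Nd}]+");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // Every character belongs to exactly one of the three classes,
        // so consecutive matches cover the whole input.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc12def"));
        System.out.println(tokenize("aman,amit and sumit went to top-up"));
    }
}
```

If every digit or every separator should be its own element (for example "1","2" instead of "12"), dropping the + after the corresponding alternative should give that behaviour.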

I hope the code snippet below helps you solve this.

public static void main(final String[] args) {
    String message = "aman,amit and sumit went to top-up";
    String[] messages = message.split("\\b");
    for (String string : messages) {
        System.out.println(string);
    }
}
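As the update to the question points out, \b finds no boundary between a letter and a digit, so abc12def comes back in one piece. Here is a rough sketch of a split-based variant, assuming an extra cut is wanted wherever a digit meets a non-digit. The class name BoundarySplit and the method splitKeepingSeparators are hypothetical, and the behaviour relies on Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string.

```java
import java.util.Arrays;

public class BoundarySplit {
    // Word boundaries, plus zero-width positions where a digit meets a non-digit.
    private static final String BOUNDARY = "\\b|(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)";

    public static String[] splitKeepingSeparators(String input) {
        // All alternatives are zero-width, so no characters are consumed
        // and separators stay in the result as their own elements.
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingSeparators("abc12def")));
        System.out.println(Arrays.toString(
                splitKeepingSeparators("aman,amit and sumit went to top-up")));
    }
}
```

Like the \W+ approach, this keeps a run of adjacent separators (for example a comma followed by a space) together as a single element.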

