
Split string by whitespaces removes new line characters

I'm splitting a string by whitespaces, but for some reason the new line characters are being removed. For example:

String[] splitSentence = "Example sentence\n\n This sentence is an example".
   split("\\s+");

splitSentence will contain this:

["Example", "sentence", "This", "sentence", "is", "an", "example"]

and if I do this:

String[] splitSentence = "Example sentence\n\n This sentence is an example".
   split("\\s");

splitSentence will contain this:

["Example", "sentence", "", "", "This", "sentence", "is", "an", "example"]

I'm trying to achieve something like this:

["Example", "sentence\n\n", "This", "sentence", "is", "an", "example"]  

Or like this:

["Example", "sentence", "\n", "\n", "This", "sentence", "is", "an", "example"]

I've tried a lot of things with no luck... Any help will be appreciated.

String[] splitSentence = "Example sentence\n\n This sentence is an example".
   split(" ");

This version should work: it splits on single space characters only, so the spaces are removed and the newlines are kept.

Split by spaces and tabs (without newline):

String[] splitSentence = "Example sentence\n\n This sentence is an example".split("[ \t]+");

Result: ["Example", "sentence\n\n", "This", "sentence", "is", "an", "example"]
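
A minimal sketch comparing the two approaches above. The class and method names are made up for illustration, and the input has a double space added so the difference shows up: splitting on a single space yields an empty string between consecutive spaces, while "[ \t]+" treats the whole run as one delimiter.

```java
import java.util.Arrays;

class SpaceSplitDemo {
    // Splits on runs of spaces and tabs only; newlines stay inside the tokens.
    static String[] splitOnSpacesAndTabs(String text) {
        return text.split("[ \t]+");
    }

    // Splits on every single space character; newlines are also kept.
    static String[] splitOnSingleSpace(String text) {
        return text.split(" ");
    }

    // Makes newlines visible when printing.
    static String show(String[] tokens) {
        return Arrays.toString(tokens).replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Note the two spaces between "This" and "sentence".
        String text = "Example sentence\n\n This  sentence";

        // prints [Example, sentence\n\n, This, sentence]
        System.out.println(show(splitOnSpacesAndTabs(text)));

        // prints [Example, sentence\n\n, This, , sentence]
        System.out.println(show(splitOnSingleSpace(text)));
    }
}
```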

In a regex, \s is defined to be equivalent to the characters in this set:

[ \t\n\x0B\f\r]

(See the javadoc.) If you don't want newlines to be treated like spaces, you can write your own set:

splitSentence = "Example sentence\n\n This sentence is an example".split("[ \t\\x0B\f\r]+");

(or eliminate other characters you don't want the split to recognize).

(\t is TAB, \x0B is vertical tab, \f is FF (form feed), \r is CR (carriage return).)
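
As a sketch of the same idea, the set "\s without \n" can also be written with Java's character class intersection instead of listing each character. The class and method names here are illustrative only, and this assumes the default (non-Unicode) meaning of \s.

```java
import java.util.Arrays;

class NoNewlineWhitespaceDemo {
    // Explicit set: space, tab, vertical tab, form feed, carriage return.
    static String[] splitExplicit(String text) {
        return text.split("[ \t\\x0B\f\r]+");
    }

    // Same set expressed as "\s minus \n" via character class intersection.
    static String[] splitIntersection(String text) {
        return text.split("[\\s&&[^\\n]]+");
    }

    public static void main(String[] args) {
        String text = "Example sentence\n\n This sentence is an example";
        String[] explicit = splitExplicit(text);
        String[] intersection = splitIntersection(text);

        // prints [Example, sentence\n\n, This, sentence, is, an, example]
        System.out.println(Arrays.toString(explicit).replace("\n", "\\n"));

        // prints true
        System.out.println(Arrays.equals(explicit, intersection));
    }
}
```

The explicit list makes it obvious which characters act as delimiters; the intersection form is shorter if the only character to exclude is the newline.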

This approach seems to produce the second result you mentioned, where the \n's are returned as separate strings:

splitSentence = "Example sentence\n\n This sentence is an example".split("[ \t\\x0B\f\r]+|(?=\n)");

This uses lookahead to split at a point that is immediately followed by \n, but doesn't treat \n as a delimiter that will be removed from the result.
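
A small runnable sketch of the lookahead split, using the sentence from the question. The class and method names are hypothetical. Because the lookahead matches an empty position, each split happens just before a \n, and the \n itself then becomes the start of the next token.

```java
import java.util.Arrays;

class NewlineTokenDemo {
    // Splits on non-newline whitespace, and also at every position that is
    // immediately followed by '\n' (zero-width, so the '\n' is not consumed).
    static String[] splitKeepingNewlines(String text) {
        return text.split("[ \t\\x0B\f\r]+|(?=\n)");
    }

    public static void main(String[] args) {
        String text = "Example sentence\n\n This sentence is an example";
        String[] tokens = splitKeepingNewlines(text);

        // prints [Example, sentence, \n, \n, This, sentence, is, an, example]
        System.out.println(Arrays.toString(tokens).replace("\n", "\\n"));
    }
}
```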
