
Split string by whitespaces removes new line characters

I'm splitting a string by whitespace, but for some reason the newline characters are being removed. For example:

String[] splitSentence = "Example sentence\n\n This sentence is an example".
   split("\\s+");

splitSentence will contain this:

["Example", "sentence", "This", "sentence", "is", "an", "example"]

And if I do this:

String[] splitSentence = "Example sentence\n\n This sentence is an example".
   split("\\s");

splitSentence will contain this:

["Example", "sentence", "", "", "This", "sentence", "is", "an", "example"]

I'm trying to achieve something like this:

["Example", "sentence\n\n", "This", "sentence", "is", "an", "example"]  

Or like this:

["Example", "sentence", "\n", "\n", "This", "sentence", "is", "an", "example"]

I've tried a lot of things with no luck. Any help would be appreciated.

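String[] splitSentence = "Example sentence\n\n This sentence is an example".
   split(" ");

This version should work: only the space characters are removed, and the newlines are kept. Note that String.split takes a regex string, so the delimiter must be written as " " rather than the char literal ' '.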
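A minimal sketch of splitting on a single space, for anyone who wants to try it. The class and method names here (SplitOnSingleSpace, splitOnSpace) are made up for illustration. It also shows one caveat: because each space is its own delimiter, two spaces in a row produce an empty string in the result.

```java
class SplitOnSingleSpace {
    // Split on every single space character; newlines are not delimiters here.
    static String[] splitOnSpace(String text) {
        return text.split(" ");
    }

    public static void main(String[] args) {
        String[] parts = splitOnSpace("Example sentence\n\n This sentence is an example");
        for (String part : parts) {
            // Escape newlines so they are visible in the printed output.
            System.out.println("\"" + part.replace("\n", "\\n") + "\"");
        }

        // Two consecutive spaces yield an empty token between them.
        String[] withGap = splitOnSpace("a  b");
        System.out.println(withGap.length); // 3
    }
}
```

If the input may contain runs of spaces, a quantified pattern such as " +" avoids the empty tokens.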

Split by spaces and tabs (without newlines):

String[] splitSentence = "Example sentence\n\n This sentence is an example".split("[ \t]+");

Result: ["Example", "sentence\n\n", "This", "sentence", "is", "an", "example"]
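A small runnable sketch of the split above, in case it helps to see the tokens printed. The names SplitOnSpacesAndTabs and splitKeepingNewlines are hypothetical, chosen only for this example.

```java
class SplitOnSpacesAndTabs {
    // Split on runs of spaces and tabs only, so newlines stay attached to their token.
    static String[] splitKeepingNewlines(String text) {
        return text.split("[ \t]+");
    }

    public static void main(String[] args) {
        String[] parts = splitKeepingNewlines("Example sentence\n\n This sentence is an example");
        for (String part : parts) {
            // Escape newlines so they are visible in the printed output.
            System.out.println("\"" + part.replace("\n", "\\n") + "\"");
        }
    }
}
```

Because of the + quantifier, several spaces or tabs in a row count as one delimiter and do not produce empty strings.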

In a regex, \s is defined to be equivalent to the characters in this set:

[ \t\n\x0B\f\r]

(See the javadoc.) If you don't want newlines to be treated like spaces, then you can write your own set:

splitSentence = "Example sentence\n\n This sentence is an example".split("[ \t\\x0B\f\r]+");

(or eliminate other characters you don't want the split to recognize).

(\t is TAB, \x0B is vertical tab, \f is FF (form feed), \r is CR.)
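To check which characters the whitespace class actually covers, something like the following sketch can be used. It assumes the default matching mode (no UNICODE_CHARACTER_CLASS flag), and the names WhitespaceClassCheck and isRegexWhitespace are invented for this example.

```java
import java.util.regex.Pattern;

class WhitespaceClassCheck {
    // True if the single character is matched by the regex whitespace class.
    static boolean isRegexWhitespace(char c) {
        return Pattern.matches("\\s", String.valueOf(c));
    }

    public static void main(String[] args) {
        // Space, tab, newline, vertical tab, form feed, carriage return, and a letter for contrast.
        char[] candidates = {' ', '\t', '\n', (char) 0x0B, '\f', '\r', 'a'};
        for (char c : candidates) {
            System.out.printf("U+%04X -> %b%n", (int) c, isRegexWhitespace(c));
        }
    }
}
```

Any character reported as true here is removed by a split on the whitespace class, which is why the newline has to be left out of a custom set if it should survive.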

EDIT: This method seems to produce the second result you mentioned, where the \n's are returned as separate strings:

splitSentence = "Example sentence\n\n This sentence is an example".split("[ \t\\x0B\f\r]+|(?=\n)");

This uses a lookahead to split at a point that is immediately followed by \n, but doesn't treat \n as a delimiter that will be removed from the result.
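Here is a runnable sketch of the lookahead split, with the tokens printed so the separate newline entries are visible. The names SplitWithLookahead and splitKeepingNewlineTokens are hypothetical.

```java
class SplitWithLookahead {
    // Split on runs of non-newline whitespace (removed from the result),
    // or at the zero-width position just before a newline (newline is kept).
    static String[] splitKeepingNewlineTokens(String text) {
        return text.split("[ \t\\x0B\f\r]+|(?=\n)");
    }

    public static void main(String[] args) {
        String[] parts = splitKeepingNewlineTokens("Example sentence\n\n This sentence is an example");
        for (String part : parts) {
            // Escape newlines so they are visible in the printed output.
            System.out.println("\"" + part.replace("\n", "\\n") + "\"");
        }
    }
}
```

The lookahead matches without consuming anything, so each \n starts its own token instead of being dropped.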
