Java String - split on spaces, but preserve double spaces

Question

Currently I am splitting a string by spaces. However, there are some double spaces that I want to preserve when I put the pieces back together. Any suggestions on how to do this?

I.e., the string "I went to the beach.  I ate pie" is getting split as

I
went
to
the
beach.

I
ate
pie

I don't want the blank entries, but I want to be able to put the string back together in the same format. Thanks all!

Answer 1

Do a String replaceAll("  ", " unlikelyCharacterSequence") (note the two spaces in the first argument) and then split your string by spaces as normal. Then you can convert back to a double space by replacing your {unlikelyCharacterSequence} with " " at the end.

However: this will fail if you ever encounter your "unlikely" character sequence in your actual, unmodified String. For a more general-purpose solution, check the alternative listed below this example.

Example (warning: depends on the non-existence of !@#!@# in the input):

String example = "Hello.  That was a double space. That was a single space.";
String formatted = example.replace("  ", " !@#!@#");
String[] split = formatted.split(" ");
for(int i = 0; i < split.length; i++)
{
  // Strings are immutable, so store the result back into the array
  split[i] = split[i].replace("!@#!@#", " ");
}
// Recombine your splits?
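To make the round trip and the failure mode concrete, here is a minimal sketch of the placeholder approach. The class name, the method name `splitKeepingDoubleSpaces`, and the `MARKER` constant are illustrative and not part of the answer; `String.join` assumes Java 8 or later.

```java
import java.util.Arrays;

public class PlaceholderSplitDemo {
    // Hypothetical marker: the approach only works if the input never contains it.
    static final String MARKER = "!@#!@#";

    static String[] splitKeepingDoubleSpaces(String text) {
        // Turn each double space into "space + marker", then split on single spaces.
        String[] parts = text.replace("  ", " " + MARKER).split(" ");
        for (int i = 0; i < parts.length; i++) {
            // Restore the second space as a leading space on the following word.
            parts[i] = parts[i].replace(MARKER, " ");
        }
        return parts;
    }

    public static void main(String[] args) {
        String ok = "Hello.  That was a double space.";
        String[] parts = splitKeepingDoubleSpaces(ok);
        System.out.println(Arrays.toString(parts));
        // Joining with single spaces reproduces the original text.
        System.out.println(String.join(" ", parts).equals(ok));    // true

        // The input already contains the marker, so the round trip corrupts it.
        String bad = "A literal !@#!@# marker";
        System.out.println(String.join(" ", splitKeepingDoubleSpaces(bad)).equals(bad)); // false
    }
}
```

The second case shows why the marker has to be absent from the input: the literal marker is rewritten to a space and the recombined string no longer matches.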

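Alternatively, you could take a more robust strategy: split the string exactly as you do in your question, skip the empty elements while you work with the words, and keep them in the array so that recombining with a single space between elements restores the original spacing:

String example = "ThisShouldBeTwoElements.  ButItIsNot.";
// A limit of -1 keeps trailing empty elements, so trailing spaces survive too
String[] splitString = example.split(" ", -1);
StringBuilder recombined = new StringBuilder();
for(int i = 0; i < splitString.length; i++)
{
  // Every split point was exactly one space, so put one back between elements
  if(i > 0)
    recombined.append(' ');
  // Empty elements mark the extra spaces: skip them when processing words,
  // but appending them here (they add nothing) keeps the double space intact
  recombined.append(splitString[i]);
}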

Answer 2

String st = "I went to the beach.  I ate pie";
st.split("\\s{1}(?!\\s)");

This results in:

[I, went, to, the, beach. , I, ate, pie]

I also suggest looking at http://docs.oracle.com/javase/6/docs/api/ and/or http://www.regular-expressions.info/java.html so you understand what this is doing.
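As a runnable sketch of this answer: the pattern matches a whitespace character only when it is not followed by another one, so the extra space of a double space stays attached to the preceding word. The class and method names are illustrative, `\\s` is used as an equivalent of `\\s{1}`, and `String.join` assumes Java 8 or later.

```java
import java.util.Arrays;

public class LookaheadSplitDemo {
    static String[] splitOnLastSpace(String text) {
        // Split on a whitespace character that is NOT followed by more whitespace.
        return text.split("\\s(?!\\s)");
    }

    public static void main(String[] args) {
        String st = "I went to the beach.  I ate pie";
        String[] parts = splitOnLastSpace(st);
        System.out.println(Arrays.toString(parts));
        // Joining with single spaces restores the original string.
        System.out.println(String.join(" ", parts).equals(st)); // true
    }
}
```

One caveat with this sketch: if the input ends with whitespace, the final space is consumed as a delimiter and `split` drops the resulting trailing empty element, so that last space is not restored by the join.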

Answer 3

Take a good look at what Java's regex support can do for you. There's a way to recognize patterns using regex.

Java regex examples

Answer 4

Try this; it should remove every single whitespace character that sits directly between two non-whitespace characters. The lookarounds make sure that only the whitespace is removed, not the characters on either side of it.

myString = myString.replaceAll("(?<=\\S)\\s(?=\\S)", "");

This will preserve whitespace when it occurs more than once between two words.
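A small sketch of the idea in this answer, written with lookarounds so that only the whitespace itself is removed and with the backslashes escaped for a Java string literal. The class and method names are illustrative.

```java
public class SingleSpaceRemovalDemo {
    static String removeSingleSpaces(String text) {
        // Remove a whitespace character only when it has a non-whitespace
        // character directly before it and directly after it.
        return text.replaceAll("(?<=\\S)\\s(?=\\S)", "");
    }

    public static void main(String[] args) {
        // The single spaces disappear; the double space is left alone.
        System.out.println(removeSingleSpaces("a b  c d")); // prints "ab  cd"
    }
}
```

Note that this collapses the words together rather than splitting them, so it only helps if the goal is to find where the multi-space runs are.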

Answer 5

I know this is an old question, but for the benefit of future audiences: the concept you're looking for is "capturing groups". Capturing groups allow you to refer to matches in your expression and retrieve them later, such as via a back-reference, instead of the strings being swallowed.

From the docs, here's the relevant syntax you need to know:

(?<name>X)          X, as a named-capturing group
(?:X)               X, as a non-capturing group
(?idmsuxU-idmsuxU)  Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)  X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)               X, via zero-width positive lookahead
(?!X)               X, via zero-width negative lookahead
(?<=X)              X, via zero-width positive lookbehind
(?<!X)              X, via zero-width negative lookbehind
(?>X)               X, as an independent, non-capturing group

Using the input text:

String example = "ABC     DEF     GHI J K";

You can combine a positive lookbehind with a negative lookahead to keep the trailing whitespace attached to each word. A single \\s inside the lookbehind is enough, and it keeps the lookbehind bounded in length:

// Result: [ABC     , DEF     , GHI , J , K]
example.split("(?<=\\s)(?!\\s)");

Or you can split on word boundaries with a positive lookahead to preserve the spaces as separate, grouped elements:

// Result: [ABC,      , DEF,      , GHI,  , J,  , K]
example.split("(?=\\b)");
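The two lookaround splits above can be checked with a short sketch like the following. Both patterns match zero-width positions, so no characters are consumed and joining the pieces with an empty string gives back the input. The class and method names are illustrative; the first pattern uses a single `\\s` in the lookbehind, and the sketch assumes Java 8 or later (for `String.join`, and because a zero-width match at the start of the input does not produce a leading empty element there).

```java
import java.util.Arrays;

public class LookaroundSplitDemo {
    static String[] splitKeepingTrailingSpaces(String text) {
        // Split where the previous character is whitespace and the next is not.
        return text.split("(?<=\\s)(?!\\s)");
    }

    static String[] splitOnWordBoundaries(String text) {
        // Split at every word boundary; runs of spaces become their own elements.
        return text.split("(?=\\b)");
    }

    public static void main(String[] args) {
        String example = "ABC     DEF     GHI J K";

        String[] withSpaces = splitKeepingTrailingSpaces(example);
        System.out.println(Arrays.toString(withSpaces));
        System.out.println(String.join("", withSpaces).equals(example)); // true

        String[] boundaries = splitOnWordBoundaries(example);
        System.out.println(Arrays.toString(boundaries));
        System.out.println(String.join("", boundaries).equals(example)); // true
    }
}
```

Because nothing is swallowed by the split, this variant avoids both the placeholder trick and the need to remember which delimiter to re-insert.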

Java Pattern API:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Side note: While the "replace the text with something completely implausible" suggestion is tempting because it's easy, don't ever do that in production code. It will fail eventually, and it happens more often than you'd think. I debugged a call center after a programmer used about 80 columns of "~=$~=$~=$..." believing that was safe. That lasted a couple of months, until a service rep saved a "fancy border" in his notes with just that sequence. I've even witnessed a genuine, random MD5 collision on a search server. Granted, the MD5 collision took 11 years, but it still crashed the search and the point remains. Unique strings never are. Always assume that duplicates will appear.

5 answers

Answer 1: 3, accepted, 2012-07-03 18:22:00
Answer 2: 2, 2012-07-03 18:50:42
Answer 3: 1, 2012-07-03 18:21:13
Answer 4: 1, 2012-07-03 18:30:32
Answer 5: 0, 2015-02-26 06:17:28
