Java string - split on space, but preserve double space

Question

Currently I am splitting a string by spaces. However there are some double spaces that I want to preserve when I put them all back together. Any suggestions on how to do this?

For example, the string "I went to the beach.  I ate pie" is getting split as

I
went
to
the
beach.

I
ate
pie

I don't want the blank entries but I want to put it back together to the same format. Thanks all!

Answer 1

Call replaceAll("  ", " unlikelyCharacterSequence") on the String (note the two spaces in the first argument), then split your string by spaces as normal. Afterwards, replace unlikelyCharacterSequence with " " in each element; rejoining the elements with single spaces then gives you the double space back.

However: this will fail if you ever encounter your "unlikely" character sequence in your actual, unmodified String. For a more general purpose solution, check the alternative listed below this example.

Example (warning: depends on the non-existence of !@#!@# in the input):

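String example = "Hello.  That was a double space. That was a single space.";
// Mark each double space: keep one space as the delimiter, tag the other.
String formatted = example.replace("  ", " !@#!@#");
String[] split = formatted.split(" ");
for (int i = 0; i < split.length; i++)
{
  // Strings are immutable, so store the result back into the array.
  split[i] = split[i].replace("!@#!@#", " ");
}
// Recombine: joining on a single space restores the original string.
String recombined = String.join(" ", split);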

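Alternatively, you could take the more robust strategy of splitting the string as you have it in your question and simply skipping the empty elements while you process the words. Each extra space produces one empty element, so rejoining every element with a single space restores the original spacing:

String example = "ThisShouldBeTwoElements.  ButItIsNot.";
// A limit of -1 keeps trailing empty strings, so trailing spaces survive too.
String[] splitString = example.split(" ", -1);
StringBuilder recombined = new StringBuilder();
for (int i = 0; i < splitString.length; i++)
{
  if (i > 0)
    recombined.append(' ');
  // An empty element adds nothing here, which leaves two spaces in a row.
  recombined.append(splitString[i]);
}
// recombined.toString() now equals example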
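As a rough sketch of how this could look when you actually do something with the words before putting them back together: the class name PreserveSpacing and the helper transformWords are made up for illustration, and the code assumes the only separator is the plain space character and that String.join (Java 8+) is available.

```java
import java.util.function.UnaryOperator;

public class PreserveSpacing {
    // Hypothetical helper: applies fn to every word and leaves the spacing alone.
    static String transformWords(String text, UnaryOperator<String> fn) {
        // Limit -1 keeps trailing empty strings, so trailing spaces are not lost.
        String[] parts = text.split(" ", -1);
        for (int i = 0; i < parts.length; i++) {
            // Empty elements stand for the extra spaces; leave them untouched.
            if (!parts[i].isEmpty()) {
                parts[i] = fn.apply(parts[i]);
            }
        }
        // Joining on a single space puts every original space back.
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String example = "I went to the beach.  I ate pie";
        // Prints: I WENT TO THE BEACH.  I ATE PIE
        System.out.println(transformWords(example, String::toUpperCase));
    }
}
```

Nothing here depends on a placeholder sequence, so there is nothing in the input that can collide with it.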

Answer 2

String st = "I went to the beach.  I ate pie";
// Split on a whitespace character that is not followed by more whitespace.
String[] parts = st.split("\\s(?!\\s)");

This results in

[I, went, to, the, beach. , I, ate, pie]

I also suggest looking at http://docs.oracle.com/javase/6/docs/api/ and/or http://www.regular-expressions.info/java.html so you understand what this is doing.
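To see the round trip, here is a minimal sketch of that split followed by a rejoin. The class and method names are invented for the example, and it assumes the separators are plain spaces (\s also matches tabs and newlines, which a join on " " would not restore) and that the input does not end in whitespace.

```java
import java.util.Arrays;

public class LookaheadSplit {
    // Splits on a whitespace character that is NOT followed by another
    // whitespace character, so only the last space of a run is consumed.
    static String[] splitOnLastSpace(String text) {
        return text.split("\\s(?!\\s)");
    }

    public static void main(String[] args) {
        String st = "I went to the beach.  I ate pie";
        String[] parts = splitOnLastSpace(st);
        // Prints: [I, went, to, the, beach. , I, ate, pie]
        System.out.println(Arrays.toString(parts));
        // The extra space stays attached to "beach. ", so a plain join restores st.
        // Prints: true
        System.out.println(String.join(" ", parts).equals(st));
    }
}
```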

Answer 3

Take a good look at what Java's regex support can do for you. There's a way to recognize patterns using regex.

Java regex examples
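One possible way to apply that, sketched with java.util.regex.Pattern and Matcher: instead of splitting, match alternating runs of non-whitespace and whitespace so that nothing is thrown away. The class name RunTokenizer and the method tokenize are hypothetical, not something from the linked examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunTokenizer {
    // A token is either a run of non-whitespace or a run of whitespace.
    private static final Pattern RUN = Pattern.compile("\\S+|\\s+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = RUN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "I went to the beach.  I ate pie";
        List<String> tokens = tokenize(text);
        // Every character belongs to exactly one token, so concatenating
        // the tokens in order gives back the original string.
        // Prints: true
        System.out.println(String.join("", tokens).equals(text));
    }
}
```

Words are the tokens for which trim() is non-empty; the whitespace tokens can be passed through unchanged.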

Answer 4

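Try this; it should remove every single whitespace character that sits between two non-whitespace characters. The lookbehind and lookahead check the neighbouring characters without matching them, so only the whitespace itself is replaced.

myString = myString.replaceAll("(?<=\\S)\\s(?=\\S)", "");

This will preserve whitespace when it occurs more than once between two words.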
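A small sketch of why zero-width lookarounds matter for this replacement: a pattern that consumes the neighbouring characters, such as \S\s\S, deletes them along with the space. The class and method names below are made up for the comparison.

```java
public class RemoveSingleSpaces {
    // Consumes the characters on both sides of the space, so they are deleted too.
    static String consuming(String text) {
        return text.replaceAll("\\S\\s\\S", "");
    }

    // Lookbehind/lookahead only inspect the neighbours; just the space is removed.
    static String lookaround(String text) {
        return text.replaceAll("(?<=\\S)\\s(?=\\S)", "");
    }

    public static void main(String[] args) {
        // Prints: " c" (the "a b" part was matched and removed as a whole)
        System.out.println("\"" + consuming("a b c") + "\"");
        // Prints: Iwenttothebeach.  Iatepie
        System.out.println(lookaround("I went to the beach.  I ate pie"));
    }
}
```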

Answer 5

I know this is an old question, but for the benefit of future audiences: the concepts you're looking for are "capturing groups" and the related zero-width lookaround constructs. Capturing groups allow you to refer to matches in your expression and retrieve them later, such as via a back-reference, instead of the strings being swallowed. Lookarounds test a position in the input without consuming any characters.

From the docs, here's the relevant syntax you need to know:

(?<name>X)          X, as a named-capturing group
(?:X)               X, as a non-capturing group
(?idmsuxU-idmsuxU)  Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)  X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)               X, via zero-width positive lookahead
(?!X)               X, via zero-width negative lookahead
(?<=X)              X, via zero-width positive lookbehind
(?<!X)              X, via zero-width negative lookbehind
(?>X)               X, as an independent, non-capturing group

Using the input text:

String example = "ABC     DEF     GHI J K";

You can combine a positive lookbehind with a negative lookahead to keep the trailing whitespace with each word:

// Result: [ABC     , DEF     , GHI , J , K]
example.split("(?<=\\s)(?!\\s)");

Or you can split on word boundaries with a positive lookahead to preserve the spaces as separate, grouped elements:

// Result: [ABC,      , DEF,      , GHI,  , J,  , K]
example.split("(?=\\b)");
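Here is a sketch that runs both splits and checks that nothing is lost. The class and method names are invented for the example. It uses a single-character lookbehind, (?<=\\s), which finds the same split positions here without relying on unbounded lookbehind support, and it assumes Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string.

```java
import java.util.Arrays;

public class LookaroundSplits {
    // Split at every position preceded by whitespace and not followed by it,
    // so each word keeps its trailing whitespace.
    static String[] wordsWithTrailingSpace(String text) {
        return text.split("(?<=\\s)(?!\\s)");
    }

    // Split at every word boundary, so words and gaps become separate elements.
    static String[] wordsAndGaps(String text) {
        return text.split("(?=\\b)");
    }

    public static void main(String[] args) {
        String example = "ABC     DEF     GHI J K";
        String[] a = wordsWithTrailingSpace(example);
        String[] b = wordsAndGaps(example);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.toString(b));
        // Both splits are zero-width, so no characters are consumed and
        // concatenating the pieces restores the input. Prints: true
        System.out.println(String.join("", a).equals(example)
                && String.join("", b).equals(example));
    }
}
```

Be aware that \b treats punctuation as a non-word character, so with the second split "beach." would come back as "beach" followed by an element that starts with the period.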

Java Pattern API:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Side note: While the "replace the text with something completely implausible" suggestion is tempting because it's easy, don't ever do that in production code. It will fail eventually, and it happens more often than you'd think. I debugged a call center after a programmer used about 80 columns of "~=$~=$~=$..." believing that was safe. That lasted a couple of months until a service rep saved a "fancy border" on his notes with just that sequence. I've even witnessed a genuine, random MD5 collision on a search server. Granted, the MD5 collision took 11 years, but it still crashed the search and the point remains. Unique strings never are. Always assume that duplicates will appear.

5 answers

solution1
3 ACCEPTED 2012-07-03 18:22:00

solution2
2 2012-07-03 18:50:42

solution3
1 2012-07-03 18:21:13

solution4
1 2012-07-03 18:30:32

solution5
0 2015-02-26 06:17:28
