
Java string - split on space, but preserve double space

Currently I am splitting a string by spaces. However, there are some double spaces that I want to preserve when I put the pieces back together. Any suggestions on how to do this?

I.e., the string "I went to the beach.  I ate pie" is getting split as

I
went
to
the
beach.

I
ate
pie

I don't want the blank entries but I want to put it back together to the same format. Thanks all!

Do a String replaceAll("  ", " unlikelyCharacterSequence") and then split your string by spaces as normal. Afterwards, replace unlikelyCharacterSequence with " " in each element, so that joining the elements with single spaces restores the double space.

However: this will fail if you ever encounter your "unlikely" character sequence in your actual, unmodified String. For a more general purpose solution, check the alternative listed below this example.

Example (warning: this depends on "!@#!@#" never occurring in the input):

String example = "Hello.  That was a double space. That was a single space.";
String formatted = example.replace("  ", " !@#!@#");
String[] split = formatted.split(" ");
for (int i = 0; i < split.length; i++)
{
  // Strings are immutable, so store the result back into the array
  split[i] = split[i].replace("!@#!@#", " ");
}
// Recombine your splits, e.g. with String.join(" ", split)

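Alternatively, you could take the more robust strategy of splitting as you do in your question and treating the empty elements as markers for the extra spaces. Skip them while processing, and put one space back between every pair of elements when recombining:

String example = "ThisShouldBeTwoElements.  ButItIsNot.";
String[] splitString = example.split(" ");
StringBuilder recombined = new StringBuilder();
for (int i = 0; i < splitString.length; i++)
{
  if (i > 0)
    recombined.append(' ');  // restore the space consumed by split
  // An empty element marks an extra space; there is nothing to process or append
  if (!splitString[i].isEmpty())
    recombined.append(splitString[i]);
}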
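
As a rough sketch of the recombination idea above: `String.split(" ")` produces an empty string for each extra space, so joining every element with a single space reproduces the input, while the empty elements can be filtered out when only the words are needed. The class and method names here are made up for illustration, and the sketch assumes the separators are plain spaces.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeepBlanks {
    // Split on single spaces; empty elements stand in for the extra spaces.
    static String[] split(String text) {
        return text.split(" ", -1); // limit -1 keeps trailing empty elements too
    }

    // The words only, with the blank marker entries filtered out.
    static List<String> words(String[] parts) {
        List<String> result = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    // Joining every element (blanks included) with one space restores the input.
    static String rejoin(String[] parts) {
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String original = "I went to the beach.  I ate pie";
        String[] parts = split(original);
        System.out.println(words(parts));                   // the 8 words, no blanks
        System.out.println(rejoin(parts).equals(original)); // true
    }
}
```

The array keeps the blanks so the format survives, and the filtered view is what you would iterate over.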
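
Alternatively, split on a whitespace character that is not followed by another whitespace character:

String st = "I went to the beach.  I ate pie";
String[] parts = st.split("\\s(?!\\s)");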

This results in

[I, went, to, the, beach. , I, ate, pie]
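
To put the pieces back together after this kind of split, something like the following should work. It is a minimal sketch, assuming the separators are plain spaces (a tab or newline consumed by the split would come back as a space); the class and method names are hypothetical.

```java
import java.util.Arrays;

public class LookaheadSplit {
    // Split on a whitespace character that is NOT followed by more whitespace.
    // Any extra spaces stay attached to the end of the preceding element.
    static String[] split(String text) {
        return text.split("\\s(?!\\s)", -1); // limit -1 keeps trailing empty elements
    }

    // Each split point consumed exactly one whitespace character,
    // so putting one space back per gap restores the text.
    static String rejoin(String[] parts) {
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String original = "I went to the beach.  I ate pie";
        String[] parts = split(original);
        System.out.println(Arrays.toString(parts));         // [I, went, to, the, beach. , I, ate, pie]
        System.out.println(rejoin(parts).equals(original)); // true
    }
}
```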

I also suggest looking at http://docs.oracle.com/javase/6/docs/api/ and/or http://www.regular-expressions.info/java.html so you understand what this is doing.

Take a good look at what Java's regex support can do for you. There's a way to recognize patterns using regex.

Java regex examples

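Try this; it should remove every single whitespace character that sits directly between two non-whitespace characters.

myString = myString.replaceAll("(?<=\\S)\\s(?=\\S)", "");

This preserves whitespace when it occurs more than once between two words. The lookarounds matter: a plain "\\S\\s\\S" would also delete the characters on either side of the space.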
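
A small sketch of that replacement, written with lookarounds so that only the whitespace character itself is removed. The class and method names are illustrative only, and the second method is included just to show how a pattern that consumes its neighbours behaves differently.

```java
public class SingleSpaceRemover {
    // Remove a whitespace character only when it has a non-whitespace
    // character directly before AND after it. Runs of 2+ spaces are kept.
    static String removeSingleSpaces(String text) {
        return text.replaceAll("(?<=\\S)\\s(?=\\S)", "");
    }

    // For contrast: without lookarounds the neighbouring characters are
    // part of the match and get deleted along with the space.
    static String removeWithNeighbours(String text) {
        return text.replaceAll("\\S\\s\\S", "");
    }

    public static void main(String[] args) {
        String input = "I went to the beach.  I ate pie";
        System.out.println(removeSingleSpaces(input));   // Iwenttothebeach.  Iatepie
        System.out.println(removeWithNeighbours(input)); // loses letters as well as spaces
    }
}
```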

I know this is an old question, but for the benefit of future audiences: the concept you're looking for is "capturing groups" and the related lookaround constructs. Capturing groups allow you to refer to matches in your expression and retrieve them later, such as via a back-reference, instead of the strings being swallowed. Lookarounds are zero-width, so splitting on them consumes no characters at all.

From the docs, here's the relevant syntax you need to know:

(?<name>X)          X, as a named-capturing group
(?:X)               X, as a non-capturing group
(?idmsuxU-idmsuxU)  Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)  X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)               X, via zero-width positive lookahead
(?!X)               X, via zero-width negative lookahead
(?<=X)              X, via zero-width positive lookbehind
(?<!X)              X, via zero-width negative lookbehind
(?>X)               X, as an independent, non-capturing group

Using the input text:

String example = "ABC     DEF     GHI J K";

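You can combine a positive lookbehind with a negative lookahead to keep the trailing whitespace attached to each word:

// Result: [ABC     , DEF     , GHI , J , K]
// A one-character lookbehind is enough; an unbounded (?<=\\s+) may be rejected by some Java versions.
example.split("(?<=\\s)(?!\\s)");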

Or you can split on word boundaries with a positive lookahead to preserve the spaces as separate, grouped elements:

// Result: [ABC,      , DEF,      , GHI,  , J,  , K]
example.split("(?=\\b)");
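
Because both of these patterns are zero-width, no characters are lost in the split, and concatenating the pieces with an empty separator should give back the original string. A minimal sketch of that round trip for the first pattern, using a one-character lookbehind and hypothetical class and method names:

```java
import java.util.Arrays;

public class TrailingWhitespaceSplit {
    // Split at every position that is preceded by whitespace and not
    // followed by whitespace, i.e. right before the start of each word.
    static String[] split(String text) {
        return text.split("(?<=\\s)(?!\\s)");
    }

    // The split points are zero-width, so nothing needs to be re-inserted.
    static String rejoin(String[] parts) {
        return String.join("", parts);
    }

    public static void main(String[] args) {
        String example = "ABC     DEF     GHI J K";
        String[] parts = split(example);
        System.out.println(Arrays.toString(parts));        // [ABC     , DEF     , GHI , J , K]
        System.out.println(rejoin(parts).equals(example)); // true
    }
}
```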

Java Pattern API:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html



Side Note: While the "replace the text with something completely implausible" suggestion is tempting because it's easy, don't ever do that in production code. It will fail eventually, and it happens more often than you'd think. I debugged a call center after a programmer used about 80-columns of "~=$~=$~=$..." believing that was safe. That lasted a couple months until a service rep saved a "fancy border" on his notes with just that sequence. I've even witnessed a genuine, random MD5 collision on a search server. Granted, the MD5 collision took 11 years, but it still crashed the search and the point remains. Unique strings never are. Always assume that duplicates will appear.


 