
Remove all whitespace from a String but keep ONE newline

I have this input String (containing tabs, spaces, and line breaks):


        That      is a test.              
    seems to work       pretty good? working.








    Another test  again.

[Edit]: I should have provided the String itself for better testing, since Stack Overflow strips special characters (tabs, ...):

String testContent = "\n\t\n\t\t\t\n\t\t\tDas      ist ein Test.\t\t\t  \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t      \n\t\t\t      \n    \t\t\t\n    \tNoch ein  Test.\n    \t\n    \t\n    \t";

And I want to reach this state:


That is a test.
seems to work pretty good? working.
Another test again.

String expectedOutput = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";

Any ideas? Can this be achieved using regexes?

replaceAll("\\s+", " ") is NOT what I'm looking for, because it also replaces the newlines. If this regex preserved exactly one of the existing newlines in each run, it would be perfect.

I have tried this, but it seems suboptimal to me:

BufferedReader bufReader = new BufferedReader(new StringReader(testContent));
String line = null;
StringBuilder newString = new StringBuilder();
while ((line = bufReader.readLine()) != null) {
    String temp = line.replaceAll("\\s+", " ");
    if (!temp.trim().equals("")) {
        newString.append(temp.trim());
        newString.append("\n");
    }
}

In a single regex (plus a small patch for tabs):

input.replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
     .replace("\t"," ");

The regex looks daunting, but in fact decomposes nicely into these parts that are OR-ed together:

  • ^\\s+ – match whitespace at the beginning;
  • \\s+$ – match whitespace at the end;
  • \\s*(\n)\\s* – match a whitespace run containing a newline, and capture that newline (the last one in the run, since the leading \\s* is greedy);
  • (\\s)\\s* – match whitespace, capturing the first whitespace character.

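Each match has two capture groups, but at most one of them is non-empty: the first two alternatives fill neither group, the third fills only group 1, and the fourth fills only group 2. This allows me to replace every match with "$1$2", which means "concatenate the two capture groups"; in Java, a group that did not take part in the match contributes an empty string.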

The only remaining problem is that the captured whitespace character may be a tab, which this approach cannot turn into a space, so I fix that up with a simple non-regex character replacement.
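As a rough check of the single-regex approach, the sketch below wraps it in a helper and runs it on the test string from the question. The class name `SingleRegexDemo`, the method name `normalize`, and the constant `TEST_CONTENT` are made up for this illustration; the regex and the tab fix-up are the ones shown above.

```java
final class SingleRegexDemo {
    // Test string copied from the question.
    static final String TEST_CONTENT =
            "\n\t\n\t\t\t\n\t\t\tDas      ist ein Test.\t\t\t  \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t      \n\t\t\t      \n    \t\t\t\n    \tNoch ein  Test.\n    \t\n    \t\n    \t";

    // One regex pass, then the non-regex patch that turns a kept tab into a space.
    static String normalize(String input) {
        return input
                .replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
                .replace("\t", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize(TEST_CONTENT));
    }
}
```

Because the second alternative strips all trailing whitespace, the result has no final newline, unlike the `expectedOutput` in the question. If that newline matters, it would have to be appended separately.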

In 4 steps:

text
    // 1. compress each run of non-newline whitespace to a single space
    .replaceAll("[\\s&&[^\\n]]+", " ")
    // 2. remove the space at the beginning or end of each line
    //    (only a literal space: \\s would also match newlines and could join lines)
    .replaceAll("(?m)^ | $", "")
    // 3. compress multiple newlines to a single newline
    .replaceAll("\\n+", "\n")
    // 4. remove a newline from the beginning or end of the string
    .replaceAll("^\n|\n$", "")
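The four steps can be sketched as one helper, as below. The names `FourStepDemo` and `normalize` are hypothetical. Step 2 here matches only a literal space rather than `\\s`, which is an assumption on my part: with `(?m)`, `\\s` can also match the newline characters around an empty line, so an input like "a\n\nb" could lose both newlines.

```java
final class FourStepDemo {
    static String normalize(String text) {
        return text
                // 1. every run of whitespace other than '\n' becomes one space
                .replaceAll("[\\s&&[^\\n]]+", " ")
                // 2. drop the single space left at the start or end of a line
                .replaceAll("(?m)^ | $", "")
                // 3. collapse consecutive newlines into one
                .replaceAll("\\n+", "\n")
                // 4. drop a newline at the very start or very end of the string
                .replaceAll("^\n|\n$", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize(" a \t b \n \n\tc "));
    }
}
```

Like the single-regex version, this leaves no trailing newline on the result.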

If I understand correctly, you simply want to replace a succession of newlines with one newline. So replace \\n\\n* with \\n (with appropriate flags). If there is a lot of whitespace in the lines, simply remove the whitespace ( ^\\s\\s*$ with multiline mode) first, then replace the newlines.

Edit: The only issue here is that some newlines might remain here and there, so you have to be careful to first collapse spaces, then fix the empty line problem. You can trim it down further into probably a single regex, but it's easier to read with these three:

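 import java.util.regex.Pattern;

 // runs of tabs and spaces
 Pattern spaces = Pattern.compile("[\t ]+");
 // whitespace starting at the beginning of a line (this also swallows whole blank lines)
 Pattern emptyLines = Pattern.compile("^\\s+$?", Pattern.MULTILINE);
 // trailing whitespace plus the newline(s) that end a line
 Pattern newlines = Pattern.compile("\\s*\\n+");
 System.out.print(
      newlines.matcher(emptyLines.matcher(spaces.matcher(
        input).replaceAll(" ")).replaceAll("")).replaceAll("\n"));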

First replace each run of newlines with a single newline, then collapse every run of whitespace that does not start with a newline into a single space, and finally remove the whitespace left at the beginning of each line:

String test = "      This is              a real\n\n\n\n\n\n\n\n\n test !!\n\n\n   bye";
test = test.replaceAll("\n+", "\n");
test = test.replaceAll("((?!\n+)\\s+)", " ");
test = test.replaceAll("(?m)^\\s+", "");

Output:

This is a real
test !!
bye

Why not do this (pseudocode)?

String[] lines = split(s, "\n");
String[] noExtraSpaces = removeSpacesInEachLine(lines);
String result = join(noExtraSpaces, "\n");

Don't forget https://softwareengineering.stackexchange.com/questions/10998/what-does-the-jamie-zawinskis-quotation-about-regular-expressions-mean
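A possible Java rendering of the split/clean/join idea is sketched below. `split`, `removeSpacesInEachLine` and `join` in the answer are pseudocode, so this version substitutes `String.split`, `String.trim` and `Collectors.joining`. The class and method names are made up, and dropping empty lines is an assumption taken from the expected output in the question.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class SplitJoinDemo {
    static String normalize(String s) {
        return Arrays.stream(s.split("\n"))
                // trim each line and squeeze inner whitespace runs to one space
                .map(line -> line.trim().replaceAll("\\s+", " "))
                // skip lines that contained only whitespace
                .filter(line -> !line.isEmpty())
                // rejoin with exactly one newline between the remaining lines
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(normalize(" a \t b \n \n\tc "));
    }
}
```

This uses a regex only for the per-line squeeze, which keeps each step easy to read at the cost of a few more intermediate strings.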
