
Java regex for adjusting whitespace

I know that this is a mix of two existing questions: "trim whitespace from a string?" and "Regex or way to replace multiple space with single space".

And I know one can combine both solutions:

String mystring = "   brains    T1*   C+   ";
mystring = mystring.trim().replaceAll("[ ]+", " ");

output: brains T1* C+

But my question:

Is it possible to make the trim part of one single regex?

I.e. something like mystring.replaceAll("#regex", "");

Yes, you can:

mystring = mystring.replaceAll("^\\s+|(?<=\\s)\\s+|\\s+$", "");

Demo on ideone.

The idea behind this expression is to match the leading and the trailing whitespace separately (with ^\\s+ and \\s+$) while "shielding" one whitespace character in each interior run from removal by matching it in a lookbehind. The most interesting piece is (?<=\\s)\\s+, which says "match one or more whitespace characters when they are immediately preceded by a whitespace character". (?<=\\s) checks for the character that we want to keep without consuming it; \\s+ matches the "unwanted" ones that follow it.
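To see how the three alternatives behave on different inputs, here is a minimal sketch. The class name LookbehindTrimDemo and the helper normalize are made up for illustration; only the regex comes from the answer. One thing it makes visible: the character that survives in each interior run is the first whitespace character of that run, so if the run starts with a tab, a tab is what remains.

```java
public class LookbehindTrimDemo {
    // Hypothetical helper wrapping the regex from the answer:
    // leading run | extra whitespace after a kept character | trailing run.
    static String normalize(String input) {
        return input.replaceAll("^\\s+|(?<=\\s)\\s+|\\s+$", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "   brains    T1*   C+   ", // the string from the question
            "a \t b",                   // run starts with a space, so a space is kept
            "a\t  b",                   // run starts with a tab, so the tab is kept
            "   "                       // whitespace only: everything is removed
        };
        for (String s : samples) {
            // The angle brackets make leading/trailing whitespace visible.
            System.out.println(">" + normalize(s) + "<");
        }
    }
}
```

If a literal space is always wanted between words, the two-step trim().replaceAll("\\s+", " ") from the question (with \\s instead of [ ]) may be the simpler choice.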

You can use

replaceAll("(?:^\\s+|\\s+$)|\\s+(\\s)", "$1")

or, more simply, since the non-capturing group (?:XXX) is not actually needed here:

replaceAll("^\\s+|\\s+$|\\s+(\\s)", "$1")

The idea is to replace every run of two or more whitespace characters that is not at the start or end of the string with a single one, so we reuse the last whitespace character of the match (captured in group 1).

If the whitespace is at the start or end of the string, we don't want group 1 to contain anything, so we write those as separate alternatives for the regex to check (using OR, |). In Java, a $1 that refers to a group that did not take part in the match is replaced with nothing. The important part is to place these special cases before the alternative with the group, because the regex engine tries the alternatives from left to right.
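The role of group 1 and of the order of the alternatives can be checked with a short sketch like the one below. The class CaptureGroupDemo and its methods are hypothetical names used only for illustration; describe lists each match together with whether group 1 took part in it, and wrongOrder shows what happens when the alternative with the group is tried first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Regex from the answer: special cases first, the grouped alternative last.
    private static final Pattern WS = Pattern.compile("^\\s+|\\s+$|\\s+(\\s)");
    // Same alternatives, but with the grouped one first (for comparison only).
    private static final Pattern WRONG_ORDER = Pattern.compile("\\s+(\\s)|^\\s+|\\s+$");

    static String normalize(String input) {
        return WS.matcher(input).replaceAll("$1");
    }

    static String wrongOrder(String input) {
        return WRONG_ORDER.matcher(input).replaceAll("$1");
    }

    // Lists every match as "start-end" plus whether group 1 captured anything.
    static String describe(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = WS.matcher(input);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(m.start()).append('-').append(m.end())
              .append(m.group(1) == null ? ":dropped" : ":kept-one");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(" a  b "));              // 0-1:dropped 2-4:kept-one 5-6:dropped
        System.out.println(">" + normalize("  a  b  ") + "<");  // >a b<
        System.out.println(">" + wrongOrder("  a  b  ") + "<"); // > a b <
    }
}
```

With the grouped alternative first, a leading or trailing run of two or more whitespace characters is collapsed to one character instead of being removed, which is why the order matters.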

Demo

String mystring = "   brains    T1*   C+   ";
System.out.println(">"+mystring.replaceAll("^\\s+|\\s+$|\\s+(\\s)", "$1")+"<");

Output:

>brains T1* C+<

public static void main(String[] args) {
    String mystring = "   brains    T1*   C+   ";
    // Remove leading/trailing whitespace and the extra whitespace after each word;
    // the dashes make the ends of the result visible.
    System.out.println("-"+mystring.replaceAll("(^\\s+|\\s+$|(?<=\\w+.?\\s)\\s+)", "")+"-");
}

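Any whitespace at the beginning or end of the string (^\\s+, \\s+$) is replaced with an empty string. In addition, (?<=\\w+.?\\s)\\s+ matches the extra whitespace that follows a word and one whitespace character, and replaces it with an empty string, so only that first whitespace character remains. The .? allows for one non-word character after the word, such as the "*" in T1*.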
