
Why does the 'replaceAll' method not add an empty space at the beginning of the String?

I have a string with multiple spaces at the beginning, in the middle, and at the end: "        Humpty   Dumpty   sat  ".

I used a regular expression (https://stackoverflow.com/a/2932439/13136767) to remove the extra whitespace and replace it with group 1 (which is a single space).

String str = "        Humpty   Dumpty   sat  ";
str = str.replaceAll("^ +| +$|( )+", "$1");
System.out.println("[" + str + "]");

Expected output:

[ Humpty Dumpty sat ]

Actual output:

[Humpty Dumpty sat]

A replacement string is the text that each regular expression match is replaced with during a search-and-replace. The long run of spaces at the beginning of the string should have been replaced by a single space. Why was no space added at the beginning of the string?

A simple solution is to replace every sequence of one or more whitespace characters with a single space.

Demo:

public class Main {
    public static void main(String[] args) {
        String str = "     Humpty   Dumpty   sat ";
        System.out.println("->" + str + "<-");

        str = str.replaceAll("\\s+", " ");
        System.out.println("->" + str + "<-");
    }
}

Output:

->     Humpty   Dumpty   sat <-
-> Humpty Dumpty sat <-

Why did it not add an empty space, here, at the beginning of the String?

Because the regex you're using is specifically designed not to add spaces at the beginning or end of the string:

str.replaceAll("^ +| +$|( )+", "$1");

Here we have three alternatives: "^ +", " +$" and "( )+". All three match one or more spaces. The difference is that the first two match only at the beginning and end of the string respectively, and that only the third one contains a capturing group. So if the third one matches, i.e. if the sequence of spaces is not at the beginning or end of the string, the value of $1 will be a space. Otherwise it will be empty.

The whole point of this is to not add spaces at the beginning or end. If you don't want this behaviour, you don't need any of this complexity. Just replace one or more spaces with a single space and that's it.
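To see which alternative fires for each run of spaces, the matches can be walked one at a time with java.util.regex.Matcher. The following is only a minimal sketch: the class name AlternativeTrace and the helpers describeMatches and collapseSpaces are made-up names for illustration. It reports, for every match, whether group 1 took part, and then applies the simpler replacement described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternativeTrace {
    // Hypothetical helper: lists each match of the question's regex as
    // "start-end" plus whether capturing group 1 participated in it.
    static List<String> describeMatches(String input) {
        Matcher m = Pattern.compile("^ +| +$|( )+").matcher(input);
        List<String> report = new ArrayList<>();
        while (m.find()) {
            // group(1) is null when the match came from "^ +" or " +$".
            String group1 = (m.group(1) == null) ? "unset" : "[" + m.group(1) + "]";
            report.add(m.start() + "-" + m.end() + " group1=" + group1);
        }
        return report;
    }

    // Hypothetical helper: collapse every run of spaces to one space,
    // including the runs at the beginning and end of the string.
    static String collapseSpaces(String input) {
        return input.replaceAll(" +", " ");
    }

    public static void main(String[] args) {
        String str = "        Humpty   Dumpty   sat  ";
        describeMatches(str).forEach(System.out::println);
        System.out.println("[" + collapseSpaces(str) + "]");
    }
}
```

Only the runs between words should report a set group 1. The leading and trailing runs are consumed by the anchored alternatives, so they are replaced by nothing.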

I don't know what your goal is here, but if you want to remove extra spaces only in between words, then I would suggest using lookarounds:

String input = "        Humpty   Dumpty   sat  ";
String output = input.replaceAll("\\b(\\w+)[ ]{2,}(?=\\w)", "$1 ");
System.out.println("|" + input + "|");
System.out.println("|" + output + "|");

This prints:

|        Humpty   Dumpty   sat  |
|        Humpty Dumpty sat  |

When replaceAll performs multiple replacements, a capture is only available if its group matched during the current replacement. Captures from earlier or later matches can't be used.

This means that when the spaces at the beginning and end of the string are replaced, $1 isn't available, since the ( )+ alternative didn't match. $1 is only available in the middle of the string, when the non-anchored alternative matches.

We can see this in an even simpler example:

String str = "foobar";
System.out.println(str.replaceAll("(foo)|bar", "<$1>")); 

If $1 were remembered then we'd expect to see this output:

<foo><foo>

It's not, though. The actual output has a blank where bar used to be:

<foo><>

This shows that $1 is cleared after foo is matched and is empty when bar is replaced.
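The same behaviour can be observed directly through Matcher, which String.replaceAll delegates to. This is a small sketch rather than anything definitive: the class name GroupResetDemo and the helper groupOnePerMatch are hypothetical names chosen for this example. For each match it records the matched text next to the value of group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupResetDemo {
    // Hypothetical helper: records "matched text -> group 1" for every match.
    static List<String> groupOnePerMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> seen = new ArrayList<>();
        while (m.find()) {
            // group(1) is null whenever the alternative containing it did not
            // take part in the current match; concatenation renders it as "null".
            seen.add(m.group() + " -> " + m.group(1));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(groupOnePerMatch("(foo)|bar", "foobar"));
        System.out.println("foobar".replaceAll("(foo)|bar", "<$1>"));
    }
}
```

For the bar match, group(1) is null rather than a leftover "foo" from the previous match, which is why the replacement inserts nothing for $1 there.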
