
Using Java's replaceAll to replace the whole string

I am trying to use the following code to replace the whole string:

Code:

String a = "Hello";
String b = a.replaceAll("(?s).*", "US");

Output:

USUS

Question: Why does the string "US" appear twice? How can I replace the whole string with the replaceAll function, using a regular expression?

Why I need to do this: I need to read replacement patterns from a JSON file and apply them with the values given there. In this model, I want to let the user (the person configuring the JSON) define a pattern that replaces the entire string, without my having to code special handling for whole-string replacement.

It's because of how the Matcher class handles patterns that could match an empty string. The replaceAll method of String is defined to work the same way as the replaceAll method of Matcher , which works like this:

This method first resets this matcher. It then scans the input sequence looking for a match of the pattern. Characters that are not part of the match are appended directly to the result string; the match is replaced in the result by the replacement string. The replacement string may contain references to captured subsequences as in the appendReplacement method.

When the matcher looks for the pattern and the matching subsequence in the source is the empty string, the matcher returns the empty string but then bumps up the current index by 1, so that it does not loop forever returning the same empty match. So here's how it operates on "Hello":

1) The matcher looks for .* . Since this is a greedy match, matching as many characters as possible, it finds the substring "Hello" and replaces it with "US". The current index is then positioned after the 'o'.

2) The matcher looks for .* again. It is at the end of the input, but the pattern is allowed to match an empty string, so it matches the empty string and replaces that with another "US". It then bumps up the current index, which is now past the end of the source.

3) The matcher looks for .* again, but since the current index is past the end of the source, it won't find anything.
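The three steps above can be observed directly by driving a Matcher by hand. This is only a minimal sketch: the class name EmptyMatchDemo and the helper findMatches are made up for illustration, and the "start-end:text" format is just a convenient way to print each match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: records every match find() reports as "start-end:text".
    static List<String> findMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Two matches: the whole string, then an empty match at the end.
        System.out.println(findMatches("(?s).*", "Hello")); // [0-5:Hello, 5-5:]
        // Each match is replaced, hence the doubled replacement.
        System.out.println("Hello".replaceAll("(?s).*", "US")); // USUS
    }
}
```

The second entry, 5-5 with empty text, is the match from step 2; it is the one that produces the extra "US".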

To get a feel for how it operates, try using ".*?" as the pattern. Now, the matcher will always use an empty string, because the ? tells it to use the shortest string possible. It also increases the current index by 1 each time it finds an empty string. The result:

a.replaceAll("(?s).*?", ".-")  //returns
".-H.-e.-l.-l.-o.-"

That is, it replaces the empty string at every position (before each character and at the end of the input) with ".-", and leaves the actual characters alone.

Moral: Be really careful with patterns that could match empty strings.

After reading your comment, where you indicate that the pattern could be input by the user, I think you could use this as a test to see if the pattern could match the empty string:

if ("".matches(inputPattern)) {
    // ???
}

I'm not sure what you'd do with it. Perhaps it's always the case that if this is true, your replaceAll will add an extra US at the end and you can safely delete it. Or maybe you can just tell them to try a different pattern.
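As a rough illustration of that check, here is a small sketch. The class PatternCheck and the method canMatchEmpty are hypothetical names; only String.matches comes from the standard library, and what to do when the check is true (reject the pattern, warn the user, and so on) is left open.

```java
public class PatternCheck {
    // Hypothetical helper: true when the regex matches the empty input in its entirety.
    static boolean canMatchEmpty(String regex) {
        return "".matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(canMatchEmpty("(?s).*")); // true
        System.out.println(canMatchEmpty("a*"));     // true
        System.out.println(canMatchEmpty("(?s).+")); // false
        System.out.println(canMatchEmpty("abc"));    // false
    }
}
```

Treat this as a heuristic only: it tests the pattern against an empty input, so a pattern such as a bare lookahead, which can match an empty substring inside a non-empty input, would not be flagged.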

I'm not sure where this behavior of the matcher (i.e. increasing the current index by 1 when the match is an empty string) is documented. I didn't see it in the Matcher javadoc. I suppose that means that a future version of the JRE could behave differently, although this seems highly unlikely.

It's because .* can match an empty string. So the first match is the whole string (from the beginning), and the second is the empty string (at the last position, after the final character).

You can avoid this behaviour by using the + quantifier instead of * . But then the pattern will not match, and so will not replace, an empty input string.
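To make the trade-off concrete, here is a short sketch comparing the + quantifier with two other options that the answers above do not mention and that are offered only as assumptions about what might suit the JSON-driven setup: anchoring the pattern with ^ so it can only match at the start of the input, and calling replaceFirst instead of replaceAll. The class and method names are invented for the example.

```java
public class WholeStringReplace {
    // Uses + so the pattern needs at least one character; an empty input is left unchanged.
    static String withPlus(String s) {
        return s.replaceAll("(?s).+", "US");
    }

    // Assumption: anchoring with ^ stops the second, empty match at the end of the input.
    static String anchored(String s) {
        return s.replaceAll("(?s)^.*", "US");
    }

    // Assumption: replaceFirst stops after the first match, so the empty match is never reached.
    static String firstOnly(String s) {
        return s.replaceFirst("(?s).*", "US");
    }

    public static void main(String[] args) {
        System.out.println(withPlus("Hello"));  // US
        System.out.println(anchored("Hello"));  // US
        System.out.println(firstOnly("Hello")); // US
        System.out.println("[" + withPlus("") + "]");  // []
        System.out.println("[" + anchored("") + "]");  // [US]
    }
}
```

The last two lines show the difference on an empty input: the + version leaves it empty, while the anchored version still replaces it.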
