Replace all occurrences of substring in a string - which is more efficient in Java?

I know of two ways of replacing all occurrences of a substring in a string.

The regex way (assuming "substring-to-be-replaced" doesn't include regex special chars):

// No trailing "+" quantifier: it would apply only to the last character of the literal.
String regex = "substring-to-be-replaced";
Pattern scriptPattern = Pattern.compile(regex);
Matcher matcher = scriptPattern.matcher(originalstring);
newstring = matcher.replaceAll("replacement-substring");

The String.replace() way:

newstring = originalstring.replace("substring-to-be-replaced", "replacement-substring");

Which of the two is more efficient (and why)?

Are there more efficient ways than the two described above?

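String.replace() uses regex underneath, at least in the JDK source quoted here (Java 8 and earlier). Later JDK releases changed this implementation, so check the source of the version you actually run.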

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

Are there more efficient ways than the above described two?

There are, provided that you operate on an implementation backed by, e.g., an array rather than on the immutable String class (since String.replace() creates a new string on each invocation). See for instance StringBuilder.replace().
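As a rough illustration of the StringBuilder.replace() idea, here is a minimal sketch. The class name BuilderReplace and its replaceAll helper are hypothetical, not part of the JDK; only StringBuilder.indexOf() and StringBuilder.replace() are real library calls.

```java
// Hypothetical helper, not part of the JDK: replaces every occurrence of
// `target` inside a mutable StringBuilder, in place and without any regex.
final class BuilderReplace {
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // nothing sensible to search for
        }
        int index = sb.indexOf(target);
        while (index != -1) {
            sb.replace(index, index + target.length(), replacement);
            // Resume after the inserted text so the replacement is never re-scanned.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("a-b-c");
        replaceAll(sb, "-", "--");
        System.out.println(sb); // prints a--b--c
    }
}
```

Each call to StringBuilder.replace() shifts the tail of the buffer, so this only pays off when the same buffer is edited repeatedly or matches are few; whether it beats String.replace() in a given case is something to measure, not assume.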

Compiling a regex incurs quite a lot of overhead, which is clear when you look at the Pattern source code. Luckily, Apache offers an alternative approach in StringUtils.replace(), which according to the source code (line #3732) is quite efficient.
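For readers without Apache Commons on the classpath, a regex-free replace can be sketched with nothing but String.indexOf() and a StringBuilder. This is not Apache's code; the class name LiteralReplace and its method are hypothetical and only illustrate the general indexOf-based approach.

```java
// Hypothetical sketch of a regex-free literal replace. Not the Apache
// StringUtils implementation, just the same general idea.
final class LiteralReplace {
    static String replace(String text, String target, String replacement) {
        if (text.isEmpty() || target.isEmpty()) {
            return text;
        }
        int end = text.indexOf(target);
        if (end == -1) {
            return text; // no match: return the original without allocating
        }
        StringBuilder out = new StringBuilder(text.length());
        int start = 0;
        while (end != -1) {
            // Copy the untouched part, then the replacement.
            out.append(text, start, end).append(replacement);
            start = end + target.length();
            end = text.indexOf(target, start);
        }
        out.append(text, start, text.length()); // remaining tail
        return out.toString();
    }
}
```

For example, LiteralReplace.replace("foo.bar.baz", ".", "::") yields "foo::bar::baz", and no Pattern is compiled along the way.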

Here's the source code from openjdk:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

Instead of using immutable Strings, use char arrays or some other mutable type (such as StringBuffer or StringBuilder).

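Without having done any profiling or benchmarking, I'd say it's a fairly safe bet that if you don't need regex magic, the overhead of the regex parser (which you will pay regardless, in memory as well as CPU usage) costs you a lot more than you can possibly gain on the other end.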

Shouldn't you compare replaceAll 2 times? However, for a single invocation the difference will hardly be measurable. And will you do millions of comparisons?

Then I would expect the precompiled Pattern to be faster, but only if you are not just replacing a constant String without any pattern rules.
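If the same replacement really does run millions of times, the compile step can be paid once and reused. A minimal sketch, assuming the target and replacement are fixed; the class name PrecompiledReplace and its fields are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder class: compiles the pattern once and reuses it.
final class PrecompiledReplace {
    // Pattern.LITERAL treats the target as plain text, so no escaping is needed.
    private static final Pattern TARGET =
            Pattern.compile("substring-to-be-replaced", Pattern.LITERAL);

    // quoteReplacement() keeps '$' and '\' in the replacement from being
    // interpreted as group references.
    private static final String REPLACEMENT =
            Matcher.quoteReplacement("replacement-substring");

    static String replace(String input) {
        return TARGET.matcher(input).replaceAll(REPLACEMENT);
    }
}
```

A Pattern instance is immutable and safe to share between threads; the Matcher created per call is not, which is why it is not cached here.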

Where is the problem in writing a micro benchmark? Or look up the source.
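A naive micro benchmark along those lines might look like the sketch below. The class name, inputs, and iteration counts are all made up for illustration, and System.nanoTime() loops like this are easily distorted by JIT warm-up and dead-code elimination, so treat the numbers as a rough hint at best; a harness such as JMH is the more trustworthy route.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately simple benchmark; all names and inputs are assumptions.
final class ReplaceBenchmark {
    static final String TARGET = "ab";
    static final String REPLACEMENT = "xyz";
    static final Pattern COMPILED = Pattern.compile(TARGET, Pattern.LITERAL);
    static long sink; // accumulates results so the JIT cannot discard the work

    static String viaStringReplace(String s) {
        return s.replace(TARGET, REPLACEMENT);
    }

    static String viaCompiledPattern(String s) {
        return COMPILED.matcher(s).replaceAll(Matcher.quoteReplacement(REPLACEMENT));
    }

    // Returns the elapsed nanoseconds for `iterations` calls of `op`.
    static long time(UnaryOperator<String> op, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += op.apply(input).length();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String input = "abcabcab".repeat(100); // String.repeat needs Java 11+
        int iterations = 100_000;
        // Warm-up runs so both variants are JIT-compiled before measuring.
        time(ReplaceBenchmark::viaStringReplace, input, iterations);
        time(ReplaceBenchmark::viaCompiledPattern, input, iterations);
        System.out.println("String.replace:   "
                + time(ReplaceBenchmark::viaStringReplace, input, iterations) + " ns");
        System.out.println("compiled Pattern: "
                + time(ReplaceBenchmark::viaCompiledPattern, input, iterations) + " ns");
    }
}
```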
