Replacing multiple substrings in Java when replacement text overlaps search text
Say you have the following string:
cat dog fish dog fish cat
You want to replace all cats with dogs, all dogs with fish, and all fish with cats. Intuitively, the expected result:
dog fish cat fish cat dog
If you try the obvious solution, looping through with replaceAll(), you get:
cat dog fish dog fish cat
dog dog fish dog fish dog
fish fish fish fish fish fish
cat cat cat cat cat cat
Clearly, this is not the intended result. So what's the simplest way to do this? I can cobble something together with Pattern and Matcher (and a lot of Pattern.quote() and Matcher.quoteReplacement()), but I refuse to believe I'm the first person to have this problem and there's no library function to solve it.
(FWIW, the actual case is a bit more complicated and doesn't involve straight swaps.)
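For reference, the failure described above can be reproduced with a short sketch. The class and method names here (NaiveChainedReplace, replaceSequentially) are hypothetical, and the sketch uses String.replace (literal matching) rather than replaceAll; the chaining problem is the same either way, because each rule re-processes text produced by the previous one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NaiveChainedReplace {
    // Applies each rule to the whole string in turn, so later rules
    // also rewrite text that earlier rules produced.
    static String replaceSequentially(String input, Map<String, String> rules) {
        String result = input;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = result.replace(rule.getKey(), rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the rules in insertion order
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("cat", "dog");
        rules.put("dog", "fish");
        rules.put("fish", "cat");
        System.out.println(replaceSequentially("cat dog fish dog fish cat", rules));
        // prints "cat cat cat cat cat cat"
    }
}
```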
It seems StringUtils.replaceEach in Apache Commons does what you want:
StringUtils.replaceEach("abcdeab", new String[]{"ab", "cd"}, new String[]{"cd", "ab"});
// returns "cdabecd"
Note that the documentation at the above links seems to be in error. See comments below for details.
String rep = str.replace("cat","§1§").replace("dog","§2§")
.replace("fish","§3§").replace("§1§","dog")
.replace("§2§","fish").replace("§3§","cat");
Ugly and inefficient as hell, but works.
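The placeholder trick only works if the placeholders never occur in the input. A minimal sketch of a generalized version follows, assuming (this is my assumption, not part of the answer above) that neither the input nor the replacement values contain characters from the Unicode private-use block U+E000..U+F8FF, and that no search key is empty. The names PlaceholderReplace and replaceViaPlaceholders are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PlaceholderReplace {
    static String replaceViaPlaceholders(String input, Map<String, String> rules) {
        // Assumption: private-use characters are free to use as placeholders.
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '\uE000' && c <= '\uF8FF') {
                throw new IllegalArgumentException("input already contains a placeholder character");
            }
        }
        List<String> values = new ArrayList<>();
        String result = input;
        // Pass 1: swap every search string for its own one-character placeholder.
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            char placeholder = (char) ('\uE000' + values.size());
            result = result.replace(rule.getKey(), String.valueOf(placeholder));
            values.add(rule.getValue());
        }
        // Pass 2: swap every placeholder for the final replacement text.
        for (int i = 0; i < values.size(); i++) {
            char placeholder = (char) ('\uE000' + i);
            result = result.replace(String.valueOf(placeholder), values.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("cat", "dog");
        rules.put("dog", "fish");
        rules.put("fish", "cat");
        System.out.println(replaceViaPlaceholders("cat dog fish dog fish cat", rules));
        // prints "dog fish cat fish cat dog"
    }
}
```

This still makes two passes per rule over the string, so it shares the inefficiency noted above.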
OK, here's a more elaborate and generic version. I prefer using a regular expression rather than a scanner. That way I can replace arbitrary Strings, not just words (which can be better or worse). Anyway, here goes:
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String replace(
        final String input, final Map<String, String> replacements) {
    if (input == null || "".equals(input) || replacements == null
            || replacements.isEmpty()) {
        return input;
    }
    // Build an alternation of all search strings, each quoted as a literal
    StringBuilder regexBuilder = new StringBuilder();
    Iterator<String> it = replacements.keySet().iterator();
    regexBuilder.append(Pattern.quote(it.next()));
    while (it.hasNext()) {
        regexBuilder.append('|').append(Pattern.quote(it.next()));
    }
    Matcher matcher = Pattern.compile(regexBuilder.toString()).matcher(input);
    StringBuffer out = new StringBuffer(input.length() + (input.length() / 10));
    while (matcher.find()) {
        // quoteReplacement keeps '$' and '\' in the replacement text literal
        matcher.appendReplacement(out,
                Matcher.quoteReplacement(replacements.get(matcher.group())));
    }
    matcher.appendTail(out);
    return out.toString();
}
Test Code:
System.out.println(replace("cat dog fish dog fish cat",
ImmutableMap.of("cat", "dog", "dog", "fish", "fish", "cat")));
Output:
dog fish cat fish cat dog
Obviously this solution only makes sense for many replacements, otherwise it's a huge overkill.
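On Java 9 or later, the same single-pass idea can be written more compactly with Matcher.replaceAll(Function). The sketch below is one possible variant, not the answer's own code: the names SinglePassReplace and replaceAllAtOnce are hypothetical, and sorting the keys longest-first is an addition of mine so that a key such as "abc" wins over its prefix "ab".

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SinglePassReplace {
    static String replaceAllAtOnce(String input, Map<String, String> rules) {
        if (input == null || input.isEmpty() || rules == null || rules.isEmpty()) {
            return input;
        }
        // Longest keys first: regex alternation takes the first alternative that matches.
        String regex = rules.keySet().stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Each match is looked up in the map once; replaced text is never rescanned.
        return Pattern.compile(regex).matcher(input)
                .replaceAll(match -> Matcher.quoteReplacement(rules.get(match.group())));
    }

    public static void main(String[] args) {
        System.out.println(replaceAllAtOnce("cat dog fish dog fish cat",
                Map.of("cat", "dog", "dog", "fish", "fish", "cat")));
        // prints "dog fish cat fish cat dog"
    }
}
```

If the same rule set is applied to many inputs, compiling the Pattern once and reusing it would avoid rebuilding the regex on every call.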
I would create a StringBuilder and then parse the text once, one word at a time, transferring over unchanged words or changed words as I go. I wouldn't parse it for each swap as you're suggesting.
So rather than doing something like:
// pseudocode
text is new text swapping cat with dog
text is new text swapping dog with fish
text is new text swapping fish with cat
I'd do:
for each word in text
if word is cat, swap with dog
if word is dog, swap with fish
if word is fish, swap with cat
transfer new word (or unchanged word) into StringBuilder.
I'd probably make a swap(...) method for this and use a HashMap for the swap.
For example:
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class SwapWords {
    private static Map<String, String> myMap = new HashMap<String, String>();

    public static void main(String[] args) {
        // this would really be loaded using a file such as a text file or xml
        // or even a database:
        myMap.put("cat", "dog");
        myMap.put("dog", "fish");
        myMap.put("fish", "cat");
        String testString = "cat dog fish dog fish cat";
        StringBuilder sb = new StringBuilder();
        // Scanner splits on whitespace, so each token is looked up exactly once
        Scanner testScanner = new Scanner(testString);
        while (testScanner.hasNext()) {
            String text = testScanner.next();
            // keep the word unchanged when the map has no entry for it
            text = myMap.get(text) == null ? text : myMap.get(text);
            sb.append(text + " ");
        }
        System.out.println(sb.toString().trim());
    }
}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class myreplase {
    public Map<String, String> replase;

    public myreplase() {
        replase = new HashMap<String, String>();
        replase.put("a", "Apple");
        replase.put("b", "Banana");
        replase.put("c", "Cantalope");
        replase.put("d", "Date");
        String word = "a b c d a b c d";
        // Join the keys into a regex alternation such as "a|b|c|d"
        String ss = "";
        Iterator<String> i = replase.keySet().iterator();
        while (i.hasNext()) {
            ss += i.next();
            if (i.hasNext()) {
                ss += "|";
            }
        }
        Pattern pattern = Pattern.compile(ss);
        StringBuilder buffer = new StringBuilder();
        // Walk the input one character at a time, so this only handles
        // single-character keys
        for (int j = 0, k = 1; j < word.length(); j++, k++) {
            String s = word.substring(j, k);
            Matcher matcher = pattern.matcher(s);
            if (matcher.find()) {
                buffer.append(replase.get(s));
            } else {
                buffer.append(s);
            }
        }
        System.out.println(buffer.toString());
    }

    public static void main(String[] args) {
        new myreplase();
    }
}
Output: Apple Banana Cantalope Date Apple Banana Cantalope Date
Here's a method to do it without regex.
I noticed that every time a part of the string a gets replaced with b, b will always be part of the final string. So, you can ignore b from the string from then on.
Not only that, after replacing a with b, there will be a "space" left there. No replacement can take place across where b is supposed to be.
These actions add up to look a lot like split. Split up the values (making the "space" in between strings), do further replacements for each string in the array, then join them back.
For example:
// Original
"cat dog fish dog fish cat"
// Replace cat with dog
{"", " dog fish dog fish ", ""}.join("dog")
// Replace dog with fish
{
    "",
    {" ", " fish ", " fish "}.join("fish"),
    ""
}.join("dog")
// Replace fish with cat
{
    "",
    {
        " ",
        {" ", " "}.join("cat"),
        {" ", " "}.join("cat")
    }.join("fish"),
    ""
}.join("dog")
So far the most intuitive way (to me) to do this is recursively:
public static String replaceWithJointMap(String s, Map<String, String> map) {
    // Base case
    if (map.isEmpty()) {
        return s;
    }
    // Get some value in the map to replace
    Map.Entry<String, String> pair = map.entrySet().iterator().next();
    String replaceFrom = pair.getKey();
    String replaceTo = pair.getValue();
    // Split the current string with the replaceFrom string
    // Use split with -1 so that trailing empty strings are included
    String[] splitString = s.split(Pattern.quote(replaceFrom), -1);
    // Apply replacements for each of the strings in the splitString
    HashMap<String, String> replacementsLeft = new HashMap<>(map);
    replacementsLeft.remove(replaceFrom);
    for (int i = 0; i < splitString.length; i++) {
        splitString[i] = replaceWithJointMap(splitString[i], replacementsLeft);
    }
    // Join back with the current replacements
    return String.join(replaceTo, splitString);
}
I don't think this is very efficient though.
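If the recursion and repeated splitting are a concern, one alternative that also avoids regex is a single left-to-right scan. This is only a sketch of a different technique, not a tuned version of the recursive method: the names ScanReplace and replaceInOnePass are hypothetical, preferring the longest matching key is my own choice, and the cost is roughly input length times number of keys, which I have not benchmarked against the other approaches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ScanReplace {
    static String replaceInOnePass(String input, Map<String, String> rules) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Find the longest non-empty key that starts at position i, if any.
            String bestKey = null;
            for (String key : rules.keySet()) {
                if (!key.isEmpty() && input.startsWith(key, i)
                        && (bestKey == null || key.length() > bestKey.length())) {
                    bestKey = key;
                }
            }
            if (bestKey == null) {
                // No key matches here: copy one character through unchanged.
                out.append(input.charAt(i));
                i++;
            } else {
                // Emit the replacement and skip past the matched text, so the
                // replacement itself is never scanned again.
                out.append(rules.get(bestKey));
                i += bestKey.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("cat", "dog");
        rules.put("dog", "fish");
        rules.put("fish", "cat");
        System.out.println(replaceInOnePass("cat dog fish dog fish cat", rules));
        // prints "dog fish cat fish cat dog"
    }
}
```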