Replacing multiple substrings in Java when replacement text overlaps search text

Say you have the following string:

cat dog fish dog fish cat

You want to replace all cats with dogs, all dogs with fish, and all fish with cats. Intuitively, the expected result is:

dog fish cat fish cat dog

If you try the obvious solution, looping through with replaceAll(), you get:

  1. (original) cat dog fish dog fish cat
  2. (cat -> dog) dog dog fish dog fish dog
  3. (dog -> fish) fish fish fish fish fish fish
  4. (fish -> cat) cat cat cat cat cat cat
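To make the failure concrete, here is a minimal sketch of the chained approach. The names NaiveSequentialReplace and replaceSequentially are made up for illustration, and it uses String.replace() instead of replaceAll() so the search strings are treated literally; for these inputs the outcome is the same.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NaiveSequentialReplace {
    // Applies each replacement to the whole string in turn, so the output of
    // one step becomes the input of the next step.
    static String replaceSequentially(String input, Map<String, String> replacements) {
        String result = input;
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order: cat -> dog, dog -> fish, fish -> cat.
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("cat", "dog");
        replacements.put("dog", "fish");
        replacements.put("fish", "cat");
        System.out.println(replaceSequentially("cat dog fish dog fish cat", replacements));
        // prints "cat cat cat cat cat cat"
    }
}
```

Every word ends up as cat because each pass also rewrites text produced by the pass before it.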

Clearly, this is not the intended result. So what's the simplest way to do this? I can cobble something together with Pattern and Matcher (and a lot of Pattern.quote() and Matcher.quoteReplacement()), but I refuse to believe I'm the first person to have this problem and that there's no library function to solve it.

(FWIW, the actual case is a bit more complicated and doesn't involve straight swaps.)

It seems StringUtils.replaceEach in Apache Commons does what you want:

StringUtils.replaceEach("abcdeab", new String[]{"ab", "cd"}, new String[]{"cd", "ab"});
// returns "cdabecd"

Note that the documentation at the above links seems to be in error. See comments below for details.
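If pulling in Commons Lang is not an option, the same single-pass idea can be sketched with the standard library alone. This is not Apache's implementation, just an illustration under my own assumptions: SinglePassReplace and replaceEachOnce are hypothetical names, empty search strings are ignored, and when two search strings match at the same position the one listed first wins.

```java
class SinglePassReplace {
    // Walks the text once from left to right. Replacement text is appended to
    // the output and never rescanned, so it cannot be replaced again.
    static String replaceEachOnce(String text, String[] search, String[] replacement) {
        if (search.length != replacement.length) {
            throw new IllegalArgumentException("search and replacement must have the same length");
        }
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            for (int k = 0; k < search.length; k++) {
                if (!search[k].isEmpty() && text.startsWith(search[k], i)) {
                    out.append(replacement[k]);
                    i += search[k].length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(text.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEachOnce("abcdeab",
            new String[]{"ab", "cd"}, new String[]{"cd", "ab"}));
        // prints "cdabecd"
    }
}
```

The scan costs roughly (text length) x (number of search strings), which is fine for a handful of replacements.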

String rep = str.replace("cat","§1§").replace("dog","§2§")
                .replace("fish","§3§").replace("§1§","dog")
                .replace("§2§","fish").replace("§3§","cat");

Ugly and inefficient as hell, but it works.
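The same two-phase trick can be written generically for any map. A rough sketch, with hypothetical names (PlaceholderReplace, replaceViaPlaceholders), and with loud assumptions: neither the input nor any key or value contains characters from the Unicode private-use area starting at U+E000, all keys are non-empty, and there are fewer keys than private-use characters. If keys overlap (say "cat" and "cats"), the map's iteration order decides which one wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PlaceholderReplace {
    static String replaceViaPlaceholders(String input, Map<String, String> replacements) {
        List<Map.Entry<String, String>> entries = new ArrayList<>(replacements.entrySet());
        String result = input;
        // Phase 1: swap every search string for a unique one-character marker.
        for (int i = 0; i < entries.size(); i++) {
            String marker = String.valueOf((char) (0xE000 + i));
            result = result.replace(entries.get(i).getKey(), marker);
        }
        // Phase 2: swap every marker for its final replacement text.
        for (int i = 0; i < entries.size(); i++) {
            String marker = String.valueOf((char) (0xE000 + i));
            result = result.replace(marker, entries.get(i).getValue());
        }
        return result;
    }
}
```

It still makes two passes per key, so it is no faster than the chained version above, only less error-prone to type out.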


OK, here's a more elaborate and generic version. I prefer using a regular expression rather than a scanner. That way I can replace arbitrary Strings, not just words (which can be better or worse). Anyway, here goes:

import java.util.Iterator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String replace(
    final String input, final Map<String, String> replacements) {

    if (input == null || input.isEmpty() || replacements == null
        || replacements.isEmpty()) {
        return input;
    }
    // Build one alternation of all search strings, each quoted as a literal.
    StringBuilder regexBuilder = new StringBuilder();
    Iterator<String> it = replacements.keySet().iterator();
    regexBuilder.append(Pattern.quote(it.next()));
    while (it.hasNext()) {
        regexBuilder.append('|').append(Pattern.quote(it.next()));
    }
    Matcher matcher = Pattern.compile(regexBuilder.toString()).matcher(input);
    StringBuffer out = new StringBuffer(input.length() + (input.length() / 10));
    while (matcher.find()) {
        // quoteReplacement keeps '$' and '\' in the replacement text literal.
        matcher.appendReplacement(out,
            Matcher.quoteReplacement(replacements.get(matcher.group())));
    }
    matcher.appendTail(out);
    return out.toString();
}

Test Code:

System.out.println(replace("cat dog fish dog fish cat",
    ImmutableMap.of("cat", "dog", "dog", "fish", "fish", "cat")));

Output:

dog fish cat fish cat dog

Obviously this solution only makes sense for many replacements, otherwise it's huge overkill.
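On Java 9 or later the same regex idea gets shorter, because Matcher.replaceAll accepts a function. The sketch below is my own variation, not the code above: RegexMapReplace is a hypothetical name, and sorting the keys longest-first is an extra step I added so that a key like "cats" is tried before its prefix "cat".

```java
import java.util.Comparator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexMapReplace {
    static String replaceAll(String input, Map<String, String> replacements) {
        if (input == null || input.isEmpty()
                || replacements == null || replacements.isEmpty()) {
            return input;
        }
        // Quote every key as a literal and join them into one alternation,
        // longest key first so that longer matches take priority.
        String regex = replacements.keySet().stream()
            .sorted(Comparator.comparingInt(String::length).reversed())
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        // The function is called once per match; quoteReplacement keeps any
        // '$' or '\' in the replacement text literal.
        return Pattern.compile(regex).matcher(input)
            .replaceAll(m -> Matcher.quoteReplacement(replacements.get(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(replaceAll("cat dog fish dog fish cat",
            Map.of("cat", "dog", "dog", "fish", "fish", "cat")));
        // prints "dog fish cat fish cat dog"
    }
}
```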

I would create a StringBuilder and then parse the text once, one word at a time, transferring over unchanged words or changed words as I go. I wouldn't parse it for each swap as you're suggesting.

So rather than doing something like:

// pseudocode
text is new text swapping cat with dog
text is new text swapping dog with fish
text is new text swapping fish with cat

I'd do:

for each word in text
   if word is cat, swap with dog
   if word is dog, swap with fish
   if word is fish, swap with cat
   transfer new word (or unchanged word) into StringBuilder.

I'd probably make a swap(...) method for this and use a HashMap for the swap.

For example:

import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class SwapWords {
   private static Map<String, String> myMap = new HashMap<String, String>();

   public static void main(String[] args) {
      // this would really be loaded using a file such as a text file or xml
      // or even a database:
      myMap.put("cat", "dog");
      myMap.put("dog", "fish");
      myMap.put("fish", "cat");

      String testString = "cat dog fish dog fish cat";

      StringBuilder sb = new StringBuilder();
      Scanner testScanner = new Scanner(testString);
      while (testScanner.hasNext()) {
         String text = testScanner.next();
         text = myMap.get(text) == null ? text : myMap.get(text);
         sb.append(text + " ");
      }

      System.out.println(sb.toString().trim());
   }
}
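One side effect of the Scanner version is that it collapses runs of whitespace into single spaces. If the original spacing matters, a possible variation is to split the text into alternating word and whitespace tokens and swap only the tokens found in the map. This is only a sketch; WordSwap and swapWords are hypothetical names, and it assumes the map contains no whitespace-only keys.

```java
import java.util.Map;

class WordSwap {
    static String swapWords(String text, Map<String, String> swaps) {
        StringBuilder sb = new StringBuilder(text.length());
        // Split at every boundary between whitespace and non-whitespace, so
        // tokens alternate between words and the exact whitespace between them.
        for (String token : text.split("(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)")) {
            // Words found in the map are swapped; everything else is copied as-is.
            sb.append(swaps.getOrDefault(token, token));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapWords("cat  dog fish",
            Map.of("cat", "dog", "dog", "fish", "fish", "cat")));
        // prints "dog  fish cat" (the double space is preserved)
    }
}
```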
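// A separate example: a regex-based variant that inspects one character at a
// time, so it only handles single-character keys.
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class myreplase {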
    public Map<String, String> replase;

    public myreplase() {
        replase = new HashMap<String, String>();

        replase.put("a", "Apple");
        replase.put("b", "Banana");
        replase.put("c", "Cantalope");
        replase.put("d", "Date");
        String word = "a b c d a b c d";

        String ss = "";
        Iterator<String> i = replase.keySet().iterator();
        while (i.hasNext()) {
            ss += i.next();
            if (i.hasNext()) {
                ss += "|";
            }
        }

        Pattern pattern = Pattern.compile(ss);
        StringBuilder buffer = new StringBuilder();
        for (int j = 0, k = 1; j < word.length(); j++,k++) {
            String s = word.substring(j, k);
            Matcher matcher = pattern.matcher(s);
            if (matcher.find()) {
                buffer.append(replase.get(s));
            } else {
                buffer.append(s);
            }
        }
        System.out.println(buffer.toString());
    }

    public static void main(String[] args) {
        new myreplase();
    }
}

Output: Apple Banana Cantalope Date Apple Banana Cantalope Date

Here's a method to do it without regex.

I noticed that every time a part of the string, a, gets replaced with b, b will always be part of the final string. So you can ignore b in the string from then on.

Not only that: after replacing a with b, there will be a "space" left there. No replacement can take place across the spot where b is supposed to be.

These actions add up to look a lot like split: split up the values (making the "space" in between strings), do further replacements for each string in the array, then join them back.

For example:

// Original
"cat dog fish dog fish cat"

// Replace cat with dog
{"", " dog fish dog fish ", ""}.join("dog")

// Replace dog with fish
{
    "",
    {" ", " fish ", " fish "}.join("fish"),
    ""
}.join("dog")

// Replace fish with cat
{
    "",
    {
        " ",
        {" ", " "}.join("cat"),
        {" ", " "}.join("cat")
    }.join("fish"),
    ""
}.join("dog")
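A single level of that picture can be tried directly in Java. The sketch below shows just one split-and-join step, with hypothetical names (SplitJoinStep, piecesAround, replaceOne). Passing -1 as the limit to split keeps the trailing empty string, which the join needs in order to put a replacement at the very end.

```java
import java.util.regex.Pattern;

class SplitJoinStep {
    // The pieces of s that lie between occurrences of 'from'.
    static String[] piecesAround(String s, String from) {
        return s.split(Pattern.quote(from), -1);
    }

    // Joining the pieces with 'to' is equivalent to replacing 'from' with 'to'.
    static String replaceOne(String s, String from, String to) {
        return String.join(to, piecesAround(s, from));
    }

    public static void main(String[] args) {
        String[] pieces = piecesAround("cat dog fish dog fish cat", "cat");
        System.out.println(pieces.length);
        // prints 3: an empty string, " dog fish dog fish ", and an empty string
        System.out.println(replaceOne("cat dog fish dog fish cat", "cat", "dog"));
        // prints "dog dog fish dog fish dog"
    }
}
```

The trick is then to run the remaining replacements on each piece before joining, so the inserted text is never touched again.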

So far the most intuitive way (to me) to do this is recursively:

public static String replaceWithJointMap(String s, Map<String, String> map) {
    // Base case
    if (map.size() == 0) {
        return s;
    }

    // Get some value in the map to replace
    Map.Entry<String, String> pair = map.entrySet().iterator().next();
    String replaceFrom = pair.getKey();
    String replaceTo = pair.getValue();

    // Split the current string with the replaceFrom string
    // Use split with -1 so that trailing empty strings are included
    String[] splitString = s.split(Pattern.quote(replaceFrom), -1);

    // Apply replacements for each of the strings in the splitString
    HashMap<String, String> replacementsLeft = new HashMap<>(map);
    replacementsLeft.remove(replaceFrom);

    for (int i=0; i<splitString.length; i++) {
        splitString[i] = replaceWithJointMap(splitString[i], replacementsLeft);
    }

    // Join back with the current replacements
    return String.join(replaceTo, splitString);
}

I don't think this is very efficient, though.
