Number of ways to recreate a given string using a given list of words

Question

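Given are a String word and a String array book that contains some strings. The program should return the number of ways word can be created using only elements of book. An element may be used as many times as needed, and the program must terminate within 6 seconds.

For example, given the input: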

String word = "stackoverflow";

String[] book = new String[9];
book[0] = "st";
book[1] = "ck";
book[2] = "CAG";
book[3] = "low";
book[4] = "TC";
book[5] = "rf";
book[6] = "ove";
book[7] = "a";
book[8] = "sta";

The output should be 2, because "stackoverflow" can be created in two ways:

1: "st" + "a" + "ck" + "ove" + "rf" + "low"

2: "sta" + "ck" + "ove" + "rf" + "low"

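My implementation only terminates within the required time if word is relatively small (< 15 characters). However, as mentioned above, the running time of the program is limited to 6 seconds, and it should be able to handle very large word strings (> 1000 characters). Here is an example of a large input.

Here is my code:

1) The actual method:

Input: a String word and a String[] book

Output: the number of ways word can be written using only the strings in book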

public static int optimal(String word, String[] book){
    int count = 0;

    List<List<String>> allCombinations = allSubstrings(word);

    List<String> empty = new ArrayList<>();

    List<String> wordList = Arrays.asList(book);

    for (int i = 0; i < allCombinations.size(); i++) {

        allCombinations.get(i).retainAll(wordList);

        if (!sumUp(allCombinations.get(i), word)) {
            allCombinations.remove(i);
            allCombinations.add(i, empty);
        }
        else count++;
    }

    return count;
}

2) allSubstrings():

Input: a String input

Output: a list of lists, each containing a combination of substrings that add up to input

static List<List<String>> allSubstrings(String input) {

    if (input.length() == 1) return Collections.singletonList(Collections.singletonList(input));

    List<List<String>> result = new ArrayList<>();

    for (List<String> temp : allSubstrings(input.substring(1))) {

        List<String> firstList = new ArrayList<>(temp);
        firstList.set(0, input.charAt(0) + firstList.get(0));
        if (input.startsWith(firstList.get(0), 0)) result.add(firstList);

        List<String> l = new ArrayList<>(temp);
        l.add(0, input.substring(0, 1));
        if (input.startsWith(l.get(0), 0)) result.add(l);
    }

    return result;
}

3) sumUp():

Input: a list of strings input and a String expected

Output: true if the elements of input add up to expected

public static boolean sumUp (List<String> input, String expected) {

    String x = "";

    for (int i = 0; i < input.size(); i++) {
        x = x + input.get(i);
    }
    if (expected.equals(x)) return true;
    return false;
}

Answer 1

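I've worked out what I was doing wrong in my previous answer: I wasn't using memoization, so I was redoing a lot of unnecessary work.

Consider a book array {"a", "aa", "aaa"} and a target word "aaa". There are four ways to construct this target:

"a" + "a" + "a"
"aa" + "a"
"a" + "aa"
"aaa"

My previous attempt would have walked through all four separately. Instead, one can observe that:

1. There is 1 way to construct "a".
2. You can construct "aa" in 2 ways: either "a" + "a", or using "aa" directly.
3. You can construct "aaa" either by using "aaa" directly (1 way); or "aa" + "a" (2 ways, since there are 2 ways to construct "aa"); or "a" + "aa" (1 way).

Note that the third step here only adds a single additional string to a previously constructed string, for which we already know the number of ways it can be constructed.

This suggests that if we count the number of ways in which a prefix of word can be constructed, we can use that to trivially calculate the number of ways for a longer prefix, by adding just one more string from book.

I defined a simple trie class, so you can quickly look up prefixes of the book words that match at any given position in word: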

class TrieNode {
  boolean word;
  Map<Character, TrieNode> children = new HashMap<>();

  void add(String s, int i) {
    if (i == s.length()) {
      word = true;
    } else {
      children.computeIfAbsent(s.charAt(i), k -> new TrieNode()).add(s, i + 1);
    }
  }
}

For each letter in s, this creates an instance of TrieNode, which in turn stores the TrieNode for the subsequent character, and so on.
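To illustrate how such a trie is queried, here is a minimal sketch that lists the lengths of all book words matching word at a given position. The class name TrieDemo and the helper matchLengths are hypothetical names introduced only for this illustration; TrieNode is repeated so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TrieDemo {
  // Same structure as the TrieNode above, repeated for self-containment.
  static class TrieNode {
    boolean word;
    Map<Character, TrieNode> children = new HashMap<>();

    void add(String s, int i) {
      if (i == s.length()) {
        word = true;
      } else {
        children.computeIfAbsent(s.charAt(i), k -> new TrieNode()).add(s, i + 1);
      }
    }
  }

  // Hypothetical helper: lengths of all book words that match word at index m.
  static List<Integer> matchLengths(TrieNode root, String word, int m) {
    List<Integer> lengths = new ArrayList<>();
    TrieNode node = root;
    for (int n = 0; node != null; ++n) {
      if (node.word) {
        lengths.add(n); // a book word of length n ends here
      }
      if (m + n == word.length()) {
        break; // no further character of word to follow
      }
      node = node.children.get(word.charAt(m + n));
    }
    return lengths;
  }

  public static void main(String[] args) {
    TrieNode root = new TrieNode();
    for (String b : new String[] {"st", "ck", "low", "rf", "ove", "a", "sta"}) {
      root.add(b, 0);
    }
    System.out.println(matchLengths(root, "stackoverflow", 0)); // [2, 3]
    System.out.println(matchLengths(root, "stackoverflow", 2)); // [1]
  }
}
```

A single walk down the trie finds every matching book word at that position, instead of comparing each book entry separately.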

static long method(String word, String[] book) {
  // Construct a trie from all the words in book.
  TrieNode t = new TrieNode();
  for (String b : book) {
    t.add(b, 0);
  }

  // Construct an array to memoize the number of ways to construct
  // prefixes of a given length: result[i] is the number of ways to
  // construct a prefix of length i.
  long[] result = new long[word.length() + 1];

  // There is only 1 way to construct a prefix of length zero.
  result[0] = 1;

  for (int m = 0; m < word.length(); ++m) {
    if (result[m] == 0) {
      // If there are no ways to construct a prefix of this length,
      // then just skip it.
      continue;
    }

    // Walk the trie, taking the branch which matches the character
    // of word at position (n + m).
    TrieNode tt = t;
    for (int n = 0; tt != null && n + m <= word.length(); ++n) {
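      if (tt.word) {
        // We have reached the end of a word: we can reach a prefix
        // of length (n + m) from a prefix of length (m).
        // Increment the number of ways to reach (n+m) by the number
        // of ways to reach (m).
        // (Increment, because there may be other ways).
        result[n + m] += result[m];
      }
      if (n + m == word.length()) {
        // End of word reached: there is no next character to follow.
        // Checking this outside the block above avoids calling charAt
        // past the end when the current node is not the end of a word.
        break;
      }
      tt = tt.children.get(word.charAt(n + m));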
    }
  }

  // The number of ways to reach a prefix of length (word.length())
  // is now stored in the last element of the array.
  return result[word.length()];
}

For the very long input given by the OP, this gives the output:

$ time java Ideone

2217093120

real    0m0.126s
user    0m0.146s
sys 0m0.036s

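That is much faster than the required 6 seconds, and it includes JVM startup time too.

Edit: in fact, the trie isn't necessary. You can simply replace the "Walk the trie" loop with: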

for (String b : book) {
  if (word.regionMatches(m, b, 0, b.length())) {
    result[m + b.length()] += result[m];
  }
}

This executes more slowly, but is still much faster than 6s:

2217093120

real    0m0.173s
user    0m0.226s
sys 0m0.033s
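Putting the trie-free variant together, a self-contained sketch might look like the following. The class name PrefixCount and the method name countWays are hypothetical, and skipping empty book entries is an added assumption (an empty string could otherwise be appended without making progress).

```java
class PrefixCount {
  // result[i] is the number of ways to build the prefix of word of length i.
  static long countWays(String word, String[] book) {
    long[] result = new long[word.length() + 1];
    result[0] = 1; // exactly one way to build the empty prefix

    for (int m = 0; m < word.length(); ++m) {
      if (result[m] == 0) {
        continue; // this prefix cannot be built, so nothing extends it
      }
      for (String b : book) {
        // Assumption: ignore empty entries in book.
        // regionMatches returns false if b would run past the end of word.
        if (!b.isEmpty() && word.regionMatches(m, b, 0, b.length())) {
          result[m + b.length()] += result[m];
        }
      }
    }
    return result[word.length()];
  }

  public static void main(String[] args) {
    String[] book = {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"};
    System.out.println(countWays("stackoverflow", book));                 // 2
    System.out.println(countWays("aaa", new String[] {"a", "aa", "aaa"})); // 4
  }
}
```

The work is proportional to word.length() times the total length of the book entries, which is why it stays fast even for words of more than 1000 characters.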

Answer 2

A few observations:

x = x + input.get(i);

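Using String + inside a loop is not a good idea. Use a StringBuilder, append to it in the loop, and return builder.toString() at the end. Or follow Andy's idea: there is no need to merge strings at all, because you already know the target word. See below.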
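Both suggestions can be sketched as follows. The class and method names (SumUpDemo, sumUpWithBuilder, sumUpInPlace) are hypothetical; the second variant is one possible reading of "no need to merge strings", comparing each piece against the target in place.

```java
import java.util.Arrays;
import java.util.List;

class SumUpDemo {
  // Variant 1: concatenate with a StringBuilder instead of String +.
  static boolean sumUpWithBuilder(List<String> input, String expected) {
    StringBuilder builder = new StringBuilder();
    for (String s : input) {
      builder.append(s);
    }
    return expected.contentEquals(builder);
  }

  // Variant 2: no concatenation at all; check each piece against
  // the target at the position where it would have to appear.
  static boolean sumUpInPlace(List<String> input, String expected) {
    int pos = 0;
    for (String s : input) {
      if (!expected.startsWith(s, pos)) {
        return false;
      }
      pos += s.length();
    }
    return pos == expected.length();
  }

  public static void main(String[] args) {
    List<String> pieces = Arrays.asList("st", "a", "ck");
    System.out.println(sumUpWithBuilder(pieces, "stack")); // true
    System.out.println(sumUpInPlace(pieces, "stack"));     // true
    System.out.println(sumUpInPlace(Arrays.asList("st", "ck"), "stack")); // false
  }
}
```

The in-place variant can also stop at the first mismatching piece rather than building the whole string first.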

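Then: using a List means that adding and removing elements can be costly. So see whether you can get rid of that part, and whether a Map or Set could be used instead.

Finally: the real point is to look into your algorithm. I would try to work "backwards". Meaning: first identify those array elements that actually occur in your target word. You can ignore all others from the start.

Then: look at all array entries that start your search word. In your example, you can see that only two array elements fit. Then work your way from there.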
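A minimal sketch of these two filtering steps, assuming plain substring and prefix checks are what is meant. The names BookFilter, occurringIn, and startersOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BookFilter {
  // Step 1: keep only the entries that occur somewhere in word.
  static List<String> occurringIn(String word, String[] book) {
    List<String> kept = new ArrayList<>();
    for (String b : book) {
      if (!b.isEmpty() && word.contains(b)) {
        kept.add(b);
      }
    }
    return kept;
  }

  // Step 2: of those, keep the entries that word starts with.
  static List<String> startersOf(String word, List<String> candidates) {
    List<String> starters = new ArrayList<>();
    for (String c : candidates) {
      if (word.startsWith(c)) {
        starters.add(c);
      }
    }
    return starters;
  }

  public static void main(String[] args) {
    String[] book = {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"};
    List<String> candidates = occurringIn("stackoverflow", book);
    System.out.println(candidates);                              // [st, ck, low, rf, ove, a, sta]
    System.out.println(startersOf("stackoverflow", candidates)); // [st, sta]
  }
}
```

Filtering alone does not change the exponential behaviour of enumerating every combination, but it shrinks the set of strings that later steps need to consider.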

Answer 3

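My first observation is that you don't actually need to build anything: you know the string you're trying to construct (e.g. stackoverflow), so all you really need to track is how much of that string you have matched so far. Call this m.

Next, having matched m characters, and provided m < word.length(), you need to choose a next string from book which matches the portion of word from m to m + nextString.length().

You can do this by checking each string in turn: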

if (word.regionMatches(m, nextString, 0, nextString.length())) { ... }

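But you can do better, by determining in advance which strings cannot match. The next string you append will have these properties:

1. word.charAt(m) == nextString.charAt(0) (the next character matches)
2. m + nextString.length() <= word.length() (adding the next string must not make the constructed string longer than word)

So you can cut down the words from book that you might have to check by constructing a map from letters to the words starting with that letter (point 1); and if you store the words with the same starting letter in order of increasing length, you can stop checking that letter as soon as the length gets too big (point 2).

You can construct the map once and reuse it: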

Map<Character, List<String>> prefixMap =
    Arrays.asList(book).stream()
        .collect(groupingBy(
            s -> s.charAt(0),
            collectingAndThen(
                toList(),
                ss -> {
                  ss.sort(comparingInt(String::length));
                  return ss;
                })));

You can then calculate the number of ways recursively, without constructing any additional objects (*):

int method(String word, String[] book) {
  return method(word, 0, /* construct map as above */);
}

int method(String word, int m, Map<Character, List<String>> prefixMap) {
  if (m == word.length()) {
    return 1;
  }

  int result = 0;
  for (String nextString : prefixMap.getOrDefault(word.charAt(m), emptyList())) {
    if (m + nextString.length() > word.length()) {
      break;
    }

    // Start at m+1, because you already know they match at m.
    if (word.regionMatches(m + 1, nextString, 1, nextString.length()-1)) {
      // This is a potential match!
      // Make a recursive call.
      result += method(word, m + nextString.length(), prefixMap);
    }
  }
  return result;
}

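(*) This may construct new instances of Character, because word.charAt(m) is boxed: cached instances are only guaranteed to be used for chars in the range 0-127. There are ways to work around this, but they would only clutter the code.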
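As Answer 1 points out, a plain recursion like this repeats work for the same position m. A minimal sketch of adding memoization is shown below; for brevity it loops over book directly instead of using the prefix map, the names MemoCount, countWays, and countFrom are hypothetical, and it returns long on the assumption that counts fit in 64 bits (the result reported in Answer 1, 2217093120, already exceeds Integer.MAX_VALUE).

```java
import java.util.Arrays;

class MemoCount {
  static long countWays(String word, String[] book) {
    long[] memo = new long[word.length() + 1];
    Arrays.fill(memo, -1); // -1 marks "not computed yet"
    return countFrom(word, 0, book, memo);
  }

  // Number of ways to build the suffix of word starting at index m.
  static long countFrom(String word, int m, String[] book, long[] memo) {
    if (m == word.length()) {
      return 1; // everything is matched: one complete construction
    }
    if (memo[m] >= 0) {
      return memo[m]; // already computed for this position
    }
    long result = 0;
    for (String next : book) {
      // Assumption: ignore empty entries, which would recurse forever.
      if (!next.isEmpty() && word.startsWith(next, m)) {
        result += countFrom(word, m + next.length(), book, memo);
      }
    }
    memo[m] = result;
    return result;
  }

  public static void main(String[] args) {
    String[] book = {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"};
    System.out.println(countWays("stackoverflow", book));                 // 2
    System.out.println(countWays("aaa", new String[] {"a", "aa", "aaa"})); // 4
  }
}
```

With the memo array, each position of word is evaluated at most once, so the recursion no longer grows exponentially with the length of word.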

Answer 4

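I think you are already doing a pretty good job of optimizing your application. In addition to GhostCat's answer, here are a few suggestions of my own: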

public static int optimal(String word, String[] book){

    int count = 0;

    List<List<String>> allCombinations = allSubstrings(word);
    List<String> wordList = Arrays.asList(book);

    for (int i = 0; i < allCombinations.size(); i++)
    {
        /*
         * allCombinations.get(i).retainAll(wordList);
         * 
         * There is no need to retrieve the list element
         * twice, just set it in a local variable
         */
        java.util.List<String> combination = allCombinations.get(i);
        combination.retainAll(wordList);
        /*
         * Since we are only interested in the count here
         * there is no need to remove and add list elements
         */
        if (sumUp(combination, word)) 
        {
            /*allCombinations.remove(i);
            allCombinations.add(i, empty);*/
            count++;
        }
        /*else count++;*/
    }
    return count;
}

public static boolean sumUp (List<String> input, String expected) {

    String x = "";

    for (int i = 0; i < input.size(); i++) {
        x = x + input.get(i);
    }
    // No need for if block here, just return comparison result
    /*if (expected.equals(x)) return true;
    return false;*/
    return expected.equals(x);
}

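Since you are interested in the execution time of your method, I would recommend implementing some kind of benchmarking system. Here is a quick mock-up: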

private static long benchmarkOptima(int cycles, String word, String[] book) {

    long totalTime = 0;
    for (int i = 0; i < cycles; i++)
    {
        long startTime = System.currentTimeMillis();

        int a = optimal(word, book);

        long executionTime = System.currentTimeMillis() - startTime;
        totalTime += executionTime;
    }
    return totalTime / cycles;
}

public static void main(String[] args)
{
    String word = "stackoverflow";
    String[] book = new String[] {
            "st", "ck", "CAG", "low", "TC",
            "rf", "ove", "a", "sta"
    };

    int result = optimal(word, book);

    final int cycles = 50;
    long averageTime = benchmarkOptima(cycles, word, book);

    System.out.println("Optimal result: " + result);
    System.out.println("Average execution time - " + averageTime + " ms");
}

Output

2
Average execution time - 6 ms
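System.currentTimeMillis() has coarse resolution for a method that takes only a few milliseconds, and the first iterations include JIT warm-up. A rough sketch of a variant that uses System.nanoTime() and discards some warm-up runs follows; the class MicroBench and its averageNanos method are hypothetical names, and this is still only a crude measurement, not a substitute for a dedicated harness.

```java
class MicroBench {
  // Runs task `warmup` times untimed, then `cycles` times timed,
  // and returns the average time per timed run in nanoseconds.
  static long averageNanos(Runnable task, int warmup, int cycles) {
    for (int i = 0; i < warmup; i++) {
      task.run(); // give the JIT a chance to compile the hot path
    }
    long start = System.nanoTime();
    for (int i = 0; i < cycles; i++) {
      task.run();
    }
    return (System.nanoTime() - start) / cycles;
  }

  public static void main(String[] args) {
    // Placeholder workload; in the answer above this would be
    // () -> optimal(word, book).
    Runnable task = () -> "stackoverflow".contains("flow");
    long avg = averageNanos(task, 20, 50);
    System.out.println("Average execution time - " + avg + " ns");
  }
}
```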

Answer 5

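Note: the implementation gets stuck on the test case mentioned by @user1221; I am working on it.

What I can think of is a trie-based approach that takes O(sum of length of words in dict) space. The time is not optimal.

Procedure:

1. Build a trie of all the words in the dictionary. This is a preprocessing task that takes O(sum of lengths of all strings in dict).
2. We try to find the string you want to construct in the trie, with a twist. We start by searching for a prefix of the string. Whenever we find a prefix in the trie, we recursively start a new search from the top and continue looking for more prefixes.
3. When we reach the end of the output string, i.e. stackoverflow, we check whether we have also arrived at the end of a dictionary string; if so, we have found a valid combination for this string. We count this while returning from the recursion.

For example: in the example above, we use the dictionary {"st", "sta", "a", "ck"} and construct our trie ($ is the sentinel character, i.e. a character that is not in the dictionary):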

$___s___t.___a.
|___a.
|___c___k.

The . denotes that a word in dict ends at that position. We try to find the number of constructions of stack.

We start searching for stack in the trie.

depth=0
$___s(*)___t.___a.
|___a.
|___c___k.

We see that we are at the end of a word, so we start a new search from the top with the remaining string ack.

depth=0
$___s___t(*).___a.
|___a.
|___c___k.

We are again at the end of a word in dict. We start a new search for ck.

depth=1
$___s___t.___a.
|___a(*).
|___c___k.

depth=2
$___s___t.___a.
|___a.
|___c(*)___k.

We reach the end of stack and the end of a word in dict, hence we have 1 valid representation of stack.

depth=2
$___s___t.___a.
|___a.
|___c___k(*).

We return to the caller of depth=2.

No next character is available, so we return to the caller of depth=1.

depth=1
$___s___t.___a.
|___a(*, 1).
|___c___k.

depth=0
$___s___t(*, 1).___a.
|___a.
|___c___k.

We move on to the next character. We see that we have reached the end of a word in dict, so we start a new search for ck in dict.

depth=0
$___s___t.___a(*, 1).
|___a.
|___c___k.

depth=1
$___s___t.___a.
|___a.
|___c(*)___k.

We reach the end of stack and the end of a word in dict, hence another valid representation. We return to the caller of depth=1.

depth=1
$___s___t.___a.
|___a.
|___c___k(*, 1).

There are no more characters to process, so we return the result 2.

depth=0
$___s___t.___a(*, 2).
|___a.
|___c___k.

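Note: the implementation is in C++; converting it to Java shouldn't be too hard. It assumes that all characters are lowercase; extending it to handle both cases is trivial.

Sample code (full version):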

/**
Node *base: head of the trie
Node *h   : current node in the trie
string s  : string to search
int idx   : the current position in the string
*/
int count(Node *base, Node *h, string s, int idx) {
    // step 3: found a valid combination.
    if (idx == s.size()) return h->end;

    int res = 0;
    // step 2: we recursively start a new search.
    if (h->end) {
        res += count(base, base, s, idx);
    }
    // move ahead in the trie.
    if (h->next[s[idx] - 'a'] != NULL) { 
        res += count(base, h->next[s[idx] - 'a'], s, idx + 1);
    }

    return res;
}
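Since the question is about Java, here is a rough Java port of the count function above, together with a trie builder so that it can be run. The class TrieCount, the insert method, and the layout of Node (an end flag plus 26 child slots) are assumptions reconstructed from how the C++ snippet uses h->end and h->next; like the original, it assumes lowercase a-z only and is not memoized, so it can be slow on long inputs.

```java
class TrieCount {
  static class Node {
    boolean end;               // a dictionary word ends at this node
    Node[] next = new Node[26]; // children for 'a'..'z'
  }

  // Assumed helper: inserts a lowercase word into the trie rooted at base.
  static void insert(Node base, String s) {
    if (s.isEmpty()) {
      return; // an empty word at the root would make count recurse forever
    }
    Node h = base;
    for (int i = 0; i < s.length(); i++) {
      int c = s.charAt(i) - 'a';
      if (h.next[c] == null) {
        h.next[c] = new Node();
      }
      h = h.next[c];
    }
    h.end = true;
  }

  // Port of the C++ count: base is the trie root, h the current node,
  // s the string to build, idx the current position in s.
  static long count(Node base, Node h, String s, int idx) {
    // step 3: at the end of s, this is valid only if a word ends here.
    if (idx == s.length()) {
      return h.end ? 1 : 0;
    }
    long res = 0;
    // step 2: a word ends here, so restart the search from the root.
    if (h.end) {
      res += count(base, base, s, idx);
    }
    // move ahead in the trie.
    Node child = h.next[s.charAt(idx) - 'a'];
    if (child != null) {
      res += count(base, child, s, idx + 1);
    }
    return res;
  }

  public static void main(String[] args) {
    Node base = new Node();
    for (String w : new String[] {"st", "sta", "a", "ck"}) {
      insert(base, w);
    }
    System.out.println(count(base, base, "stack", 0)); // 2
  }
}
```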

5 answers

Answer 1
3 Accepted 2019-06-02 19:41:32

Answer 2
1 2019-06-02 10:17:28

Answer 3
1 2019-06-02 10:56:44

Answer 4
0 2019-06-02 11:11:10

Answer 5
0 2019-06-02 19:17:07
