Is there an efficient algorithm for outputting all strings stored in a lexicographically sorted list that are a permutation of an input string?

I would like to find the most efficient algorithm for this problem: given a string str and a list of strings lst that consists of only lowercase English characters and is sorted lexicographically, find all the words in lst that are a permutation of str.

For example: str = "cat", lst = {"aca", "acc", "act", "cta", "tac"}

would return: {"act", "cta", "tac"}

I already have an algorithm that doesn't take advantage of the fact that lst is lexicographically ordered, and I am looking for the most efficient algorithm that does take advantage of this fact.

My algorithm goes like this:

import java.util.ArrayList;
import java.util.List;

// Linear scan: test every word in lst against str.
public List<String> getPermutations(String str, List<String> lst) {
    List<String> res = new ArrayList<>();
    for (String word : lst) {
        if (checkPermutation(word, str)) {
            res.add(word);
        }
    }
    return res;
}

// True when word1 and word2 contain the same letters with the same multiplicities.
public boolean checkPermutation(String word1, String word2) {
    if (word1.length() != word2.length()) {
        return false;
    }
    int[] count = new int[26]; // one counter per lowercase letter
    for (int i = 0; i < word1.length(); i++) {
        count[word1.charAt(i) - 'a']++;
        count[word2.charAt(i) - 'a']--;
    }
    for (int i = 0; i < 26; i++) {
        if (count[i] != 0) {
            return false;
        }
    }
    return true;
}

Total runtime is O(NK), where N is the number of strings in lst and K is the length of str.

One simple optimisation (which only becomes meaningful for really large data sets, as it doesn't improve on the O(NK) bound):

  • Put all the characters of your incoming str into a Set strChars.
  • While iterating over the words in your list, fetch the first character of each entry.
  • If strChars.contains(charFromListEntry): check whether the entry is a permutation.
  • Else: that list word obviously can't be a permutation.

Note: the sorted ordering doesn't help much here, because you still have to check the first character of the next string in your list.

There might be other checks that avoid the costly checkPermutation() run, for example comparing the lengths of the words first: when the list string's length differs from the input string's, it obviously can't be a permutation of all its characters.
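The two cheap checks described above could be combined as in the following minimal sketch. The class name PrefilterSketch is hypothetical, and a boolean[26] stands in for the Set strChars on the assumption that the input is limited to lowercase English letters, as the question states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefilterSketch {

    // Count-based check, equivalent to the question's checkPermutation.
    static boolean checkPermutation(String word1, String word2) {
        if (word1.length() != word2.length()) return false;
        int[] count = new int[26];
        for (int i = 0; i < word1.length(); i++) {
            count[word1.charAt(i) - 'a']++;
            count[word2.charAt(i) - 'a']--;
        }
        for (int c : count) {
            if (c != 0) return false;
        }
        return true;
    }

    static List<String> getPermutations(String str, List<String> lst) {
        // strChars[c] is true when letter c occurs in str (plays the role of the Set).
        boolean[] strChars = new boolean[26];
        for (int i = 0; i < str.length(); i++) {
            strChars[str.charAt(i) - 'a'] = true;
        }
        List<String> res = new ArrayList<>();
        for (String word : lst) {
            if (word.length() != str.length()) continue;      // cheap length check
            if (word.isEmpty()) { res.add(word); continue; }   // both strings are empty
            if (!strChars[word.charAt(0) - 'a']) continue;     // first letter does not occur in str
            if (checkPermutation(word, str)) res.add(word);    // full check only for survivors
        }
        return res;
    }

    public static void main(String[] args) {
        List<String> lst = Arrays.asList("aca", "acc", "act", "cta", "tac");
        System.out.println(getPermutations("cat", lst)); // prints [act, cta, tac]
    }
}
```

The loop still visits every entry, so this only lowers the constant factor; the bound stays O(NK).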

But as said, in the end you have to iterate over all entries in your list and determine whether each entry is a permutation. There is no way of avoiding the corresponding "looping". The only thing you can affect is the cost incurred within your loop.

Finally: if your List of strings were a Set, you could "simply" compute all permutations of your incoming str and check for each permutation whether it is contained in that Set. But of course, in order to turn a list into a set, you have to iterate over it.
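A rough sketch of that Set-based idea, for illustration only: the class name SetLookupSketch and the swap-based permutation generator are assumptions rather than part of the answer, and a TreeSet collects the hits so that repeated letters in str don't report the same word twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetLookupSketch {

    static List<String> getPermutations(String str, List<String> lst) {
        Set<String> dictionary = new HashSet<>(lst); // one pass over the list
        Set<String> found = new TreeSet<>();         // sorted, and ignores repeated hits
        permute(str.toCharArray(), 0, dictionary, found);
        return new ArrayList<>(found);
    }

    // Fixes position k by swapping each remaining character into it, then recurses.
    private static void permute(char[] chars, int k, Set<String> dictionary, Set<String> found) {
        if (k == chars.length) {
            String candidate = new String(chars);
            if (dictionary.contains(candidate)) found.add(candidate);
            return;
        }
        for (int i = k; i < chars.length; i++) {
            swap(chars, k, i);
            permute(chars, k + 1, dictionary, found);
            swap(chars, k, i); // undo the swap before trying the next character
        }
    }

    private static void swap(char[] chars, int a, int b) {
        char tmp = chars[a];
        chars[a] = chars[b];
        chars[b] = tmp;
    }

    public static void main(String[] args) {
        List<String> lst = Arrays.asList("aca", "acc", "act", "cta", "tac");
        System.out.println(getPermutations("cat", lst)); // prints [act, cta, tac]
    }
}
```

This generates all K! arrangements of str, so it is only reasonable for short input strings.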

Instead of iterating over the list and checking each element for being a permutation of your string, you can iterate over all permutations of the string and check each one's presence in the list using binary search.

E.g.

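import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public List<String> getPermutations(String str, List<String> lst) {
    List<String> res = new ArrayList<>();
    // Bit i of the mask is set while str.charAt(i) is still unused.
    perm(str, (1L << str.length()) - 1, new StringBuilder(), lst, res);
    return res;
}

private void perm(String source, long unused,
                  StringBuilder sb, List<String> lst, List<String> result) {
    if (unused == 0) { // complete permutation: look it up in the sorted list
        int i = Collections.binarySearch(lst, sb.toString());
        if (i >= 0) result.add(lst.get(i));
        return;
    }
    int tried = 0; // letters already placed at this position, so repeated letters don't yield duplicates
    for (long r = unused, l; (l = Long.highestOneBit(r)) != 0; r -= l) {
        char c = source.charAt(Long.numberOfTrailingZeros(l));
        if ((tried & (1 << (c - 'a'))) != 0) continue;
        tried |= 1 << (c - 'a');
        sb.append(c);
        perm(source, unused & ~l, sb, lst, result);
        sb.setLength(sb.length() - 1);
    }
}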

Now, the time complexity is O(K! × log N) (counting each string comparison as one step), which is not necessarily better than the O(NK) of your approach. It heavily depends on the magnitude of K and N. If the string is really short and the list really large, it may have an advantage.

There are a lot of optimizations imaginable. E.g. instead of constructing each permutation and then doing a binary search, each recursion step could do a partial search to identify the potential search range for the next step and skip when it's clear that the permutations can't be contained. While this could raise the performance significantly, it can't change the fundamental time complexity, i.e. the worst case.
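One way the partial-search idea might look is sketched below. This is an illustration under stated assumptions, not the answer's own code: the class PrefixRangeSketch and the helpers search and lowerBound are hypothetical names, and the list is assumed to be sorted in natural String order and to contain only lowercase English letters. Each recursion step keeps the index range of words that start with the letters chosen so far and narrows it with two binary searches per candidate letter, so any prefix that does not occur in the list is dropped immediately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixRangeSketch {

    static List<String> getPermutations(String str, List<String> lst) {
        int[] remaining = new int[26]; // how many copies of each letter are still unused
        for (int i = 0; i < str.length(); i++) {
            remaining[str.charAt(i) - 'a']++;
        }
        List<String> res = new ArrayList<>();
        search(lst, 0, lst.size(), 0, str.length(), remaining, res);
        return res;
    }

    // Invariant: every word in lst[lo, hi) starts with the `depth` letters chosen so far.
    private static void search(List<String> lst, int lo, int hi, int depth, int k,
                               int[] remaining, List<String> res) {
        if (lo >= hi) return; // no word has this prefix: prune
        if (depth == k) {
            // Words equal to the prefix sort before longer words sharing it.
            for (int i = lo; i < hi && lst.get(i).length() == k; i++) {
                res.add(lst.get(i));
            }
            return;
        }
        for (int c = 0; c < 26; c++) {
            if (remaining[c] == 0) continue;
            int from = lowerBound(lst, lo, hi, depth, 'a' + c);
            int to = lowerBound(lst, from, hi, depth, 'a' + c + 1);
            if (from < to) {
                remaining[c]--;
                search(lst, from, to, depth + 1, k, remaining, res);
                remaining[c]++;
            }
        }
    }

    // First index in [lo, hi) whose letter at position pos is >= target.
    // A word too short to have that position counts as smaller than any letter.
    private static int lowerBound(List<String> lst, int lo, int hi, int pos, int target) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            String w = lst.get(mid);
            int key = pos < w.length() ? (int) w.charAt(pos) : -1;
            if (key < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> lst = Arrays.asList("aca", "acc", "act", "cta", "tac");
        System.out.println(getPermutations("cat", lst)); // prints [act, cta, tac]
    }
}
```

Because the letters are tried in alphabetical order and each distinct letter only once per position, the matches come out in list order and without duplicates, even when str contains repeated letters.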

