Is there an efficient algorithm for outputting all strings stored in a lexicographically sorted list that are a permutation of an input string?
I would like to find the most efficient algorithm for this problem: given a string str and a list of strings lst that consists only of lowercase English characters and is sorted lexicographically, find all the words in lst that are a permutation of str.
For example: str = "cat", lst = {"aca", "acc", "act", "cta", "tac"}
would return: {"act", "cta", "tac"}
I already have an algorithm that doesn't take advantage of the fact that lst is lexicographically ordered, and I am looking for the most efficient algorithm that does take advantage of it.
My algorithm goes like this:
import java.util.ArrayList;
import java.util.List;

public List<String> getPermutations(String str, List<String> lst) {
    List<String> res = new ArrayList<>();
    for (String word : lst)
        if (checkPermutation(word, str))
            res.add(word);
    return res;
}

public boolean checkPermutation(String word1, String word2) {
    if (word1.length() != word2.length())
        return false;
    // Letter balance: +1 for each letter of word1, -1 for each letter of word2.
    int[] count = new int[26];
    int i;
    for (i = 0; i < word1.length(); i++) {
        count[word1.charAt(i) - 'a']++;
        count[word2.charAt(i) - 'a']--;
    }
    // The words are permutations of each other only if every balance is zero.
    for (i = 0; i < 26; i++)
        if (count[i] != 0) {
            return false;
        }
    return true;
}
Total runtime is O(NK), where N is the number of strings in lst and K is the length of str.
One simple optimisation (it only becomes meaningful for really large data sets, as it doesn't really improve the O(NK)):
- put all the characters of str into a Set strChars
- for each list entry, only if strChars.contains(charFromListEntry): check whether it is a permutation
Note: the sorted ordering doesn't help much here, because you still have to check the first char of the next string from your list.
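The pre-filter described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name FirstCharFilter and the helper isPermutation are made up for the example, and charFromListEntry is taken to be the first character of each list entry, as the note suggests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class FirstCharFilter {
    // Returns the entries of lst that are permutations of str, skipping the
    // full counting check for entries that fail a cheap test first.
    static List<String> getPermutations(String str, List<String> lst) {
        Set<Character> strChars = new HashSet<>();
        for (char c : str.toCharArray()) {
            strChars.add(c);
        }
        List<String> res = new ArrayList<>();
        for (String word : lst) {
            // Cheap rejections: wrong length, or a first character not in str.
            if (word.length() != str.length()) continue;
            if (!word.isEmpty() && !strChars.contains(word.charAt(0))) continue;
            if (isPermutation(word, str)) res.add(word);
        }
        return res;
    }

    // Same counting idea as checkPermutation in the question
    // (assumes lowercase English letters only).
    static boolean isPermutation(String a, String b) {
        if (a.length() != b.length()) return false;
        int[] count = new int[26];
        for (int i = 0; i < a.length(); i++) {
            count[a.charAt(i) - 'a']++;
            count[b.charAt(i) - 'a']--;
        }
        for (int c : count) {
            if (c != 0) return false;
        }
        return true;
    }
}
```

The loop still visits every entry, so this only lowers the cost per entry; the overall bound stays O(NK).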
There might be other checks that avoid the costly checkPermutation() run, for example comparing the lengths of the words first: when the list string is shorter than the input string, it obviously can't be a permutation of all its chars.
But as said, in the end you have to iterate over all entries in your list and determine whether each entry is a permutation. There is no way of avoiding the corresponding "looping". The only thing you can affect is the cost incurred within your loop.
Finally: if your List of strings were a Set, then you could "simply" compute all permutations of your incoming str and check for each permutation whether it is contained in that Set. But of course, in order to turn a list into a set, you have to iterate over that list.
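A minimal sketch of that Set-based idea, assuming a HashSet built from the list. The class name SetLookup and the method permute are hypothetical. The input's characters are sorted first so that repeated letters can be skipped and each distinct permutation is generated only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class SetLookup {
    static List<String> getPermutations(String str, List<String> lst) {
        Set<String> words = new HashSet<>(lst);  // building the set iterates the list once
        char[] chars = str.toCharArray();
        Arrays.sort(chars);                      // equal letters become adjacent
        List<String> res = new ArrayList<>();
        permute(chars, new boolean[chars.length], new StringBuilder(), words, res);
        return res;
    }

    private static void permute(char[] chars, boolean[] used, StringBuilder sb,
                                Set<String> words, List<String> res) {
        if (sb.length() == chars.length) {
            String candidate = sb.toString();
            if (words.contains(candidate)) res.add(candidate);
            return;
        }
        for (int i = 0; i < chars.length; i++) {
            if (used[i]) continue;
            // Skip a repeated letter unless its previous copy is already placed,
            // so that no permutation is produced twice.
            if (i > 0 && chars[i] == chars[i - 1] && !used[i - 1]) continue;
            used[i] = true;
            sb.append(chars[i]);
            permute(chars, used, sb, words, res);
            sb.setLength(sb.length() - 1);
            used[i] = false;
        }
    }
}
```

The number of candidates grows as K!, so this is only attractive for short input strings.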
Instead of iterating over the list and checking each element for being a permutation of your string, you can iterate over all permutations of the string and check each one's presence in the list using binary search.
E.g.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public List<String> getPermutations(String str, List<String> lst) {
    List<String> res = new ArrayList<>();
    // Bit i of the mask is set while str.charAt(i) is still unused.
    perm(str, (1L << str.length()) - 1, new StringBuilder(), lst, res);
    return res;
}

private void perm(String source, long unused,
                  StringBuilder sb, List<String> lst, List<String> result) {
    if (unused == 0) {
        // All characters placed: look the finished permutation up in the sorted list.
        int i = Collections.binarySearch(lst, sb.toString());
        if (i >= 0) result.add(lst.get(i));
    }
    // Try each unused character in turn, from the highest set bit down.
    for (long r = unused, l; (l = Long.highestOneBit(r)) != 0; r -= l) {
        sb.append(source.charAt(Long.numberOfTrailingZeros(l)));
        perm(source, unused & ~l, sb, lst, result);
        sb.setLength(sb.length() - 1);
    }
}
Now, the time complexity is O(K! × log N), which is not necessarily better than the O(NK) of your approach. It heavily depends on the magnitudes of K and N. If the string is really short and the list really large, it may have an advantage.
There are a lot of optimizations imaginable. E.g., instead of constructing each permutation and then running a binary search, each recursion step could do a partial search to identify the potential search range for the next step, and skip when it's clear that the permutations can't be contained. While this could raise the performance significantly, it can't change the fundamental time complexity, i.e. the worst case.
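One way the partial-search idea might look is sketched below. It is an assumption about how to realise it, not the answer's own code. The names PrefixRangeSearch, search and lowerBound are hypothetical, and the list is assumed to support fast random access (e.g. an ArrayList). Each recursion level narrows the index range [lo, hi) to the entries that start with the prefix chosen so far, and drops a branch as soon as that range is empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class PrefixRangeSearch {
    static List<String> getPermutations(String str, List<String> lst) {
        char[] chars = str.toCharArray();
        Arrays.sort(chars);  // equal letters adjacent, so duplicates can be skipped
        List<String> res = new ArrayList<>();
        search(chars, new boolean[chars.length], 0, lst, 0, lst.size(), res);
        return res;
    }

    // Invariant: every entry in lst[lo, hi) starts with the prefix chosen so far,
    // and that prefix has length depth.
    private static void search(char[] chars, boolean[] used, int depth,
                               List<String> lst, int lo, int hi, List<String> res) {
        if (depth == chars.length) {
            // Entries equal to the prefix sort before longer entries sharing it.
            int end = lowerBound(lst, lo, hi, depth, 0);
            for (int i = lo; i < end; i++) res.add(lst.get(i));
            return;
        }
        for (int i = 0; i < chars.length; i++) {
            if (used[i]) continue;
            if (i > 0 && chars[i] == chars[i - 1] && !used[i - 1]) continue;
            // Sub-range of entries whose character at position depth is chars[i].
            int newLo = lowerBound(lst, lo, hi, depth, chars[i]);
            int newHi = lowerBound(lst, newLo, hi, depth, chars[i] + 1);
            if (newLo == newHi) continue;  // no entry has this prefix: prune
            used[i] = true;
            search(chars, used, depth + 1, lst, newLo, newHi, res);
            used[i] = false;
        }
    }

    // First index in [lo, hi) whose character at position pos is >= key.
    // An entry too short to have that position counts as -1.
    private static int lowerBound(List<String> lst, int lo, int hi, int pos, int key) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            String s = lst.get(mid);
            int c = pos < s.length() ? s.charAt(pos) : -1;
            if (c < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

With str = "cat" and the list from the question, the branch starting with "at" is dropped after a single range check because no entry begins with that prefix. As the answer notes, pruning like this helps in practice but leaves the worst case unchanged.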