Is there an efficient algorithm for outputting all strings stored in a sorted lexicographically list that are a permutation of an input string?

Question

I would like to find the most efficient algorithm for this problem: Given a string str and a list of strings lst that consists of only lowercase English characters and is sorted lexicographically , find all the words in lst that are a permutation of str .

for example: str = "cat", lst = {"aca", "acc", "act", "cta", "tac"}

would return: {"act", "cta", "tac"}

I already have an algorithm that doesn't take advantage of the fact that lst is lexicographically ordered, and I am looking for the most efficient algorithm that takes advantage of this fact.

My algorithm goes like this:

public List<String> getPermutations(String str, List<String> lst){
  List<String> res = new ArrayList<>();
  for (String word : lst)
        if (checkPermutation(word, str))
            res.add(word);
  return res;
}


public boolean checkPermutation(String word1, String word2) {
    if (word1.length() != word2.length())
        return false;
    int[] count = new int[26];
    int i;
    for (i = 0; i < word1.length(); i++) {
        count[word1.charAt(i) - 'a']++;
        count[word2.charAt(i) - 'a']--;
    }
    for (i = 0; i < 26; i++)
        if (count[i] != 0) {
            return false;
        }
    return true;
}

Total runtime is O(NK) where N is the number of strings in lst , and k is the length of str .

Answer 1

One simple optimisation (that only becomes meaningful for really large data sets, as it doesn't really improve the O(NK):

put all the characters of your incoming str into a Set strChars
now: when iterating the words in your list: fetch the first character of each entry
if strChars.contains(charFromListEntry ): check whether it is a permutation
else: obviously, that list word can't be a permutation

Note: the sorted ordering doesn't help much here: because you still have to check the first char of the next string from your list.

There might be other checks to avoid the costly checkPermutation() run, for example to first compare the lengths of the words: when the list string is shorter than the input string, it obviously can't be a permutation of all chars.

But as said, in the end you have to iterate over all entries in your list and determine whether an entry is a permutation. There is no way avoiding the corresponding "looping". The only thing you can affect is the cost that occurs within your loop.

Finally: if your List of strings would be a Set, then you could "simply" compute all permutations of your incoming str , and check for each permutation whether it is contained in that Set. But of course, in order to turn a list into a set, you have to iterate that thing.

Answer 2

Instead of iterating over the list and checking each element for being a permutation of your string, you can iterate over all permutations of the string and check each presence in the list using binary search .

Eg

public List<String> getPermutations(String str, List<String> lst){
    List<String> res = new ArrayList<>();
    perm(str, (1L << str.length()) - 1, new StringBuilder(), lst, res);
    return res;
}

private void perm(String source, long unused,
                  StringBuilder sb, List<String> lst, List<String> result) {
    if(unused == 0) {
        int i = Collections.binarySearch(lst, sb.toString());
        if(i >= 0) result.add(lst.get(i));
    }
    for(long r = unused, l; (l = Long.highestOneBit(r)) != 0; r-=l) {
        sb.append(source.charAt(Long.numberOfTrailingZeros(l)));
        perm(source, unused & ~l, sb, lst, result);
        sb.setLength(sb.length() - 1);
    }
}

Now, the time complexity is O(K. × log N) which is not necessarily better than the O(NK) of your approach. It heavily depends on the magnitude of K and N, If the string is really short and the list really large. it may have an advantage.

There are a lot of optimizations imaginable. Eg instead constructing each permutation, followed by a binary search, each recursion step could do a partial search to identify the potential search range for the next step and skip when it's clear that the permutations can't be contained. While this could raise the performance significantly, it can't change the fundamental time complexity, ie the worst case.

Is there an efficient algorithm for outputting all strings stored in a sorted lexicographically list that are a permutation of an input string?

Question

2 answers

solution1
2 2022-03-11 13:10:07

solution2
1 2022-03-14 18:36:02

Is there an efficient algorithm for outputting all strings stored in a sorted lexicographically list that are a permutation of an input string?

Question

2 answers

solution1 2 2022-03-11 13:10:07

solution2 1 2022-03-14 18:36:02

solution1
2 2022-03-11 13:10:07

solution2
1 2022-03-14 18:36:02