Find all the words in the list that differ by a single letter

Question

For any given word w in a list words, I want to find all the other words in the list that can become w by changing a single letter in them. All words are of equal length, and only substitution is allowed. Call this function parent(w).

For example, given words = ["hot","dot","dog","lot","log","cog"], parent("cog") would be ["dog", "log"], parent("lot") would be ["dot", "hot", "log"], etc.

To do this, I first build a reverse index whose keys (str, int) map to the words that have character str at index int. Then finding the parents of a word becomes the task of intersecting the sets of words that have the same letters as the word in the same positions, except for one.

The code is as follows, which produces an empty set. Why is it not working?

from typing import Iterator, Dict, Tuple, Set
import itertools

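words = ["hot", "dot", "dog", "lot", "log", "cog"]

# reverse index: (character, position) -> indices of the words with that character there
graph: Dict[Tuple[str, int], Set[int]] = dict()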

for i, word in enumerate(words):
    for j, ch in enumerate(word):
        if (ch, j) not in graph:
            graph[(ch, j)] = set()

        graph[(ch, j)].add(i)

def parents(word: str) -> Iterator[int]:
    n: int = len(word)
    s: Set[int] = set()
    for part in itertools.combinations(range(n), n - 1):
        keys = map(lambda x: (word[x], x), part)
        existing_keys = filter(lambda k: k in graph, keys)
        for y in itertools.chain(map(lambda k: graph[k], existing_keys)):
            s = s.intersection(set(y)) if s else set(y)

    return filter(lambda i: words[i] != word, s)

print(list(parents("cog"))) # empty!!!

Answer 1

Your solution is almost there. The problem is that you intersect everything you find across all combinations, so the only index that survives is the word itself, which the final filter then removes. Instead, intersect within each combination and take the union of the per-combination results: move s: Set[int] = set() inside your first for loop, and update the result set after the second for loop. Something like this:

def parents(word: str) -> Set[int]:
    n: int = len(word)
    ret: Set[int] = set()
    for part in itertools.combinations(range(n), n - 1):
        keys = map(lambda x: (word[x], x), part)
        existing_keys = filter(lambda k: k in graph, keys)
        s: Set[int] = set()  # reset for every combination
        for y in map(lambda k: graph[k], existing_keys):
            s = s.intersection(set(y)) if s else set(y)

        # collect this combination's matches, excluding the word itself
        ret.update(filter(lambda i: words[i] != word, s))

    return ret
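
For reference, here is a self-contained sketch of the same reverse-index idea that can be run on its own. It is one possible arrangement rather than the only one: it uses graph.get with an empty-set default instead of filtering out missing keys, and set.intersection over all sets of a combination at once. It assumes the queried word has at least two letters.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

words: List[str] = ["hot", "dot", "dog", "lot", "log", "cog"]

# Reverse index: (character, position) -> indices of the words with that character there.
graph: Dict[Tuple[str, int], Set[int]] = {}
for i, word in enumerate(words):
    for j, ch in enumerate(word):
        graph.setdefault((ch, j), set()).add(i)


def parents(word: str) -> Set[int]:
    n = len(word)
    found: Set[int] = set()
    # Each combination keeps n - 1 positions fixed and leaves one free.
    for part in combinations(range(n), n - 1):
        # A missing key means no word matches at that position, hence the empty set.
        candidate_sets = [graph.get((word[x], x), set()) for x in part]
        if not candidate_sets:  # one-letter words are not handled in this sketch
            continue
        # Union of the per-combination intersections.
        found |= set.intersection(*candidate_sets)
    # Drop the word itself, which matches at every position.
    return {i for i in found if words[i] != word}


print(sorted(words[i] for i in parents("cog")))  # ['dog', 'log']
```

The function returns indices into words, as in the question; map them back with words[i] when the strings themselves are needed.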

Answer 2

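The Levenshtein distance algorithm will achieve what you are looking for. Because all the words have the same length, a distance of exactly 1 can only come from a single substitution.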

from Levenshtein import distance  # pip install python-Levenshtein

words = ["hot", "dot", "dog", "lot", "log", "cog"]
parent = 'cog'
# find all words matching with one substitution
edits = [w for w in words if distance(parent, w) == 1]
print(edits)

Output:

['dog', 'log']

If you don't want to install any libraries, there are good online resources with Python implementations of the algorithm.
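
Since only substitutions between equal-length words matter here, a full Levenshtein implementation is not strictly needed. The sketch below swaps in the simpler Hamming distance (the number of positions at which two equal-length strings differ) using only the standard library. The names hamming and one_substitution_away are made up for this example.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("hamming() expects strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def one_substitution_away(word: str, words: List[str]) -> List[str]:
    # Keep the words that differ from `word` in exactly one position.
    return [w for w in words if hamming(word, w) == 1]


words = ["hot", "dot", "dog", "lot", "log", "cog"]
print(one_substitution_away("cog", words))  # ['dog', 'log']
```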

Answer 3

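A very simple solution using a different approach: generate every one-letter variation of the word and check whether it is in the list.

Complexity: O(N * 26) => O(N) set lookups per query, where N is the number of characters in each word. Building and hashing each candidate string is itself O(N), and converting the list to a set is linear in the size of the list.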

def main(words, word):
    words = set(words)
    res = []
    for i, _ in enumerate(word):
        for c in 'abcdefghijklmnopqrstuvwxyz':
            w = word[:i] + c + word[i+1:]
            if w != word and w in words:
                res.append(w)
    return res


print(main(["hot","dot","dog","lot","log","cog"], "cog"))
# ['dog', 'log']

Instead of iterating over the whole alphabet, you can also iterate only over the letters that occur in the list, using:

{letter for w in words for letter in w}
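
Plugged into the function above, that variation might look like the sketch below. The name neighbours is made up for this example, and the letters are sorted only so that the output order is deterministic.

```python
from typing import Iterable, List


def neighbours(words: Iterable[str], word: str) -> List[str]:
    word_set = set(words)
    # Only letters that actually occur somewhere in the list can produce a hit.
    letters = sorted({letter for w in word_set for letter in w})
    res = []
    for i in range(len(word)):
        for c in letters:
            # Substitute position i with c and look the candidate up.
            candidate = word[:i] + c + word[i + 1:]
            if candidate != word and candidate in word_set:
                res.append(candidate)
    return res


print(neighbours(["hot", "dot", "dog", "lot", "log", "cog"], "cog"))  # ['dog', 'log']
```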

Answer 4

I would check every letter of the parent word w against each of the words from the list using the Python in operator, counting how many letters match.

For example for parent("cog") against the list of words:

["hot","dot","dog","lot","log","cog"]

yields:

[1, 1, 2, 1, 2, 3]

A count of 2 (one less than the word length) marks the correct words: dog and log.
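
A minimal sketch of this counting idea follows. One caveat worth stating: a plain in test ignores positions, so a word such as "god" would also score 2 against "cog" even though it differs in two positions. The second half of the sketch is therefore a position-aware variant, which is an addition for illustration and not part of the description above.

```python
words = ["hot", "dot", "dog", "lot", "log", "cog"]
w = "cog"

# Count how many letters of w appear anywhere in each word (position is ignored).
membership_counts = [sum(ch in other for ch in w) for other in words]
print(membership_counts)  # [1, 1, 2, 1, 2, 3]

# Position-aware variant: count letters that match at the same index.
positional_counts = [sum(a == b for a, b in zip(w, other)) for other in words]
print([other for other, k in zip(words, positional_counts) if k == len(w) - 1])
# ['dog', 'log']
```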

4 answers

solution1
2 2020-08-21 23:26:33

solution2
1 2020-08-21 23:14:22

solution3
1 ACCEPTED 2020-08-21 23:27:13

solution4
0 2020-08-22 00:36:11
