简体   繁体   中英

Find all the words in the list that differ by a single letter

For any given word w in a list words , I want to find all the other words in the list that can become w by changing a single letter in them. All words are of equal length, and only substitution is allowed. Call this function parent(w) .

For example, given words = ["hot","dot","dog","lot","log","cog"] , parent("cog") would be ["dog", "log"] . parent("lot") would be ["dot", "hot", "log"] etc.

To do this, I first build a reverse index where the keys (str, int) map to the words that have character str at index int . Then, finding the parents of a word becomes the task of intersecting all the words that have the same letters as the word in the same positions, except for one.

The code is as follows, which produces an empty set. Why is it not working?

from typing import Iterator, Dict, Tuple, Set
import itertools

graph: Dict[Tuple[str, int], Set[int]] = dict()

for i, word in enumerate(words):
    for j, ch in enumerate(word):
        if (ch, j) not in graph:
            graph[(ch, j)] = set()

        graph[(ch, j)].add(i)

def parents(word: str) -> Iterator[int]:
    n: int = len(word)
    s: Set[int] = set()
    for part in itertools.combinations(range(n), n - 1):
        keys = map(lambda x: (word[x], x), part)
        existing_keys = filter(lambda k: k in graph, keys)
        for y in itertools.chain(map(lambda k: graph[k], existing_keys)):
            s = s.intersection(set(y)) if s else set(y)

    return filter(lambda i: words[i] != word, s)

print(list(parents("cog"))) # empty!!!

Your solution is almost there. The problem is that you're intersecting everything you find. But instead you should append your results for each combination. Move s: Set[int] = set() inside your first for loop, and append your results after the second for loop and it'll work. Something like this:

def parents(word: str) -> Set[int]:
    ret: Set[int] = set()
    for part in itertools.combinations(range(n), n - 1):
        keys = map(lambda x: (word[x], x), part)
        existing_keys = filter(lambda k: k in graph, keys)
        s: Set[int] = set()
        for y in map(lambda k: graph[k], existing_keys):
            s = s.intersection(set(y)) if s else set(y)

        ret.update(filter(lambda i: words[i] != word, s))

    return ret

The Levenshtein distance algorithm will achive what you are looking for.

from Levenshtein import distance  # pip install python-Levenshtein

words = ["hot", "dot", "dog", "lot", "log", "cog"]
parent = 'cog'
# find all words matching with one substitution
edits = [w for w in words if distance(parent, w) == 1]
print(edits)

Output:

['dog', 'log']

If you don't want to install any libraries, there are good online resources with Python implementations of the algorithm.

A very simple solution. A different approach.

Complexity: O(N * 26) => O(N) - where N is the number of characters in each word.

def main(words, word):
    words = set(words)
    res = []
    for i, _ in enumerate(word):
        for c in 'abcdefghijklmnopqrstuvwxyz':
            w = word[:i] + c + word[i+1:]
            if w != word and w in words:
                res.append(w)
    return res


print(main(["hot","dot","dog","lot","log","cog"], "cog"))
# ['dog', 'log']

Instead of iterating over all the alphabets, you can also choose to only iterate on the alphabets that are occurring in the list using:

{letter for w in words for letter in w}

I would check every letter of the parent word w against each of the words from the list using the Python in function.

For example for parent("cog") against the list of words:

["hot","dot","dog","lot","log","cog"]

yields:

[1, 1, 2, 1, 2, 3]

Numbers 2 show the correct words: dog and log.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM