Shortest Path between two Trie Nodes

This is a twofold question, because I'm out of ideas on how to implement this efficiently.

I have a dictionary of 150,000 words stored in a Trie. Here's what my particular implementation looks like: [image: Trie diagram]

A user is provided with two words. The goal is to find the shortest path of other English words from the start word to the end word, where each word differs from the previous one by exactly one character.

For example:

Start: Dog

End: Cat

Path: Dog, Dot, Cot, Cat

Path: Dog, Cog, Log, Bog, Bot, Cot, Cat

Path: Dog, Doe, Joe, Joy, Jot, Cot, Cat


My current implementation has gone through several iterations. The simplest version I can provide is pseudocode (the actual code spans several files):

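var start = "dog";
var end = "cat";
var alphabet = "abcdefghijklmnopqrstuvwxyz";
var possible_words = [];

// Brute force: try every letter of the alphabet at every position of the start word.
for (var letter_of_word = 0; letter_of_word < start.length; letter_of_word++) {
  for (var letter_of_alphabet = 0; letter_of_alphabet < alphabet.length; letter_of_alphabet++) {
      // Strings are immutable in JavaScript, so build the candidate by slicing.
      var new_word = start.slice(0, letter_of_word)
                   + alphabet[letter_of_alphabet]
                   + start.slice(letter_of_word + 1);
      // in_dictionary() is the Trie lookup (not shown).
      if (in_dictionary(new_word)) {
          possible_words.push(new_word);
      }
  }
}

function bfs() {
    var q = [];
    // ... usual bfs implementation here ...
}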

Knowns:

  • A start word and a finish word
  • Words are of the same length
  • Words are English words
  • It is possible that no path exists


Question:

My issue is that I do not have an efficient way of determining a potential word to try without brute-forcing the alphabet and checking each new word against the dictionary. I know there may be a more efficient way using prefixes, but I can't figure out a proper implementation, or one that doesn't just double the processing.

Secondly, should I be using a different search algorithm? I've looked at A* and Best First Search as possibilities, but those require weights, which I don't have.

Thoughts?

As requested in the comments, here is an illustration of what I mean by encoding linked words in the bits of integers.

In C++, it might look something like...

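// needs <cstdint>, <string>, <unordered_map>, <vector>

// populate a list of known words (or read from file etc)...
std::vector<std::string> words = {
    "dog", "dot", "cot", "cat", "log", "bog"
};

// create sets of one-letter-apart words: for each pattern such as "d_g",
// bit k of the value is set when ('a' + k) fills the underscore to form a
// known word (assumes lowercase a-z only)...
std::unordered_map<std::string, std::uint32_t> links;
for (auto& word : words)
    for (std::size_t i = 0; i < word.size(); ++i)
    {
        char save = word[i];
        word[i] = '_';                      // temporarily turn the word into its pattern
        links[word] |= 1u << (save - 'a');  // record the letter that was replaced
        word[i] = save;                     // restore the original word
    }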

After the above code runs, links[x], where x is a word with one letter replaced by an underscore (for example d_g), retrieves an integer indicating the letters that can replace the underscore to form known words. If the least significant bit is on, then 'dag' is a known word; if the next-to-least-significant bit is on, then 'dbg' is a known word, and so on.
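To make the bit encoding concrete, here is a minimal JavaScript sketch (chosen to match the question's language) of building the same table and decoding one entry back into words. The function names buildLinks and wordsForPattern are made up for this illustration, and it assumes every word is lowercase a-z.

```javascript
// Build the pattern -> bitmask table: bit k set means letter ('a' + k)
// can fill the '_' to form a known word. Assumes lowercase a-z only.
function buildLinks(words) {
  const links = new Map();
  for (const word of words) {
    for (let i = 0; i < word.length; i++) {
      const pattern = word.slice(0, i) + "_" + word.slice(i + 1);
      const bit = 1 << (word.charCodeAt(i) - 97); // 97 is the code of 'a'
      links.set(pattern, (links.get(pattern) || 0) | bit);
    }
  }
  return links;
}

// Expand one pattern's bitmask back into the words it encodes.
function wordsForPattern(links, pattern) {
  const mask = links.get(pattern) || 0;
  const result = [];
  for (let k = 0; k < 26; k++) {
    if (mask & (1 << k)) {
      result.push(pattern.replace("_", String.fromCharCode(97 + k)));
    }
  }
  return result;
}

const links = buildLinks(["dog", "dot", "cot", "cat", "log", "bog"]);
console.log(wordsForPattern(links, "_og")); // [ 'bog', 'dog', 'log' ]
```

Looking up a word's neighbours then costs one table lookup per letter position, rather than 26 dictionary checks per position.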

Intuitively I'd expect using integers to reduce the overall memory used for linkage data, but if the majority of words only have a couple of linked words each, storing an index or pointer to those words may actually use less memory, and it is easier if you're not used to bitwise manipulation, i.e.:

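// each pattern maps to pointers to the known words matching it; the pointers
// stay valid only while `words` itself is not modified or destroyed
std::unordered_map<std::string, std::vector<const char*>> links;
for (auto& word : words)
    for (std::size_t i = 0; i < word.size(); ++i)
    {
        char save = word[i];
        word[i] = '_';                       // the pattern is copied into the map as the key
        links[word].push_back(word.c_str()); // points at the word's own buffer
        word[i] = save;                      // restored, so the pointer reads the real word
    }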

Either way, you then have a graph linking each word to those it can transform into with single-character changes. You can then apply the logic of Dijkstra's algorithm to find a shortest path between any two words; because every edge has the same cost, this reduces to a plain breadth-first search.
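As a rough end-to-end sketch in JavaScript, the search over that graph could look like the following. It uses breadth-first search rather than Dijkstra's algorithm proper, since all edges have equal weight. The names buildPatternMasks, neighbors and shortestPath are invented for this example, and words are assumed to be lowercase a-z.

```javascript
// Map each wildcard pattern (e.g. "d_g") to a bitmask of the letters that
// complete it. Assumes lowercase a-z only.
function buildPatternMasks(words) {
  const masks = new Map();
  for (const word of words) {
    for (let i = 0; i < word.length; i++) {
      const pattern = word.slice(0, i) + "_" + word.slice(i + 1);
      const bit = 1 << (word.charCodeAt(i) - 97); // 97 is the code of 'a'
      masks.set(pattern, (masks.get(pattern) || 0) | bit);
    }
  }
  return masks;
}

// All known words exactly one letter away from `word`.
function neighbors(masks, word) {
  const out = [];
  for (let i = 0; i < word.length; i++) {
    const prefix = word.slice(0, i);
    const suffix = word.slice(i + 1);
    const mask = masks.get(prefix + "_" + suffix) || 0;
    for (let k = 0; k < 26; k++) {
      const candidate = prefix + String.fromCharCode(97 + k) + suffix;
      if ((mask & (1 << k)) && candidate !== word) out.push(candidate);
    }
  }
  return out;
}

// Breadth-first search. Returns the path as an array of words, or null
// when the end word cannot be reached.
function shortestPath(masks, start, end) {
  const cameFrom = new Map([[start, null]]); // doubles as the visited set
  const queue = [start];
  for (let head = 0; head < queue.length; head++) {
    const current = queue[head];
    if (current === end) {
      const path = [];
      for (let w = end; w !== null; w = cameFrom.get(w)) path.unshift(w);
      return path;
    }
    for (const next of neighbors(masks, current)) {
      if (!cameFrom.has(next)) {
        cameFrom.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}

const dictionary = ["dog", "dot", "cot", "cat", "log", "bog"];
const patternMasks = buildPatternMasks(dictionary);
console.log(shortestPath(patternMasks, "dog", "cat")); // [ 'dog', 'dot', 'cot', 'cat' ]
```

The table is built once per dictionary and can be reused for every query. The queue is walked with an index instead of calling shift(), which keeps each dequeue constant time.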

Just to add an update for those who starred this question: I've added a GitHub repository with a JavaScript implementation of this particular data structure.

https://github.com/acupajoe/Lexibit.js

Thank you all for the help and ideas!
