
Most efficient way to find all the anagrams of each word in a list

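I have been trying to write a program that finds all the anagrams (as a list) of each word in a text file. The file contains about 370k words, separated by '\n'.

I have already written the code in Python. It ran for about an hour. I am wondering whether there is a more efficient way to do this.

My code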

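from tqdm.auto import tqdm

# Read one word per line and strip the trailing newline.
with open("words.txt", "r") as f:
    ls = [line.rstrip("\n") for line in f]

# Pair each word with its letters sorted alphabetically.
ls = [[i, ''.join(sorted(i))] for i in ls]
ln = set(len(i[1]) for i in tqdm(ls))

# Group the [word, sorted letters] pairs by word length.
df = {}
for l in tqdm(ln):
    df[l] = [i for i in ls if len(i[0]) == l]

# For each word, scan every word of the same length for matching sorted letters.
full = {}
for m in tqdm(ls):
    if full.get(m[0]) is None:
        temp = []
        for i in df[len(m[0])]:
            if i[1] == m[1] and i[0] != m[0]:
                temp.append(i[0])
        # Record the matches under each matching word.
        for i in temp:
            full[i] = temp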

If there is a more efficient way, it could also be written in another language (Rust, C, C++, ...).

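Using the word with its characters sorted alphabetically as the search key is the way to go. You may already be doing this in your code with this line (I hardly ever use Python):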

[[i,''.join(sorted(i))] for i in ls]
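
As a rough illustration of that idea in the question's own language, here is a minimal Python sketch that groups words by their sorted letters in a single pass, so each word is sorted once and is never compared against every other word of the same length. The names `anagram_key`, `group_anagrams`, and `anagrams_of` are made up for this example, and the short word list is only a stand-in for the contents of words.txt.

```python
from collections import defaultdict


def anagram_key(word):
    # Hypothetical helper: the letters of the word in alphabetical order.
    return ''.join(sorted(word))


def group_anagrams(words):
    # Map each key to the set of distinct words that share it.
    groups = defaultdict(set)
    for word in words:
        groups[anagram_key(word)].add(word)
    return groups


def anagrams_of(word, groups):
    # Look up the word's group without creating a new entry,
    # then drop the word itself.
    return sorted(groups.get(anagram_key(word), set()) - {word})


# Stand-in for the words read from words.txt (assumed sample data).
words = ["beast", "betas", "beats", "tacos", "coast", "python"]
groups = group_anagrams(words)

print(anagrams_of("beast", groups))   # ['beats', 'betas']
print(anagrams_of("tacos", groups))   # ['coast']
print(anagrams_of("python", groups))  # []
```

For a list of about 370k words this does one sort and one dictionary insertion per word, instead of a scan over all words of the same length for every word.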

Anyway, here is my C++ solution to your problem. Live demo: https://onlinegdb.com/_gauHBd_3

#include <algorithm>        // for sorting
#include <string>
#include <unordered_map>    // for storing words/anagrams
#include <iostream>
#include <fstream>
#include <set>

// create a class that will hold all words
class dictionary_t
{
public:
    // load a text file with one word per line
    void load(const std::string& filename)
    {
        std::ifstream file{ filename };
        std::string word;

        while (file >> word)
        {
            add_anagram(word);
        }
    }

    auto& find_anagrams(const std::string& word)
    {
        const auto key = get_key(word);

        // intentionally allow an empty entry to be created if the word has no anagrams yet,
        // for readability and easier error handling (not for space/time efficiency)
        auto& anagrams = m_anagrams[key];

        return anagrams;
    }

    // show all anagrams for a word
    void show_anagrams(const std::string& word)
    {
        std::cout << "anagrams for word '" << word << "' are : ";
        // bind by reference to avoid copying the whole set
        const auto& anagrams = find_anagrams(word);

        for (const auto& anagram : anagrams)
        {
            if (anagram != word)
            {
                std::cout << anagram << " ";
            }
        }

        std::cout << "\n";
    }

private:
    // this function is key to the whole idea:
    // two words are anagrams if their letters sort
    // to the same order, e.g. beast and betas both sort (alphabetically) to abest
    std::string get_key(const std::string& word)
    {
        std::string key{ word };
        // all anagrams sort to the same order of characters.
        std::sort(key.begin(), key.end()); 
        return key;
    }

    void add_anagram(const std::string& word)
    {
        // find the set of anagrams for this word
        auto& anagrams = find_anagrams(word);

        // then add word to it (I use a set so all words will be unique even
        // if input file contains duplicates)
        anagrams.insert(word);
    }

    std::unordered_map<std::string, std::set<std::string>> m_anagrams;
};


int main()
{
    dictionary_t dictionary;
    dictionary.load("words.txt");

    dictionary.show_anagrams("beast");
    dictionary.show_anagrams("tacos");

    return 0;
}
