
C++ word unscrambler

I'm fairly new to C++, and as an exercise I'm trying to write a "word unscrambler". I have a large text file full of words that is loaded into a trie. Each trie_node has an array of 27 pointers to trie_nodes. Elements 0-25 correspond to the letters a-z, and each one is NULL unless that letter can follow the letter the node represents. The 27th element (index 26) indicates that a word can end at that node.
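Because trie.h isn't included in the question, here is a minimal sketch of the node layout described above. The names (TrieNode, slot, insert, contains) are hypothetical. Unlike the original, it marks word ends with a bool rather than a shared sentinel pointer in slot 26, and it uses std::unique_ptr so child nodes are freed automatically.

```cpp
#include <array>
#include <memory>
#include <string>

// Hypothetical stand-in for trie_node (trie.h is not shown in the question).
// child[0..25] correspond to 'a'..'z'; the 27th slot of the original design,
// the end-of-word marker, is modelled as a bool instead of a sentinel pointer.
struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> child;
    bool is_word_end = false;
};

// Map an ASCII letter (either case) to its slot, or -1 for anything else.
int slot(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';
    return -1;
}

// Add a word, creating nodes on demand. Every word starts at the root;
// non-letters such as a trailing '\r' are skipped.
void insert(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    for (char c : word) {
        int i = slot(c);
        if (i < 0) continue;
        if (!node->child[i]) node->child[i] = std::make_unique<TrieNode>();
        node = node->child[i].get();
    }
    node->is_word_end = true;
}

// True if the letters of `word` spell a word that was inserted.
bool contains(const TrieNode& root, const std::string& word) {
    const TrieNode* node = &root;
    for (char c : word) {
        int i = slot(c);
        if (i < 0) continue;
        node = node->child[i].get();
        if (!node) return false; // no such path: not a word (or prefix)
    }
    return node->is_word_end;
}
```

A prefix such as "sal" has a path once "salt" is inserted, but contains() still returns false for it because is_word_end is only set at the end of a complete word.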

I have a class that is supposed to permute through all letter combinations, but skip any combination that can't lead to a word (that is, one with no matching path in the trie).

What I've written almost does what I need. However, it only works for certain orderings of the input letters.

For instance, if you input the letters "last", you get the following words:

last
salt
slat

However, if you input the word "salt" (a permutation of "last"), you only get this:

salt

I'm pretty sure the problem is in my permute() method. What's the most efficient way to find these words without generating every permutation and checking each one against the word list (which would be an expensive O(n!) operation)?

#pragma once

#include <cctype>   // isalpha, tolower
#include <map>
#include <string>
#include <fstream>

#include "trie.h"

using std::ifstream;
using std::string;
using std::map;

class Words
{
private:
    trie_node* WordEnd; // end marker
    trie_node wordbank;
    map<string, short> _found;

    template <typename E>
    static void swap(E* i, E* j) {
        E k = *i;
        *i = *j;
        *j = k;
    }

    void permute(char* word, const trie_node* node, size_t pos, size_t len) {
        if (is_word(word, len)) {
            string str_word(word, len);
            _found[str_word] = 0;
        }
        if (pos < len - 1) {
            size_t pos2;
            for (pos2 = pos; pos2 < len; ++pos2) {
                char* j = word + pos2;
                const trie_node* _next = next(node, *j);
                if (_next) { // check if that's a valid path
                    char* i = word + pos;
                    swap(i, j); // swap letters
                    permute(word, _next, pos, len); // find that route
                    swap(i, j); // switch back
                }
            }
        }
    }

public:
    Words()
        : wordbank(27) {
        WordEnd = new trie_node(1);
    }

    Words(const Words& other)
        : wordbank(27) {
        operator=(other);
    }

    ~Words() {
        delete WordEnd;
    }

    Words& operator=(const Words& other) {
        if (this != &other) {
            WordEnd = new trie_node(*WordEnd);
            wordbank = other.wordbank;
            _found = other._found;
        }
        return *this;
    }

    void clear() {
        _found.clear();
    }

    void permute(char* word, size_t len) {
        permute(word, &wordbank, 0, len);
    }

    size_t size() const {
        return _found.size();
    }

    size_t found(string buff[], size_t len) const {
        if (len > _found.size()) {
            len = _found.size();
        }
        size_t index = 0;
        for (map<string, short>::const_iterator it = _found.begin(), e = _found.end(); it != e; ++it) {
            buff[index] = it->first;
            if (++index == len) {
                break;
            }
        }
        return len;
    }

    const trie_node* next(char c) const {
        return next(&wordbank, c);
    }

    static const trie_node* next(const trie_node* n, char c) {
        if (isalpha(c)) {
            size_t pos = tolower(c) - 'a';
            return n->operator[](pos);
        }
        return NULL;
    }

    bool is_word(const char* word, size_t len) const {
        const trie_node* node = &wordbank;
        for (size_t i = 0; i < len; ++i) {
            if (isalpha(word[i])) {
                size_t index = tolower(word[i]) - 'a';
                const trie_node* next = node->operator[](index);
                if (!next) {
                    return false;
                }
                node = next;
            }
        }
        return node->operator[](26) == WordEnd;
    }

    bool load(const string& path) {
        ifstream wordfile;
        wordfile.open(path);
        if (!wordfile.is_open()) {
            return false;
        }
        string word;
        while (getline(wordfile, word)) {
            trie_node* node = &wordbank; // restart at the root for each word
            size_t i = 0;
            for (; i < word.size(); ++i) {
                size_t index = word[i] - 'a';
                trie_node* _next = (*node)[index];
                if (!_next) {
                    _next = node->branch(index);
                }
                node = _next;
                if (i == word.size() - 1) {
                    _next->set(26, WordEnd);
                }
            }
        }
        wordfile.close();
        return true;
    }
};
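For reference (this is not from the thread): in the posted permute(), the recursive call passes pos instead of pos + 1, so the search never advances past the first position, which likely explains why the results depend on the input order. Below is a minimal sketch of the intended pruned search. It uses a hypothetical self-contained trie (Node, insert, search and anagrams are illustrative names, not the contents of trie.h) and tracks unused letters with counts instead of swapping, so repeated letters are not explored twice and only trie paths that can still form a word are followed.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal trie (not the question's trie.h): 26 children plus
// an end-of-word flag standing in for the 27th slot.
struct Node {
    std::array<std::unique_ptr<Node>, 26> child;
    bool end = false;
};

// Insert a lowercase a-z word; other characters are ignored.
void insert(Node& root, const std::string& word) {
    Node* n = &root;
    for (char c : word) {
        if (c < 'a' || c > 'z') continue;
        auto& next = n->child[c - 'a'];
        if (!next) next = std::make_unique<Node>();
        n = next.get();
    }
    n->end = true;
}

// Depth-first search that only follows trie edges for letters still
// available, so impossible prefixes are pruned immediately.
// `left` counts unused letters; `prefix` is the path taken so far.
void search(const Node& n, std::array<int, 26>& left, std::string& prefix,
            std::size_t len, std::vector<std::string>& out) {
    if (prefix.size() == len) {
        if (n.end) out.push_back(prefix); // used every letter and hit a word end
        return;
    }
    for (int i = 0; i < 26; ++i) {
        if (left[i] == 0 || !n.child[i]) continue; // letter used up or dead end
        --left[i];
        prefix.push_back(static_cast<char>('a' + i));
        search(*n.child[i], left, prefix, len, out);
        prefix.pop_back(); // backtrack
        ++left[i];
    }
}

// All dictionary words that use exactly the letters of `letters` (lowercase
// a-z), in alphabetical order, each reported once.
std::vector<std::string> anagrams(const Node& root, const std::string& letters) {
    std::array<int, 26> left{};
    std::size_t len = 0;
    for (char c : letters) {
        if (c >= 'a' && c <= 'z') { ++left[c - 'a']; ++len; }
    }
    std::vector<std::string> out;
    std::string prefix;
    search(root, left, prefix, len, out);
    return out;
}
```

Because the result depends only on the letter counts, anagrams(root, "last") and anagrams(root, "salt") return the same list.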

If I understand correctly, you are trying to find all anagrams of a word in a dictionary. An efficient way to do this is as follows:

1. Create map from string to list of strings.
2. For each word in the dictionary:
  a. Let sortedWord = sort letters in word lexicographically.
  b. Add word to the list in the map whose key is sortedWord
3. Let searchWord be the word whose anagrams you are looking for.
4. Let sortedSearchWord = sort letters in searchWord lexicographically.
5. Return map[sortedSearchWord]

Assuming the longest word in the dictionary has k letters and there are n words, this algorithm runs in O(n*k*log(k)) time to build the map, and then in O(k*log(k)) time to find the anagrams of a given word (sorting the search word dominates; the lookup itself is O(k) on average with a hash map).

Thanks for your suggestions. I streamlined the whole thing with this:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

// Return a copy of str with its letters sorted; two words are anagrams
// exactly when their sorted letters are equal.
string sorted_letters(string str) {
    sort(str.begin(), str.end());
    return str;
}

// Print every line of `in` that is an anagram of `letters`.
void findwords(const string& letters, istream& in, ostream& out) {
    const string key = sorted_letters(letters);
    string word;
    while (getline(in, word)) {
        // Drop a trailing '\r' in case words.txt has Windows line endings.
        if (!word.empty() && word.back() == '\r') {
            word.pop_back();
        }
        if (sorted_letters(word) == key) {
            out << word << '\n';
        }
    }
}

int main(int argc, char* argv[]) {
    if (argc != 2) {
        cerr << "usage: scramble <word>" << endl;
        return 1;
    }
    ifstream wordsfile("words.txt");
    if (!wordsfile.is_open()) {
        cerr << "unable to load words.txt" << endl;
        return 2;
    }
    findwords(argv[1], wordsfile, cout);
    return 0; // wordsfile is closed by its destructor
}

This pretty much solves everything. However, I may want to add functionality to find all possible words that can be made from the letters of a string, not just full anagrams, but that's another project.

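Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.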

© 2020-2024 STACKOOM.COM